Article published in: Multilingual Corpora and Multilingual Corpus Analysis
Edited by Thomas Schmidt and Kai Wörner
[Hamburg Studies on Multilingualism 14] 2012
pp. 71–96
The ALeSKo learner corpus
Design – annotation – quantitative analyses
The ALeSKo learner corpus is a small-scale comparable corpus consisting of two subcorpora: annotated essays by advanced Chinese learners of German and comparable essays by German native speakers. It was compiled to investigate discourse-related phenomena, such as local coherence, in the second-language acquisition of German. After describing how the texts were compiled and annotated, the article focuses on quantitative studies at the token level. We discuss problems of tokenisation and part-of-speech tagging and compare the inventory of the two subcorpora in terms of frequently used N-grams and lexical richness, among other aspects. We conclude by describing possible applications of the study in foreign language acquisition research and language teaching.
Published online: 15 November 2012