Article published in: Multilingual Corpora and Multilingual Corpus Analysis
Edited by Thomas Schmidt and Kai Wörner
[Hamburg Studies on Multilingualism 14] 2012
pp. 339–346
A distributed corpus of German varieties
The Korpus C4 is designed as a reference corpus that is balanced across text genres and documents four standard varieties of written German throughout the 20th century. Its intended uses include variant lexicography, intralingual contrastive corpus-linguistic studies, language teaching and learning, and investigations into the possible influence of contact languages and dialects. Implemented as a distributed corpus, the Korpus C4 comprises data from the following sources: the DWDS core corpus for the variety of German used in Germany, the Swiss Text Corpus, the Austrian Academy Corpus, and the Korpus Südtirol for the variety used in South Tyrol, Italy. These sub-corpora share a common structure, metadata set, data format, and indexing solution, and they can be accessed simultaneously via a single interface.
Published online: 15 November 2012