Christopher D. Sapp


2024

pdf bib
Introducing a Parsed Corpus of Historical High German
Christopher D. Sapp | Elliott Evans | Rex Sprouse | Daniel Dakota
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We outline the ongoing development of the Indiana Parsed Corpus of (Historical) High German. Once completed, this corpus will fill the gap in Penn-style treebanks for Germanic languages by spanning High German from 1050 to 1950. This paper describes the process of building the corpus: selection of texts, decisions on part-of-speech tags and other labels, the process of annotation, and illustrative annotation issues unique to historical High German. The construction of the corpus has led to a refinement of the Penn labels, tailored to the particulars of this language.