Elliott Evans


2024

pdf bib
Introducing a Parsed Corpus of Historical High German
Christopher D. Sapp | Elliott Evans | Rex Sprouse | Daniel Dakota
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We outline the ongoing development of the Indiana Parsed Corpus of (Historical) High German. Once completed, this corpus will fill the gap in Penn-style treebanks for Germanic languages by spanning High German from 1050 to 1950. This paper describes the process of building the corpus: selection of texts, decisions on part-of-speech tags and other labels, the process of annotation, and illustrative annotation issues unique to historical High German. The construction of the corpus has led to a refinement of the Penn labels, tailored to the particulars of this language.

2023

pdf bib
Parsing Early New High German: Benefits and limitations of cross-dialectal training
Christopher Sapp | Daniel Dakota | Elliott Evans
Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)

Historical treebanking within the generative framework has gained in popularity. However, there are still many languages and historical periods yet to be represented. For German, a constituency treebank exists for historical Low German, but not Early New High German. We begin to fill this gap by presenting our initial work on the Parsed Corpus of Early New High German (PCENHG). We present the methodological considerations and workflow for the treebank’s annotations and development. Given the limited amount of currently available PCENHG treebank data, we treat it as a low-resource language and leverage a larger, closely related variety—Middle Low German—to build a parser to help facilitate faster post-annotation correction. We present an analysis on annotation speeds and conclude with a small pilot use-case, highlighting potential for future linguistic analyses. In doing so we highlight the value of the treebank’s development for historical linguistic analysis and demonstrate the benefits and challenges of developing a parser using two closely related historical Germanic varieties.