Erko Jakobson


2024

Leveraging Domain Corpora for Enhanced Terminology: The Case of Estonian-English Remote Sensing Termbase
Liisi Jakobson | Jelena Kallas | Erko Jakobson
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This article addresses methodological issues in building a domain corpus and a terminological database from scratch. We present an ongoing project to create an Estonian-English Remote Sensing Termbase. First, we describe how we compiled the Estonian Remote Sensing Corpus 2022, the primary data source for the termbase. The corpus was compiled by crawling the web and uploading additional files with the corpus query system Sketch Engine (Kilgarriff et al., 2004). Next, we used Sketch Engine's Term Extraction module (Kilgarriff et al., 2014; Fišer et al., 2016; Blahuš et al., 2023) to identify terms, which were then registered in the Estonian Remote Sensing Termbase with the dictionary writing system Ekilex (Tavast et al., 2018). For each term, we provided definitions, variants, and usage contexts. In the final stage, remote sensing experts reviewed and edited the terms, their variants, and usage contexts. We conclude with insights and directions for future work in this area.
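Term extraction of this kind ranks candidate terms by how much more frequent they are in the domain (focus) corpus than in a general reference corpus. The sketch below assumes a "simple maths" keyness score, (fpm_focus + N) / (fpm_ref + N), where fpm is frequency per million tokens and N is a smoothing constant (set to 1 here). This is an illustration of the general technique, not the exact Sketch Engine implementation. The candidate terms and all counts are invented toy data, not figures from the paper, and candidate identification (e.g. by part-of-speech patterns) is assumed to have already happened.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw count to frequency per million tokens."""
    return count * 1_000_000 / corpus_size


def simple_maths_score(focus_count: int, focus_size: int,
                       ref_count: int, ref_size: int, n: float = 1.0) -> float:
    """Keyness of one candidate: (fpm_focus + n) / (fpm_ref + n).

    The constant n smooths the ratio so that candidates absent from the
    reference corpus do not cause division by zero.
    """
    fpm_focus = per_million(focus_count, focus_size)
    fpm_ref = per_million(ref_count, ref_size)
    return (fpm_focus + n) / (fpm_ref + n)


def rank_candidates(focus_counts: dict, focus_size: int,
                    ref_counts: dict, ref_size: int,
                    n: float = 1.0, top_k: int = 10) -> list:
    """Score every focus-corpus candidate and return the top_k, best first."""
    scores = {
        term: simple_maths_score(count, focus_size,
                                 ref_counts.get(term, 0), ref_size, n)
        for term, count in focus_counts.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical toy counts: a 1M-token domain corpus vs a 10M-token reference corpus.
    focus = {"kaugseire": 500, "satelliitpilt": 120, "aasta": 1000}
    reference = {"kaugseire": 50, "satelliitpilt": 10, "aasta": 20000}
    for term, score in rank_candidates(focus, 1_000_000, reference, 10_000_000):
        print(f"{term}\t{score:.2f}")
```

With these toy counts, the domain words "kaugseire" (remote sensing) and "satelliitpilt" (satellite image) rank above the general word "aasta" (year). Raising N shifts the ranking toward more frequent candidates. Lowering it favours rarer, more specialised ones. In the workflow described above, the top-ranked candidates would still need expert review before entering the termbase.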