A Multilingual Parallel Corpus for Aromanian

Iulia Petrariu, Sergiu Nisioi


Abstract
We report the creation of the first high-quality corpus of Aromanian, an endangered Romance language spoken in the Balkans, together with sentence-aligned translations into Romanian, English, and French. The corpus is publicly released in several orthographic standards and consists of short stories collected in Romania in the 1970s. Additionally, we provide a corpus-based analysis of Aromanian linguistic particularities and of the demographic and political context that affects the contemporary development of the language.
Anthology ID:
2024.lrec-main.75
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
832–838
URL:
https://aclanthology.org/2024.lrec-main.75
Cite (ACL):
Iulia Petrariu and Sergiu Nisioi. 2024. A Multilingual Parallel Corpus for Aromanian. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 832–838, Torino, Italia. ELRA and ICCL.
Cite (Informal):
A Multilingual Parallel Corpus for Aromanian (Petrariu & Nisioi, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.75.pdf