Agettivu, Aggitivu o Aghjettivu? POS Tagging Corsican Dialects

Alice Millour; Lorenza Brasile; Alberto Ghia; Laurent Kevers

Abstract

In this paper we present a series of experiments on POS tagging Corsican, a less-resourced language spoken in Corsica and linguistically related to Italian. Our first contribution is Corsican-POS, the first gold standard POS-tagged corpus for Corsican, composed of 500 sentences manually annotated with the Universal POS tagset. Our second contribution is a set of experiments evaluating POS tagging models, starting from a baseline model for Italian and aimed at finding the best training configuration, namely in terms of the size and combination strategy of the existing raw and annotated resources. These experiments result in (i) the first POS tagger for Corsican, reaching an accuracy of 93.38%, and (ii) a quantification of the gain provided by each available resource. We find that the optimal configuration uses Italian word embeddings further specialized with Corsican embeddings and is trained on the largest gold corpus for Corsican available so far.

Anthology ID: 2024.lrec-main.52
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 600–608
URL: https://aclanthology.org/2024.lrec-main.52
Cite (ACL): Alice Millour, Lorenza Brasile, Alberto Ghia, and Laurent Kevers. 2024. Agettivu, Aggitivu o Aghjettivu? POS Tagging Corsican Dialects. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 600–608, Torino, Italia. ELRA and ICCL.
Cite (Informal): Agettivu, Aggitivu o Aghjettivu? POS Tagging Corsican Dialects (Millour et al., LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.52.pdf
