SilverAlign: MT-Based Silver Data Algorithm for Evaluating Word Alignment

Abdullatif Koksal, Silvia Severini, Hinrich Schütze


Abstract
Word alignments are essential for a variety of NLP tasks. Therefore, choosing the best approaches for their creation is crucial. However, the scarce availability of gold evaluation data makes the choice difficult. We propose SilverAlign, a new method to automatically create silver data for the evaluation of word aligners by exploiting machine translation and minimal pairs. We show that performance on our silver data correlates well with gold benchmarks for 9 language pairs, making our approach a valid resource for evaluation of different languages and domains when gold data is not available. This addresses the important scenario of missing gold data alignments for low-resource languages.
Anthology ID:
2024.lrec-main.1290
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
14812–14825
URL:
https://aclanthology.org/2024.lrec-main.1290
Cite (ACL):
Abdullatif Koksal, Silvia Severini, and Hinrich Schütze. 2024. SilverAlign: MT-Based Silver Data Algorithm for Evaluating Word Alignment. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14812–14825, Torino, Italia. ELRA and ICCL.
Cite (Informal):
SilverAlign: MT-Based Silver Data Algorithm for Evaluating Word Alignment (Koksal et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1290.pdf