Disambiguating Homographs and Homophones Simultaneously: A Regrouping Method for Japanese

Yo Sato


Abstract
We present a method that re-groups surface forms into clusters representing synonyms, and helps disambiguate homographs as well as homophones. The method is applied post hoc to trained contextual word embeddings. It is beneficial to languages where both homographs and homophones abound, which compromise the efficiency of language models and cause the underestimation problem in evaluation. Taking Japanese as an example, we evaluate how accurate such disambiguation can be, and how much the underestimation can be mitigated.
Anthology ID:
2024.lrec-main.442
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
4935–4939
URL:
https://aclanthology.org/2024.lrec-main.442
Cite (ACL):
Yo Sato. 2024. Disambiguating Homographs and Homophones Simultaneously: A Regrouping Method for Japanese. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4935–4939, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Disambiguating Homographs and Homophones Simultaneously: A Regrouping Method for Japanese (Sato, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.442.pdf