A Japanese News Simplification Corpus with Faithfulness

Toru Urakawa, Yuya Taguchi, Takuro Niitsuma, Hideaki Tamori


Abstract
Text Simplification enhances the readability of texts for specific audiences. However, automated models may introduce unwanted content or omit essential details, necessitating a focus on maintaining faithfulness to the original input. Furthermore, existing simplified corpora contain instances of low faithfulness. Motivated by this issue, we present a new Japanese simplification corpus designed to prioritize faithfulness. Our collection comprises 7,075 paired sentences simplified from newspaper articles. This process involved collaboration with language education experts who followed guidelines balancing readability and faithfulness. Through corpus analysis, we confirmed that our dataset preserves the content of the original text, including personal names, dates, and city names. Manual evaluation showed that our corpus robustly maintains faithfulness to the original text, surpassing other existing corpora. Furthermore, evaluation by non-native readers confirmed its readability to the target audience. Through the experiment of fine-tuning and in-context learning, we demonstrated that our corpus enhances faithful sentence simplification.
Anthology ID:
2024.lrec-main.57
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
659–665
Language:
URL:
https://aclanthology.org/2024.lrec-main.57
DOI:
Bibkey:
Cite (ACL):
Toru Urakawa, Yuya Taguchi, Takuro Niitsuma, and Hideaki Tamori. 2024. A Japanese News Simplification Corpus with Faithfulness. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 659–665, Torino, Italia. ELRA and ICCL.
Cite (Informal):
A Japanese News Simplification Corpus with Faithfulness (Urakawa et al., LREC-COLING 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.lrec-main.57.pdf