Reading Does Not Equal Reading: Comparing, Simulating and Exploiting Reading Behavior across Populations

David R. Reich, Shuwen Deng, Marina Björnsdóttir, Lena Jäger, Nora Hollenstein


Abstract
Eye-tracking-while-reading corpora play a crucial role in the study of human language processing, and, more recently, have been leveraged for cognitively enhancing neural language models. A critical limitation of existing corpora is that they often lack diversity, comprising primarily native speakers. In this study, we expand the eye-tracking-while-reading dataset CopCo, which initially included only Danish L1 readers with and without dyslexia, by incorporating a new dataset of L2 readers with diverse L1 backgrounds. Thus, the extended CopCo corpus constitutes the first eye-tracking-while-reading dataset encompassing neurotypical L1 readers, L1 readers with dyslexia, and L2 readers, all reading the same materials. We first provide extensive descriptive statistics of the extended CopCo corpus. Second, we investigate how different degrees of diversity of the training data affect a state-of-the-art generative model of eye movements in reading. Finally, we use this scanpath generation model for gaze-augmented language modeling and investigate the impact of diversity in the training data on the model’s performance on a range of NLP downstream tasks. The code can be found here: https://github.com/norahollenstein/copco-processing.
Anthology ID:
2024.lrec-main.1187
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
13586–13594
URL:
https://aclanthology.org/2024.lrec-main.1187
Cite (ACL):
David R. Reich, Shuwen Deng, Marina Björnsdóttir, Lena Jäger, and Nora Hollenstein. 2024. Reading Does Not Equal Reading: Comparing, Simulating and Exploiting Reading Behavior across Populations. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13586–13594, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Reading Does Not Equal Reading: Comparing, Simulating and Exploiting Reading Behavior across Populations (Reich et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1187.pdf