SciNews: From Scholarly Complexities to Public Narratives – a Dataset for Scientific News Report Generation

Dongqi Pu, Yifan Wang, Jia E. Loy, Vera Demberg


Abstract
Scientific news reports serve as a bridge, adeptly translating complex research articles into reports that resonate with the broader public. The automated generation of such narratives enhances the accessibility of scholarly insights. In this paper, we present a new corpus to facilitate this paradigm development. Our corpus comprises a parallel compilation of academic publications and their corresponding scientific news reports across nine disciplines. To demonstrate the utility and reliability of our dataset, we conduct an extensive analysis, highlighting the divergences in readability and brevity between scientific news narratives and academic manuscripts. We benchmark our dataset employing state-of-the-art text generation models. The evaluation process involves both automatic and human evaluation, which lays the groundwork for future explorations into the automated generation of scientific news reports. The dataset and code related to this work are available at https://dongqi.me/projects/SciNews.
Anthology ID:
2024.lrec-main.1258
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
14429–14444
URL:
https://aclanthology.org/2024.lrec-main.1258
Cite (ACL):
Dongqi Pu, Yifan Wang, Jia E. Loy, and Vera Demberg. 2024. SciNews: From Scholarly Complexities to Public Narratives – a Dataset for Scientific News Report Generation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14429–14444, Torino, Italia. ELRA and ICCL.
Cite (Informal):
SciNews: From Scholarly Complexities to Public Narratives – a Dataset for Scientific News Report Generation (Pu et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1258.pdf