Dataset of Quotation Attribution in German News Articles

Fynn Petersen-Frey, Chris Biemann


Abstract
Extracting who says what to whom is a crucial part in analyzing human communication in today’s abundance of data such as online news articles. Yet, the lack of annotated data for this task in German news articles severely limits the quality and usability of possible systems. To remedy this, we present a new, freely available, creative-commons-licensed dataset for quotation attribution in German news articles based on WIKINEWS. The dataset provides curated, high-quality annotations across 1000 documents (250,000 tokens) in a fine-grained annotation schema enabling various downstream uses for the dataset. The annotations not only specify who said what but also how, in which context, to whom and define the type of quotation. We specify our annotation schema, describe the creation of the dataset and provide a quantitative analysis. Further, we describe suitable evaluation metrics, apply two existing systems for quotation attribution, discuss their results to evaluate the utility of our dataset and outline use cases of our dataset in downstream tasks.
Anthology ID:
2024.lrec-main.394
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
4412–4422
Language:
URL:
https://aclanthology.org/2024.lrec-main.394
DOI:
Bibkey:
Cite (ACL):
Fynn Petersen-Frey and Chris Biemann. 2024. Dataset of Quotation Attribution in German News Articles. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4412–4422, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Dataset of Quotation Attribution in German News Articles (Petersen-Frey & Biemann, LREC-COLING 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.lrec-main.394.pdf