PolyNERE: A Novel Ontology and Corpus for Named Entity Recognition and Relation Extraction in Polymer Science Domain

Van-Thuy Phi, Hiroki Teranishi, Yuji Matsumoto, Hiroyuki Oka, Masashi Ishii


Abstract
Polymers are widely used in diverse fields, and the demand for efficient methods to extract and organize information about them is increasing. An automated approach that utilizes machine learning can accurately extract relevant information from scientific papers, providing a promising solution for automating information extraction using annotated training data. In this paper, we introduce a polymer-relevant ontology featuring crucial entities and relations to enhance information extraction in the polymer science field. Our ontology is customizable to adapt to specific research needs. We present PolyNERE, a high-quality named entity recognition (NER) and relation extraction (RE) corpus comprising 750 polymer abstracts annotated using our ontology. Distinctive features of PolyNERE include multiple entity types, relation categories, support for various NER settings, and the ability to assert entities and relations at different levels. PolyNERE also facilitates reasoning in the RE task through supporting evidence. While our experiments with recent advanced methods achieved promising results, challenges persist in adapting NER and RE from abstracts to full-text paragraphs. This emphasizes the need for robust information extraction systems in the polymer domain, making our corpus a valuable benchmark for future developments.
Anthology ID:
2024.lrec-main.1126
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
12856–12866
Language:
URL:
https://aclanthology.org/2024.lrec-main.1126
DOI:
Bibkey:
Cite (ACL):
Van-Thuy Phi, Hiroki Teranishi, Yuji Matsumoto, Hiroyuki Oka, and Masashi Ishii. 2024. PolyNERE: A Novel Ontology and Corpus for Named Entity Recognition and Relation Extraction in Polymer Science Domain. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 12856–12866, Torino, Italia. ELRA and ICCL.
Cite (Informal):
PolyNERE: A Novel Ontology and Corpus for Named Entity Recognition and Relation Extraction in Polymer Science Domain (Phi et al., LREC-COLING 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.lrec-main.1126.pdf