A Corpus and Method for Chinese Named Entity Recognition in Manufacturing

Ruiting Li, Peiyan Wang, Libang Wang, Danqingxin Yang, Dongfeng Cai


Abstract
Manufacturing specifications are documents that describe the techniques, processes, and components involved in manufacturing. With the development of smart manufacturing, there is a growing demand for named entity recognition (NER) resources and techniques that target manufacturing-specific named entities. In this paper, we introduce MS-NERC, a corpus of Chinese manufacturing specifications comprising 4,424 sentences and 16,383 entities. We also propose an entity recognizer named Trainable State Transducer (TST), which is initialized with a finite state transducer describing the morphological patterns of entities. It can recognize entities directly from this prior morphological knowledge, without any training. Experimental results show that TST achieves an overall F1 score of 82.05% for morphological-specific entities in the zero-shot setting. TST can be further improved through training, after which it outperforms neural methods in both the few-shot and rich-resource settings. We believe that our corpus and model will be valuable resources for NER research not only in manufacturing but also in other low-resource domains.
Anthology ID:
2024.lrec-main.24
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
264–272
URL:
https://aclanthology.org/2024.lrec-main.24
Cite (ACL):
Ruiting Li, Peiyan Wang, Libang Wang, Danqingxin Yang, and Dongfeng Cai. 2024. A Corpus and Method for Chinese Named Entity Recognition in Manufacturing. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 264–272, Torino, Italia. ELRA and ICCL.
Cite (Informal):
A Corpus and Method for Chinese Named Entity Recognition in Manufacturing (Li et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.24.pdf