Grounded Multimodal Procedural Entity Recognition for Procedural Documents: A New Dataset and Baseline

Haopeng Ren, Yushi Zeng, Yi Cai, Zhenqi Ye, Li Yuan, Pinli Zhu


Abstract
Much commonsense knowledge in the real world takes the form of procedures, or sequences of steps to achieve particular goals. In recent years, knowledge extraction from procedural documents has attracted considerable attention. However, existing work often focuses on procedural text and ignores the multimodal scenario that is common in the real world. Images and text can complement each other semantically, alleviating the semantic ambiguity suffered in the text-only modality. Motivated by this, in this paper we explore the problem of grounded multimodal procedural entity recognition (GMPER), which aims to detect entities and their corresponding bounding box groundings in the image (i.e., visual entities). A new dataset (Wiki-GMPER) is built, and extensive experiments are conducted to evaluate the effectiveness of our proposed model.
Anthology ID:
2024.lrec-main.702
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
7971–7981
URL:
https://aclanthology.org/2024.lrec-main.702
Cite (ACL):
Haopeng Ren, Yushi Zeng, Yi Cai, Zhenqi Ye, Li Yuan, and Pinli Zhu. 2024. Grounded Multimodal Procedural Entity Recognition for Procedural Documents: A New Dataset and Baseline. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7971–7981, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Grounded Multimodal Procedural Entity Recognition for Procedural Documents: A New Dataset and Baseline (Ren et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.702.pdf