Few-Shot Multimodal Named Entity Recognition Based on Mutlimodal Causal Intervention Graph

Feihong Lu, Xiaocui Yang, Qian Li, Qingyun Sun, Ke Jiang, Cheng Ji, Jianxin Li


Abstract
Multimodal Named Entity Recognition (MNER) models typically require a significant volume of labeled data for effective training to extract relations between entities. In real-world scenarios, we frequently encounter unseen relation types. Nevertheless, existing methods are predominantly tailored for complete datasets and are not equipped to handle these new relation types. In this paper, we introduce the Few-shot Multimodal Named Entity Recognition (FMNER) task to address these novel relation types. FMNER trains in the source domain (seen types) and tests in the target domain (unseen types) with different distributions. Due to limited available resources for sampling, each sampling instance yields different content, resulting in data bias and alignment problems of multimodal units (image patches and words). To alleviate the above challenges, we propose a novel Multimodal causal Intervention graphs (MOUSING) model for FMNER. Specifically, we begin by constructing a multimodal graph that incorporates fine-grained information from multiple modalities. Subsequently, we introduce the Multimodal Causal Intervention Strategy to update the multimodal graph. It aims to decrease spurious correlations and emphasize accurate correlations between multimodal units, resulting in effectively aligned multimodal representations. Extensive experiments on two multimodal named entity recognition datasets demonstrate the superior performance of our model in the few-shot setting.
Anthology ID:
2024.lrec-main.633
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
7208–7219
URL:
https://aclanthology.org/2024.lrec-main.633
Cite (ACL):
Feihong Lu, Xiaocui Yang, Qian Li, Qingyun Sun, Ke Jiang, Cheng Ji, and Jianxin Li. 2024. Few-Shot Multimodal Named Entity Recognition Based on Mutlimodal Causal Intervention Graph. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7208–7219, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Few-Shot Multimodal Named Entity Recognition Based on Mutlimodal Causal Intervention Graph (Lu et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.633.pdf