Zexin Wang


2024

pdf bib
Decompose, Prioritize, and Eliminate: Dynamically Integrating Diverse Representations for Multimodal Named Entity Recognition
Zihao Zheng | Zihan Zhang | Zexin Wang | Ruiji Fu | Ming Liu | Zhongyuan Wang | Bing Qin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Multi-modal Named Entity Recognition, a fundamental task for multi-modal knowledge graph construction, requires integrating multi-modal information to extract named entities from text. Previous research has explored the integration of multi-modal representations at different granularities. However, they struggle to integrate all these multi-modal representations to provide rich contextual information to improve multi-modal named entity recognition. In this paper, we propose DPE-MNER, which is an iterative reasoning framework that dynamically incorporates all the diverse multi-modal representations following the strategy of “decompose, prioritize, and eliminate”. Within the framework, the fusion of diverse multi-modal representations is decomposed into hierarchically connected fusion layers that are easier to handle. The incorporation of multi-modal information prioritizes transitioning from “easy-to-hard” and “coarse-to-fine”. The explicit modeling of cross-modal relevance eliminate the irrelevances that will mislead the MNER prediction. Extensive experiments on two public datasets have demonstrated the effectiveness of our approach.