Towards Multi-modal Sarcasm Detection via Disentangled Multi-grained Multi-modal Distilling

Zhihong Zhu, Xuxin Cheng, Guimin Hu, Yaowei Li, Zhiqi Huang, Yuexian Zou


Abstract
Multi-modal sarcasm detection aims to identify whether a given sample with multi-modal information (i.e., text and image) is sarcastic, which has received increasing attention due to the rapid growth of multi-modal posts on modern social media. However, mainstream models process the input of each modality in a holistic manner, resulting in redundant and unrefined information. Moreover, the representations of different modalities are entangled in one common latent space to perform complex cross-modal interactions, neglecting the heterogeneity and distribution gap of different modalities. To address these issues, we propose a novel framework DMMD (short for Disentangled Multi-grained Multi-modal Distilling) for multi-modal sarcasm detection, which conducts multi-grained knowledge distilling (i.e., intra-subspace and inter-subspace) based on the disentangled multi-modal representations. Concretely, the representations of each modality are disentangled explicitly into modality-agnostic/specific subspaces. Then we transfer cross-modal knowledge by conducting intra-subspace knowledge distilling in a self-adaptive pattern. We also apply mutual learning to regularize the underlying inter-subspace consistency. Extensive experiments on a commonly used benchmark demonstrate the efficacy of our DMMD over cutting-edge methods. More encouragingly, visualization results indicate the multi-modal representations display meaningful distributional patterns, and we hope it will be helpful for the community of multi-modal knowledge transfer.
Anthology ID:
2024.lrec-main.1441
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
16581–16591
Language:
URL:
https://aclanthology.org/2024.lrec-main.1441
DOI:
Bibkey:
Cite (ACL):
Zhihong Zhu, Xuxin Cheng, Guimin Hu, Yaowei Li, Zhiqi Huang, and Yuexian Zou. 2024. Towards Multi-modal Sarcasm Detection via Disentangled Multi-grained Multi-modal Distilling. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16581–16591, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Towards Multi-modal Sarcasm Detection via Disentangled Multi-grained Multi-modal Distilling (Zhu et al., LREC-COLING 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.lrec-main.1441.pdf