Mi-CGA: Cross-modal Graph Attention Network for robust emotion recognition in the presence of incomplete modalities

Nguyen Cam-Van Thi, Kieu Hai-Dang, Ha Quang-Thuy, Phan Xuan-Hieu, Le Duc-Trong

Abstract

Multimodal Emotion Recognition in Conversation (Multimodal ERC) is crucial for understanding human communication across various applications. However, the challenge of missing modalities impedes the development of robust models. Existing approaches often overlook scenarios where multiple modalities are absent simultaneously and fail to explore deep semantic interactions between modalities. Additionally, learning high-dimensional interactive features from limited samples is challenging due to missing data. This paper proposes Mi-CGA, a framework tailored for incomplete multimodal learning in conversational contexts. Mi-CGA comprises two main components: Incomplete Multimodal Representation (IMR) and Cross-modal Graph Attention Network (CGA-Net). IMR simulates incomplete modalities, while CGA-Net extracts rich information from conversational graphs. CGA-Net consists of three key modules: Modality Feature Estimation reconstructs missing data, Multi-head Graph Attention Network enhances utterance-level representation, and Cross-modal Attention Network improves conversation-level representation. Experimental results on three benchmark datasets (IEMOCAP, CMU-MOSI, and CMU-MOSEI) consistently demonstrate that Mi-CGA outperforms several representative baseline models, marking a significant advancement for the Multimodal ERC task. Source code for Mi-CGA is available at https://github.com/dangkh/Mi-CGA.
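The idea of simulating incomplete modalities, as the abstract attributes to IMR, can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). The function name `simulate_missing`, the per-modality independent drop probability `missing_rate`, the use of `None` to mark a missing modality, and the rule that every utterance keeps at least one modality are all assumptions made for illustration.

```python
import random

# Hypothetical modality names; the abstract does not list them explicitly.
MODALITIES = ("text", "audio", "visual")


def simulate_missing(utterances, missing_rate, seed=0):
    """Return a copy of `utterances` with modalities randomly dropped.

    Each utterance is a dict mapping a modality name to its feature vector.
    A dropped modality is replaced by None. Several modalities may be dropped
    from the same utterance, but at least one is always kept (an assumption
    of this sketch, so every utterance stays usable).
    """
    rng = random.Random(seed)
    masked = []
    for utt in utterances:
        kept = {m: utt[m] for m in MODALITIES}
        for m in MODALITIES:
            # Drop each modality independently with probability `missing_rate`.
            if rng.random() < missing_rate:
                kept[m] = None
        if all(v is None for v in kept.values()):
            # Everything was dropped: restore one modality chosen at random.
            restore = rng.choice(MODALITIES)
            kept[restore] = utt[restore]
        masked.append(kept)
    return masked


if __name__ == "__main__":
    conversation = [
        {"text": [0.1, 0.2], "audio": [0.3], "visual": [0.4, 0.5]},
        {"text": [0.6, 0.7], "audio": [0.8], "visual": [0.9, 1.0]},
    ]
    for utt in simulate_missing(conversation, missing_rate=0.5, seed=42):
        present = [m for m, v in utt.items() if v is not None]
        print("available modalities:", present)
```

A downstream model would then need to estimate or otherwise handle the entries marked `None`, which is the role the abstract assigns to the Modality Feature Estimation module.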

Publisher: Neurocomputing

Article number: 129342

ISSN (Electronic): 1872-8286

ISSN (Print): 0925-2312

Keywords

  • Cross-modality attention
  • Graph Attention Network
  • Modality incompleteness
  • Multimodal emotion recognition

ASJC Scopus subject areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

Publication year

2025
