Multimodal Emotion Recognition in Conversation (Multimodal ERC) is crucial for understanding human communication in a wide range of applications. However, missing modalities impede the development of robust models. Existing approaches often overlook scenarios in which multiple modalities are absent simultaneously, and they fail to explore deep semantic interactions between modalities. Additionally, because data are missing, high-dimensional interactive features must be learned from limited samples, which is challenging. This paper proposes Mi-CGA, a framework tailored to incomplete multimodal learning in conversational contexts. Mi-CGA comprises two main components: Incomplete Multimodal Representation (IMR) and a Cross-modal Graph Attention Network (CGA-Net). IMR simulates incomplete modalities, while CGA-Net extracts rich information from conversational graphs. CGA-Net consists of three key modules: Modality Feature Estimation, which reconstructs missing data; a Multi-head Graph Attention Network, which enhances utterance-level representations; and a Cross-modal Attention Network, which improves conversation-level representations. Experimental results on three benchmark datasets (IEMOCAP, CMU-MOSI, and CMU-MOSEI) consistently show that Mi-CGA outperforms several representative baseline models on the Multimodal ERC task. Source code for Mi-CGA is available at https://github.com/dangkh/Mi-CGA.
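
As an illustration of what simulating incomplete modalities can look like, the following is a minimal sketch and not the IMR procedure itself. It assumes that each modality of each utterance is dropped independently with a fixed probability, that every utterance keeps at least one modality, and that missing features are zero-filled. The function names `sample_modality_mask` and `apply_mask`, the `missing_rate` parameter, and the feature dimensions are hypothetical and chosen only for this example.

```python
import numpy as np


def sample_modality_mask(num_utterances, num_modalities, missing_rate, seed=0):
    """Return a boolean mask of shape (num_utterances, num_modalities).

    True means the modality is present for that utterance.
    """
    if not 0.0 <= missing_rate < 1.0:
        raise ValueError("missing_rate must be in [0, 1)")
    rng = np.random.default_rng(seed)
    # Drop each (utterance, modality) entry independently with prob. missing_rate.
    mask = rng.random((num_utterances, num_modalities)) >= missing_rate
    # Assumption for this sketch: no utterance loses all of its modalities,
    # so one randomly chosen modality is restored wherever a row is empty.
    empty = ~mask.any(axis=1)
    keep = rng.integers(0, num_modalities, size=num_utterances)
    mask[empty, keep[empty]] = True
    return mask


def apply_mask(features, mask):
    """Zero-fill the features of missing modalities.

    features: dict mapping modality name -> array (num_utterances, dim).
    Mask columns follow the dict's insertion order.
    """
    return {
        name: feats * mask[:, [j]]
        for j, (name, feats) in enumerate(features.items())
    }


# Hypothetical feature sizes for a six-utterance conversation.
features = {
    "audio": np.ones((6, 4)),
    "text": np.ones((6, 8)),
    "visual": np.ones((6, 3)),
}
mask = sample_modality_mask(6, 3, missing_rate=0.5)
masked = apply_mask(features, mask)
```

Because the mask is sampled per utterance and per modality, a single utterance can lose two modalities at once, which corresponds to the multiple-missing-modality scenario described above. A reconstruction module would then take `masked` together with `mask` as input; that step is not sketched here.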