Boosting Multi-modal Model Performance with Adaptive Gradient Modulation

Authors: Hong Li, Xingyu Li, Pengbo Hu, Yinuo Lei, Chunxiao Li, Yi Zhou

What

This paper proposes Adaptive Gradient Modulation (AGM), a method that boosts multi-modal model performance by adaptively modulating each modality's gradients during training to mitigate modality competition. Unlike prior modulation methods, AGM applies to models with various fusion strategies, not just late fusion.

Why

Standard joint training in multi-modal learning often yields sub-optimal performance due to modality competition, where a dominant modality hinders the learning of the others. Existing modulation methods address this but apply only to late-fusion models, and the mechanism of the competition itself remains unexplored. This work contributes both a modulation method (AGM) that works with various fusion strategies and a quantitative account of the dynamics of modality competition.

How

The authors develop AGM, which uses Shapley-value-based attribution to isolate each modality's mono-modal response and adaptively modulates the gradients of individual modalities during back-propagation. They also introduce the "mono-modal concept", a function designed to represent the ideal, competition-less state of a modality, and build on it a metric that quantifies competition strength. Experiments are conducted on five multi-modal datasets (AV-MNIST, CREMA-D, UR-Funny, AVE, CMU-MOSEI) with varying fusion strategies, modalities, and network architectures to evaluate AGM's effectiveness and analyze modality competition.
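The following is a minimal PyTorch sketch of the two ingredients described above, for a hypothetical two-modality late-fusion classifier. The `TwoModalNet` architecture, the zeroing-out scheme for "absent" modalities, and the tanh-based modulation rule are illustrative assumptions, not the paper's exact design; AGM's actual coefficients and attribution details follow the paper and its released code.

```python
import math

import torch
import torch.nn as nn


class TwoModalNet(nn.Module):
    """Hypothetical two-modality late-fusion classifier, for illustration only."""

    def __init__(self, dim_a, dim_b, hidden=64, n_classes=10):
        super().__init__()
        self.enc_a = nn.Linear(dim_a, hidden)
        self.enc_b = nn.Linear(dim_b, hidden)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, xa, xb, use_a=True, use_b=True):
        # Zeroing a modality's features stands in for "modality absent"
        # when evaluating Shapley coalitions (an illustrative choice).
        ha = self.enc_a(xa) if use_a else xa.new_zeros(xa.size(0), self.head.in_features)
        hb = self.enc_b(xb) if use_b else xb.new_zeros(xb.size(0), self.head.in_features)
        return self.head(ha + hb)


def shapley_mono_logits(model, xa, xb):
    """Two-player Shapley attribution of the fused logits: a modality's
    mono-modal response is its marginal contribution to the output,
    averaged over both orders in which it can join the coalition."""
    f_ab = model(xa, xb, True, True)
    f_a = model(xa, xb, True, False)
    f_b = model(xa, xb, False, True)
    f_0 = model(xa, xb, False, False)
    phi_a = 0.5 * ((f_a - f_0) + (f_ab - f_b))
    phi_b = 0.5 * ((f_b - f_0) + (f_ab - f_a))
    return phi_a, phi_b


def modulation_coeffs(phi_a, phi_b, labels, alpha=1.0, eps=1e-8):
    """Per-modality gradient scales: the modality whose mono-modal
    confidence on the true class is higher gets damped (k < 1), the
    weaker one gets boosted (k > 1). This tanh rule is a simplified
    placeholder, not the paper's exact formula."""
    s_a = phi_a.softmax(1).gather(1, labels[:, None]).mean().item()
    s_b = phi_b.softmax(1).gather(1, labels[:, None]).mean().item()
    ratio = s_a / (s_b + eps)
    k_a = 1.0 - math.tanh(alpha * (ratio - 1.0))
    k_b = 1.0 - math.tanh(alpha * (1.0 / (ratio + eps) - 1.0))
    return k_a, k_b


# One training step: attribute, derive coefficients, then rescale each
# encoder's parameter gradients before the optimizer update.
model = TwoModalNet(dim_a=32, dim_b=48)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
xa, xb = torch.randn(16, 32), torch.randn(16, 48)
labels = torch.randint(0, 10, (16,))

with torch.no_grad():
    phi_a, phi_b = shapley_mono_logits(model, xa, xb)
k_a, k_b = modulation_coeffs(phi_a, phi_b, labels)

loss = nn.functional.cross_entropy(model(xa, xb), labels)
opt.zero_grad()
loss.backward()
with torch.no_grad():
    for p in model.enc_a.parameters():
        p.grad.mul_(k_a)
    for p in model.enc_b.parameters():
        p.grad.mul_(k_b)
opt.step()
```

Rescaling parameter gradients after `backward()` (rather than wrapping the loss) keeps the modulation confined to each modality's encoder while leaving the fusion head's update untouched, which is the essential mechanic the method relies on.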

Result

The key findings demonstrate that AGM consistently outperforms existing modulation methods and significantly improves multi-modal models’ accuracy across different datasets and architectures. The analysis reveals that AGM encourages models to leverage more informative modalities and mitigates the model’s inherent bias towards specific modalities during training. The paper also establishes that modality competition is prevalent in multi-modal models, often with a “preferred modality” that the model tends to exploit. The strength of modality competition is found to be largely independent of the fusion strategy and modality type but appears to be influenced by the specific task and data characteristics.

Limitations & Future Work

The paper acknowledges the need for further investigation into the relationship between modality competition strength, modality information content, and data characteristics. Future work could explore more sophisticated methods for defining and utilizing the “mono-modal concept” and investigate the role of higher-order interactions among modalities in shaping competition dynamics.

Abstract

While the field of multi-modal learning keeps growing fast, the deficiency of the standard joint training paradigm has become clear through recent studies. They attribute the sub-optimal performance of the jointly trained model to the modality competition phenomenon. Existing works attempt to improve the jointly trained model by modulating the training process. Despite their effectiveness, those methods can only apply to late fusion models. More importantly, the mechanism of the modality competition remains unexplored. In this paper, we first propose an adaptive gradient modulation method that can boost the performance of multi-modal models with various fusion strategies. Extensive experiments show that our method surpasses all existing modulation methods. Furthermore, to have a quantitative understanding of the modality competition and the mechanism behind the effectiveness of our modulation method, we introduce a novel metric to measure the competition strength. This metric is built on the mono-modal concept, a function that is designed to represent the competition-less state of a modality. Through systematic investigation, our results confirm the intuition that the modulation encourages the model to rely on the more informative modality. In addition, we find that the jointly trained model typically has a preferred modality on which the competition is weaker than other modalities. However, this preferred modality need not dominate others. Our code will be available at https://github.com/lihong2303/AGM_ICCV2023.
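To make the competition-strength metric mentioned in the abstract concrete, here is a minimal sketch of how such a score could be computed. It assumes the competition-less reference is approximated by the logits of a separately trained uni-modal network standing in for the mono-modal concept, and it uses a total-variation distance between class distributions; both are illustrative choices, not the paper's exact definition.

```python
import torch


def competition_strength(phi_m, concept_logits):
    """Illustrative competition-strength score in [0, 1]: how far
    modality m's Shapley-attributed response inside the joint model
    drifts from its competition-less reference (here assumed to be a
    separately trained uni-modal network's logits). The total-variation
    distance is an assumption, not the paper's exact metric."""
    p = phi_m.softmax(dim=1)           # modality m's response in the joint model
    q = concept_logits.softmax(dim=1)  # competition-less reference
    return 0.5 * (p - q).abs().sum(dim=1).mean()
```

Under this reading, a score near 0 means the modality behaves as it would without competition, while larger values indicate it was suppressed or distorted by the other modalities during joint training.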