On Model Explanations with Transferable Neural Pathways

Authors: Xinmiao Lin, Wentao Bao, Qi Yu, Yu Kong

What

This paper introduces GEN-CNP, a generative method for producing class-relevant neural pathway explanations of image recognition models, aiming to make the explanations more interpretable while keeping them faithful to the target model.

Why

The paper addresses two limitations of existing neural pathway explanation methods: the pathways they produce are often hard to interpret, and they impose a single global sparsity level on every instance. It instead proposes class-wise and instance-specific interpretability criteria, which reveal class-relevant features and allow an explanation to be transferred to other samples of the same class.

How

The authors propose GEN-CNP, a model that learns to predict neural pathways from the target model's feature maps. GEN-CNP uses Recursive Feature Embedders (RFEs) to extract feature patterns and a Pathway Distillation Network (PDN) to learn class-relevant information from them. Recursive Pathway Decoders (RPDs) with Distance Aware Quantization (DAQ) then decode the resulting importance scores into sparse, faithful neural pathways. GEN-CNP is trained via knowledge distillation with sparsity constraints, which keeps the pathways faithful to the target model while keeping the explanations sparse.
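
Below is a minimal PyTorch sketch of this pipeline for a single target layer. The module names follow the paper, but every layer size, the sigmoid scoring head, and the straight-through binarizer standing in for DAQ are illustrative assumptions, not the authors' implementation.

```python
# Minimal single-layer sketch of the GEN-CNP data flow (assumed architecture).
import torch
import torch.nn as nn

class RecursiveFeatureEmbedder(nn.Module):
    """Embeds a target-layer feature map into a compact pattern vector."""
    def __init__(self, in_channels: int, embed_dim: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, embed_dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)  # collapse spatial dimensions

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.pool(torch.relu(self.conv(feat))).flatten(1)  # (B, embed_dim)

class PathwayDistillationNetwork(nn.Module):
    """Maps feature embeddings to per-channel importance scores."""
    def __init__(self, embed_dim: int, num_channels: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, num_channels),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.mlp(z))  # importance scores in [0, 1]

def quantize_pathway(scores: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Binarize scores with a straight-through estimator, a stand-in for the
    paper's Distance Aware Quantization (DAQ)."""
    hard = (scores > threshold).float()
    return hard + scores - scores.detach()  # gradients flow through `scores`

# Usage: derive a pathway mask for one target layer of a frozen model.
feat = torch.randn(8, 256, 14, 14)          # fake feature maps (B, C, H, W)
rfe = RecursiveFeatureEmbedder(256, 128)
pdn = PathwayDistillationNetwork(128, 256)
mask = quantize_pathway(pdn(rfe(feat)))     # (B, 256) binary channel mask
masked_feat = feat * mask[..., None, None]  # pathway-restricted activations
```

In the actual model the embedders and decoders operate recursively across multiple layers; the single-layer version above only illustrates the data flow from feature maps to a sparse pathway mask.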

Result

The proposed GEN-CNP method generates neural pathways that are more faithful to the original model, as measured by improved mIC and mDC scores. The generated pathways are also more class-relevant, confirmed by higher acIOU scores and by transferability experiments in which pathways remain consistent and faithful across samples of the same class. Qualitative visualizations using Grad-CAM and neural pathway gradients show that GEN-CNP identifies more semantically meaningful features than existing methods.
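
As a rough illustration of how faithfulness metrics of this kind are typically computed (the paper's exact definitions of mIC and mDC may differ), the sketch below assumes mDC is the mean relative drop in predicted-class confidence when the model is restricted to the pathway, and mIC is the fraction of samples whose confidence increases:

```python
# Hedged sketch of pathway faithfulness metrics (assumed definitions).
import torch

def faithfulness_metrics(full_conf: torch.Tensor, path_conf: torch.Tensor):
    """full_conf / path_conf: predicted-class probabilities with the full
    model vs. with only the pathway neurons active, each of shape (N,)."""
    drop = torch.clamp(full_conf - path_conf, min=0) / full_conf
    mdc = drop.mean().item()                             # lower is better
    mic = (path_conf > full_conf).float().mean().item()  # higher is better
    return mdc, mic

full_conf = torch.tensor([0.90, 0.75, 0.60])
path_conf = torch.tensor([0.88, 0.80, 0.55])
print(faithfulness_metrics(full_conf, path_conf))  # ≈ (0.035, 0.333)
```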

Limitations & Future Work

The authors acknowledge limitations in terms of computational cost and the current implementation’s focus on image recognition models. Future work could explore more computationally efficient architectures for GEN-CNP and extend its applicability to other domains beyond image recognition, such as natural language processing or time series analysis.

Abstract

Neural pathways as model explanations consist of a sparse set of neurons that provide the same level of prediction performance as the whole model. Existing methods primarily focus on accuracy and sparsity, but the generated pathways may offer limited interpretability and thus fall short in explaining model behavior. In this paper, we suggest two interpretability criteria of neural pathways: (i) same-class neural pathways should primarily consist of class-relevant neurons; (ii) each instance's neural pathway sparsity should be optimally determined. To this end, we propose a Generative Class-relevant Neural Pathway (GEN-CNP) model that learns to predict the neural pathways from the target model's feature maps. We propose to learn class-relevant information from features of deep and shallow layers such that same-class neural pathways exhibit high similarity. We further impose a faithfulness criterion for GEN-CNP to generate pathways with instance-specific sparsity. We propose to transfer the class-relevant neural pathways to explain samples of the same class and show experimentally and qualitatively their faithfulness and interpretability.
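
To make the transfer idea concrete, here is a hedged sketch in which a channel-level pathway mask found for one sample is reused on another sample of the same class. The hook-based masking, the choice of `layer3` in a ResNet-18, and the random placeholder mask are all assumptions for illustration, not the paper's procedure:

```python
# Hedged sketch of transferring a class-relevant pathway between samples.
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()

def predict_with_pathway(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Run the model with layer3's output channels masked by `mask` (C,)."""
    def hook(_, __, out):
        return out * mask.view(1, -1, 1, 1)  # zero channels outside the pathway
    h = model.layer3.register_forward_hook(hook)
    try:
        return model(x).softmax(dim=-1)
    finally:
        h.remove()

# Pretend `mask` is a class-relevant pathway found for sample_a; transferring
# it to sample_b (same class) should preserve the prediction if the pathway
# captures class-level rather than instance-level features.
mask = (torch.rand(256) > 0.7).float()  # placeholder sparse channel mask
sample_a = torch.randn(1, 3, 224, 224)
sample_b = torch.randn(1, 3, 224, 224)
probs_a = predict_with_pathway(sample_a, mask)
probs_b = predict_with_pathway(sample_b, mask)  # transferred pathway
```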