WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
Authors: Yong Hyun Ahn, Hyeon Bae Kim, Seong Tae Kim
What
This paper introduces WWW, a novel framework designed to explain neural network decisions by revealing ‘what’ concept a neuron represents, ‘where’ in the input image the concept is located, and ‘why’ the concept contributes to the prediction.
Why
The paper addresses the “black box” problem in neural networks, aiming to make their decision-making process more transparent and understandable to humans. This is crucial for building trust and reliability in AI systems, especially given the increasing demand for explainable AI in various domains.
How
WWW comprises three modules:
1) Concept Discovery identifies the concepts each neuron represents, using adaptive cosine similarity and adaptive selection.
2) Localization identifies the input regions relevant to each concept by combining neuron activation maps (NAMs) with Shapley values.
3) Reasoning identifies the neurons important to the predicted class and to the specific input sample, and compares the two to gauge how reliable the prediction is.
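The first two modules above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the mean-pooled cosine scores, the `alpha`-scaled threshold standing in for adaptive selection, and the linear Shapley-weighted combination of activation maps are all simplifying assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    # Row-wise cosine similarity between two embedding matrices.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def discover_concepts(example_embs, concept_embs, concept_names, alpha=0.95):
    # example_embs: (k, d) embeddings of the images that most activate a neuron.
    # concept_embs: (c, d) embeddings of candidate concept words.
    # Score each concept by its mean similarity to the neuron's examples.
    sims = cosine_sim(example_embs, concept_embs).mean(axis=0)  # (c,)
    # Stand-in for adaptive selection: keep every concept whose score is
    # within a fraction alpha of the best-matching concept's score.
    threshold = alpha * sims.max()
    order = np.argsort(-sims)
    return [concept_names[i] for i in order if sims[i] >= threshold]

def localize_concept(activation_maps, shapley_values):
    # Weight each neuron's activation map by its Shapley contribution
    # and sum, yielding a localized heatmap for the input.
    maps = np.asarray(activation_maps, dtype=float)    # (n, H, W)
    weights = np.asarray(shapley_values, dtype=float)  # (n,)
    return np.tensordot(weights, maps, axes=1)         # (H, W)
```

The adaptive threshold lets a neuron match several closely scored concepts rather than forcing a single label, which matches the paper's motivation for adaptive selection over a fixed top-1 choice.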
Result
WWW demonstrates superior performance in both qualitative and quantitative evaluations. It outperforms existing methods in accurately identifying neuron concepts, particularly with larger concept sets. The paper also shows that heatmap similarity, derived from the framework, can be a more effective measure of prediction uncertainty compared to maximum softmax probability.
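The uncertainty measure above can be illustrated with a small sketch. The function names and the fixed decision threshold `tau` are hypothetical; the idea, per the paper, is that low similarity between the class-level and sample-level heatmaps signals an unreliable prediction.

```python
import numpy as np

def heatmap_similarity(class_heatmap, sample_heatmap):
    # Cosine similarity between the aggregate heatmap of neurons important
    # to the predicted class and the heatmap of neurons important to this
    # particular input.
    a = np.ravel(class_heatmap).astype(float)
    b = np.ravel(sample_heatmap).astype(float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def flag_uncertain(class_heatmap, sample_heatmap, tau=0.5):
    # A dissimilar pair means the prediction relied on atypical evidence
    # for its class, so the prediction is flagged as uncertain.
    return heatmap_similarity(class_heatmap, sample_heatmap) < tau
```

Unlike maximum softmax probability, which reads uncertainty off the output layer alone, this score compares *where* the evidence came from, which is what the paper reports as the more effective signal.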
Limitations & Future Work
The paper acknowledges limitations in accurately identifying neuron concepts when only a few example images are available. Future work will focus on improving concept discovery by exploring different example selection strategies and concept representations. Another direction is exploring the use of heatmap similarity for misprediction detection and model improvement.
Abstract
Recent advancements in neural networks have showcased their remarkable capabilities across various domains. Despite these successes, the “black box” problem still remains. Addressing this, we propose a novel framework, WWW, that offers the ‘what’, ‘where’, and ‘why’ of the neural network decisions in human-understandable terms. Specifically, WWW utilizes adaptive selection for concept discovery, employing adaptive cosine similarity and thresholding techniques to effectively explain ‘what’. To address the ‘where’ and ‘why’, we propose a novel combination of neuron activation maps (NAMs) with Shapley values, generating localized concept maps and heatmaps for individual inputs. Furthermore, WWW introduces a method for predicting uncertainty, leveraging heatmap similarities to estimate ‘how’ reliable the prediction is. Experimental evaluations of WWW demonstrate superior performance in both quantitative and qualitative metrics, outperforming existing methods in interpretability. WWW provides a unified solution for explaining ‘what’, ‘where’, and ‘why’, introducing a method for localized explanations from global interpretations and offering a plug-and-play solution adaptable to various architectures.