KAN: Kolmogorov-Arnold Networks

Authors: Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, Max Tegmark

What

This paper introduces Kolmogorov-Arnold Networks (KANs), a neural network architecture inspired by the Kolmogorov-Arnold representation theorem, as a promising alternative to Multi-Layer Perceptrons (MLPs) for function approximation. Whereas MLPs use fixed activation functions on nodes and linear weights on edges, KANs place learnable activation functions on the edges themselves.
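For reference, the Kolmogorov-Arnold representation theorem that motivates the architecture states that any continuous multivariate function on a bounded domain can be written as a finite sum of compositions of univariate continuous functions and addition:

f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

KANs relax this exact two-layer, width-(2n+1) form into networks of arbitrary width and depth in which the univariate functions \Phi_q and \phi_{q,p} are learned from data.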

Why

The paper is important because it challenges the dominance of MLPs in deep learning by presenting KANs as a more accurate and interpretable alternative, especially in scientific domains. KANs exhibit faster neural scaling laws, better handle the curse of dimensionality for functions with compositional structure, and offer improved interpretability, potentially making them valuable for AI-driven scientific discovery.

How

The authors generalize the Kolmogorov-Arnold representation theorem to networks of arbitrary depth and width. Each weight in the network is replaced by a learnable univariate function parameterized as a spline, allowing fine-grained, local control over the shape of every activation. The paper includes extensive experiments on toy datasets, special functions, Feynman equations, partial differential equations, and real-world scientific datasets in knot theory and condensed matter physics to demonstrate KANs’ advantages in accuracy and interpretability. The authors also propose simplification techniques such as sparsity regularization and pruning to enhance interpretability.
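As a concrete illustration of learnable activations on edges, the sketch below shows a KAN-style layer in PyTorch. It is not the authors’ pykan implementation: each edge carries its own trainable 1D function, parameterized here as a combination of Gaussian radial basis functions on a fixed grid rather than the paper’s B-splines with adaptive grids and sparsification machinery, and all class and variable names are our own.

```python
import torch
import torch.nn as nn


class KANLayerSketch(nn.Module):
    """One KAN-style layer: a learnable 1D activation on every edge (input i -> output j)."""

    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0)):
        super().__init__()
        # Fixed grid of basis centers; the paper instead uses B-splines on an adaptive grid.
        # Gaussian RBFs are used here only to keep the sketch short.
        self.register_buffer("centers", torch.linspace(grid_range[0], grid_range[1], num_basis))
        self.width = (grid_range[1] - grid_range[0]) / (num_basis - 1)
        # One trainable coefficient vector per edge: shape (out_dim, in_dim, num_basis).
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, num_basis))

    def forward(self, x):  # x: (batch, in_dim)
        # Evaluate every basis function at every input coordinate: (batch, in_dim, num_basis).
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # phi[b, o, i] = sum_k coef[o, i, k] * basis[b, i, k]: the per-edge activations.
        phi = torch.einsum("oik,bik->boi", self.coef, basis)
        # A KAN node simply sums its incoming edge activations.
        return phi.sum(dim=-1)  # (batch, out_dim)


# Stacking layers gives a deeper KAN, e.g. a [2, 5, 1] network for a 2D toy regression task.
model = nn.Sequential(KANLayerSketch(2, 5), KANLayerSketch(5, 1))
x = torch.rand(64, 2) * 2 - 1
print(model(x).shape)  # torch.Size([64, 1])
```

Training such a layer proceeds by ordinary backpropagation over the per-edge basis coefficients, which is also how the spline coefficients in the paper are optimized.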

Result

KANs consistently outperform MLPs in terms of accuracy and parameter efficiency across various tasks, including function fitting, PDE solving, and symbolic regression. Their test loss scales favorably with the number of parameters, approaching the theoretically predicted scaling exponent. KANs demonstrate an ability to learn complex functions, including special functions and phase transition boundaries. They can be simplified and visualized to reveal underlying compositional structures and enable symbolic regression with human interaction. In applications to scientific datasets, KANs rediscover known mathematical relations in knot theory and uncover mobility edges in condensed matter physics, highlighting their potential for AI-driven scientific discovery.
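For context, the scaling exponent in question follows, as we read the paper’s theory, from the spline order k: the test loss is predicted to scale as

\ell \propto N^{-(k+1)}, \quad \text{i.e. } \ell \propto N^{-4} \text{ for the default cubic splines } (k = 3),

independently of the input dimension when the target has a compositional (Kolmogorov-Arnold) structure, whereas comparable bounds for MLPs degrade with dimension.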

Limitations & Future Work

The authors acknowledge that the mathematical understanding of deeper KANs is limited and propose a generalized Kolmogorov-Arnold theorem as future work. Algorithmically, they identify potential improvements in accuracy, efficiency, and training strategies, including adaptive grids and hybrid KAN-MLP architectures. They also suggest expanding KAN applications to other scientific domains and integrating them into existing architectures like transformers. A key limitation is the current slow training speed of KANs compared to MLPs.

Abstract

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes (“neurons”), KANs have learnable activation functions on edges (“weights”). KANs have no linear weights at all — every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today’s deep learning models which rely heavily on MLPs.
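Concretely (as described in the paper, up to the exact placement of trainable scale factors), each learnable edge activation is parameterized as a residual basis function plus a B-spline with trainable coefficients c_i:

\phi(x) = w\left(\mathrm{silu}(x) + \mathrm{spline}(x)\right), \qquad \mathrm{spline}(x) = \sum_i c_i B_i(x), \qquad \mathrm{silu}(x) = \frac{x}{1 + e^{-x}}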