If you simplify the functionality of neural networks to an extreme degree, you can say that they are one way of doing curve fitting. Are there other, perhaps better, ways to perform the same task?
In fact, many problems that can theoretically be solved using neural networks can also be solved using approaches like:
- Splines: The term “spline” refers to a wide class of functions used in applications requiring data interpolation and/or smoothing.
- Kriging: Closely related to regression analysis, kriging predicts the value of a function at a given point by computing a weighted average of the function’s known values at nearby points.
- Lowess: Lowess (short for locally weighted scatterplot smoothing) is one of several non-parametric regression techniques; the sketch after this list shows splines and lowess in action.
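To make two of these alternatives concrete, here is a minimal sketch that fits a smoothing spline and a lowess curve to the same noisy samples. It assumes SciPy and statsmodels are available; the target function, noise level, and smoothing parameters are arbitrary choices for illustration.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from statsmodels.nonparametric.smoothers_lowess import lowess

# Noisy samples of an unknown one-dimensional function.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Smoothing spline: s trades off fidelity against smoothness.
spline = UnivariateSpline(x, y, s=len(x) * 0.1)
y_spline = spline(x)

# Lowess: frac is the fraction of points used in each local regression.
y_lowess = lowess(y, x, frac=0.2, return_sorted=False)
```

Both methods recover the underlying curve well in one dimension; the interesting question is how they behave as the number of input dimensions grows.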
However, the universal approximation theorem, a family of results describing what neural networks can theoretically represent, states that under certain conditions a neural network can approximate a wide class of functions as closely as you want. This makes neural networks an extremely general tool for function approximation. Still, standard neural network architectures have clear limitations, and Kolmogorov-Arnold Networks (KANs) aim to eliminate some of them.
We know that multi-layer perceptrons (MLPs) are a type of artificial neural network consisting of multiple layers of neurons. The neurons in an MLP typically use fixed nonlinear activation functions, allowing the network to learn complex patterns in data. If you are a programmer looking for a more thorough introduction to neural networks, I suggest you refer to my video series, “Deep Learning With Python.”
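As a quick, illustrative sketch of both ideas, the following PyTorch snippet fits a small MLP with a fixed Tanh activation to a one-dimensional target; the width, learning rate, and target function are arbitrary choices, not prescriptions.

```python
import torch
import torch.nn as nn

# A small MLP: linear layers with a fixed non-linear activation between them.
mlp = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)

# Samples of the target function we want to approximate.
x = torch.linspace(-3, 3, 512).unsqueeze(1)
y = torch.sin(2 * x) + 0.5 * x

# With enough hidden units, a network like this can approximate the target
# arbitrarily well on this interval (the universal approximation theorem).
for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(mlp(x), y)
    loss.backward()
    opt.step()
```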
What are KANs?
We know that in the classic definition of a neural network, the activation functions at the nodes are fixed. KANs turn this around: instead of fixed activation functions at individual nodes, they place learnable activation functions on the edges between nodes, introducing a new neural network structure.
This unique structure does away with linear weight matrices altogether and replaces them with learnable 1D spline functions. This architectural modification allows KANs to combine the strengths of splines and MLPs while mitigating their respective weaknesses.
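The name comes from the Kolmogorov-Arnold representation theorem, which says that any continuous multivariate function on a bounded domain can be written as a composition of univariate functions and addition. In the notation of the paper, the classical representation and the general KAN layer that generalizes it look like this:

$$
f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
\qquad
x_{l+1,\,j} = \sum_{i=1}^{n_l} \phi_{l,\,j,\,i}\left( x_{l,\,i} \right)
$$

Here each $\phi$ is a learnable univariate function living on an edge of the network, and a node simply sums its incoming edge outputs; stacking such layers gives a KAN of arbitrary depth and width.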
The key strength of KANs lies in their ability to merge the accuracy of splines with the feature learning capabilities of MLPs.
As described above, splines represent low-dimensional functions very well but run into limitations with high-dimensional data (the curse of dimensionality). MLPs, on the other hand, are proficient at feature learning but can struggle to fit univariate functions accurately. By integrating splines internally and an MLP-like composition of layers externally, KANs offer a robust approach to function approximation.
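To make this concrete, here is a deliberately simplified, minimal sketch of a KAN-style layer in PyTorch. It is not the paper's implementation: the learnable edge functions are built from fixed Gaussian basis functions with trainable coefficients rather than the B-splines (plus a base activation) used in the original work, and every hyperparameter here is an arbitrary illustrative choice.

```python
import math
import torch
import torch.nn as nn

class EdgeSplineLayer(nn.Module):
    """Simplified KAN-style layer: each edge (input i -> output j) carries its
    own learnable 1D function, built from fixed Gaussian basis functions with
    trainable coefficients (a stand-in for the paper's B-splines)."""

    def __init__(self, in_features, out_features, num_basis=8, x_min=-2.0, x_max=2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(x_min, x_max, num_basis))
        self.width = (x_max - x_min) / (num_basis - 1)
        # One coefficient vector per edge: shape (out_features, in_features, num_basis).
        self.coeffs = nn.Parameter(0.1 * torch.randn(out_features, in_features, num_basis))

    def forward(self, x):  # x: (batch, in_features)
        # Evaluate every basis function at every input value.
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # phi_{j,i}(x_i) = sum_b coeffs[j, i, b] * basis_b(x_i), then sum over i.
        return torch.einsum("bik,oik->bo", basis, self.coeffs)

# Two stacked layers trained on a toy target of the kind used in the paper,
# f(x1, x2) = exp(sin(pi * x1) + x2 ** 2).
model = nn.Sequential(EdgeSplineLayer(2, 5), EdgeSplineLayer(5, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.rand(1024, 2) * 2 - 1
y = torch.exp(torch.sin(math.pi * x[:, :1]) + x[:, 1:] ** 2)
for step in range(2000):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
```

Notice that there is no weight matrix anywhere: all learnable parameters live in the per-edge coefficient tensors.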
One of the significant advantages of KANs, in my opinion, is interpretability.
This enhanced interpretability stems from their architecture, which incorporates learnable activation functions. Unlike traditional neural networks, where the nonlinear activation functions are fixed, KANs allow these functions to adapt and evolve during training, which lets them capture the underlying structure of the data more effectively.
Since the activation functions are learnable, KANs allow users to gain meaningful insights into how individual features contribute to the overall prediction. By analyzing the coefficients of the spline functions, analysts can interpret which features are most influential in driving the network’s decisions.
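Continuing the simplified sketch above, one crude way to do this is to average the magnitude of the learned coefficients attached to each input feature's edges; the paper itself goes further, pruning unimportant edges and fitting symbolic formulas to the learned univariate functions.

```python
# Each input feature i owns a bundle of edge functions phi_{j,i}; the average
# magnitude of their coefficients is a rough, illustrative proxy for how much
# that feature drives the first layer's outputs.
first_layer = model[0]
edge_importance = first_layer.coeffs.detach().abs().mean(dim=(0, 2))
print(edge_importance)  # one score per input feature
```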
Advantages
In addition to interpretability, there are numerous other advantages that KANs have over MLPs:
- Accuracy: In the experiments reported so far, KANs have often matched or exceeded the accuracy of MLPs of comparable size.
- Efficiency: When it comes to computational resources and parameter utilization, KANs have been shown to reach a given accuracy with fewer parameters than comparable MLPs.
- Generalization: KANs tend to generalize better than MLPs, which is not surprising given that they learn their activation functions adaptively; this also makes them more flexible overall.
- Scalability: KANs exhibit favorable neural scaling laws, with accuracy improving faster than that of MLPs as the number of parameters grows.
- Noise Robustness: KANs exhibit enhanced robustness to noisy data and adversarial perturbations compared to MLPs, because their adaptive activation functions let them learn more robust representations of the data.
References
Liu, Z., et al., “KAN: Kolmogorov-Arnold Networks,” arXiv:2404.19756, 2024.

