(A short note on prerequisites: the first part of this post can be read without any. For the later part, some measure theory will help in following the proofs completely. A basic understanding of mathematics and reinforcement learning is assumed throughout.)
Reinforcement learning models can exhibit abrupt changes in their behavior. Common examples include catastrophic forgetting, reward hacking, and even generalization (there is substantial evidence of generalization in deep reinforcement learning models). Understanding these phase transitions is very important from an alignment point of view: meaningful progress here could help us detect, and ideally prevent, changes in the alignment of a deployed model, which could otherwise be a very “catastrophic” issue.
A relatively new framework for studying this is Singular Learning Theory (SLT), developed by Sumio Watanabe. In my opinion, and in my own research, it has already proved substantially useful for understanding neural networks in general. In this post, I try to explain the crux of the subject (the full theory is far more than one post can contain) and its extension to reinforcement learning models. Let us get into it.
We are mostly interested in comparing models (specifically, a model at an earlier epoch with the model at a later epoch). We can compare them using various quantities, such as the weights (in a neural network) or the loss. But neither of them is ideal for this study, since models are singular, i.e., many different weight settings give the same performance, and different functions are compatible with the same loss.
Key Idea: We find a quantity that is invariant over similar models. This partitions the models into equivalence classes, and whenever a model moves from one class to another, there is a major shift in the quantity as well as in the model, signifying a phase transition!
The Local Learning Coefficient (LLC) is such a quantity!
(We will formalize everything later).
The best mathematical insights are the ones that can be reached in more than one way. The Local Learning Coefficient can be motivated from a second direction (and this is how we are actually going to develop it mathematically). Viewing models as statistical systems (or rather thermodynamic systems, if you know that analogy), we are interested in the asymptotic behavior of the posterior distribution, since how the posterior distribution evolves is exactly what learning is. This also helps us understand the geometry of the loss landscape (the regret landscape in the case of deep RL).
But the key issue with neural networks is that they are singular. This makes the Fisher information matrix non-invertible, so the classical methods for describing the asymptotic behavior no longer apply. Singular Learning Theory aims to deal with this, and the LLC helps us understand this asymptotic growth!
Connecting the two: Understanding the loss landscape actually helps us understand generalization, since models often prefer settling in the flatter basins. This is something that is formalized by SLT, and is more widely known as just another instance of Occam’s razor!
Time to get rigorous.
When we refer to Bayesian inference as a learning process, we mean how the posterior distribution $p(w|D_n)$ over parameters $w \in W$, given data $D_n$, evolves as $n$ increases.
(Bernstein–von Mises Theorem) In supervised learning with a regular statistical model, where the data consist of $n$ i.i.d. random variables sampled from a fixed distribution and the Fisher information matrix $I(w_0)$ at the true parameter $w_0$ is invertible, the posterior converges as $n \to \infty$ to a normal distribution centered on the maximum likelihood estimate (MLE) $\hat{w}_n$, with covariance $\frac{1}{n} I(w_0)^{-1}$.
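To make the theorem concrete, here is a minimal sketch for about the simplest regular model there is: a Bernoulli coin with a uniform prior (this toy setup and the function names are my own illustration, not part of SLT). The exact posterior is a Beta distribution, the Fisher information is $1/(p(1-p))$, and the ratio of the exact posterior standard deviation to the one predicted by the normal approximation should approach 1 as $n$ grows.

```python
import math

def beta_posterior_moments(successes, n):
    """Mean and std of the Beta(successes + 1, n - successes + 1) posterior
    that a uniform prior gives for the Bernoulli parameter."""
    a = successes + 1
    b = n - successes + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

def normal_approximation(successes, n):
    """Bernstein-von Mises approximation: centered at the MLE, with
    variance (inverse Fisher information) / n = p_hat * (1 - p_hat) / n."""
    p_hat = successes / n
    return p_hat, math.sqrt(p_hat * (1 - p_hat) / n)

def sd_ratio(n, frac=0.3):
    """Exact posterior std divided by the approximate std, for a data set
    in which a fraction `frac` of the n tosses are successes (assumed)."""
    k = round(frac * n)
    _, exact_sd = beta_posterior_moments(k, n)
    _, approx_sd = normal_approximation(k, n)
    return exact_sd / approx_sd

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, sd_ratio(n))
```

The ratio is noticeably below 1 for small $n$ and creeps toward 1 as $n$ increases, which is the regular behavior that singular models break.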
However, when we parametrize our model by a neural network, the model is not regular. It is called a singular model: one in which the same model can be described by multiple parameters. This can happen due to permutation symmetry, scaling symmetry, or many other relations between the weights. In such models, the Fisher information matrix is non-invertible, so the previous theorem does not apply.
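A scaling symmetry is enough to see the degeneracy by hand. The sketch below uses a hypothetical two-parameter regression model $y = abx + \varepsilon$ with unit Gaussian noise (my own toy example, chosen because $(a, b)$ and $(ca, b/c)$ give the same function). For such regression models the Fisher information is the average outer product of the gradient of the mean function, and its determinant comes out as zero, whereas the regular model $y = ax + b + \varepsilon$ has a strictly positive determinant.

```python
def fisher_2x2(grad_fn, xs):
    """Fisher information of y = f(x; w) + N(0, 1) noise over inputs xs:
    the average outer product of the gradient of f with respect to w."""
    f00 = f01 = f11 = 0.0
    for x in xs:
        g0, g1 = grad_fn(x)
        f00 += g0 * g0
        f01 += g0 * g1
        f11 += g1 * g1
    n = len(xs)
    return [[f00 / n, f01 / n], [f01 / n, f11 / n]]

def det_2x2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def product_model_grad(a, b):
    """Gradient of f(x) = a * b * x with respect to (a, b)."""
    return lambda x: (b * x, a * x)

def affine_model_grad():
    """Gradient of f(x) = a * x + b with respect to (a, b)."""
    return lambda x: (x, 1.0)

# A fixed grid of inputs on [-2, 2] stands in for the input distribution.
xs = [i / 10 for i in range(-20, 21)]

singular_fisher = fisher_2x2(product_model_grad(2.0, 0.5), xs)
regular_fisher = fisher_2x2(affine_model_grad(), xs)

if __name__ == "__main__":
    print("product model det:", det_2x2(singular_fisher))
    print("affine model det: ", det_2x2(regular_fisher))
```

The null direction of the singular matrix is $(a, -b)$, which is exactly the direction that rescales $a$ up and $b$ down while keeping the product $ab$ fixed.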
From a geometric perspective, the theorem is not valid because the parameter space, with the discrepancy between two models measured by their KL divergence, has singularities (the KL divergence is not a true distance function; if you really want one, take the Riemannian metric instead). These singularities require a transformation of the space itself. We can resolve them by “blowing up” the space, and there are many other considerations as well, but we will not go into them. We will be more applied in this blog.
The point to take away is that this theory leverages algebraic geometry to describe Bayesian learning dynamics in terms of the nature of singularities. In particular, SLT gives an asymptotic expansion of the local free energy (which describes the posterior). This is known as Watanabe’s free energy formula:
Let $u \in W$ be a local minimum of the expected negative log likelihood, and let $U$ be a ball around $u$ in which $u$ is a global minimum. Then, under certain technical conditions on the statistical model, we have an asymptotic expansion in $n$ of the free energy associated to this neighborhood:
$$ -\log \int_U \exp(-nL_n(w))\phi(w)\,\mathrm{d}w = nL_n(u) + \lambda(U) \log n - (m(U) - 1) \log \log n + O_{\mathrm{P}}(1) $$
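where $L_n(u)$ is the empirical (per-sample average) negative log likelihood of $D_n$ at $u$, $\phi$ is the prior density on $W$, and $\lambda(U)$ and $m(U)$ are coefficients derived from the geometry of the log likelihood around $u$. The coefficient $\lambda(U)$ is called the Local Learning Coefficient, and $m(U)$ is its multiplicity. For a regular model with $d$ parameters, $\lambda(U) = d/2$ and $m(U) = 1$, which recovers the familiar BIC penalty.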
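The role of the $\log n$ coefficient can be checked numerically in one dimension. The following is a rough sketch under strong simplifying assumptions of my own: the random empirical loss $L_n$ is replaced by a fixed potential $K(w) = |w|^k$ with minimum value $0$ at $u = 0$, the neighborhood is $U = [-1, 1]$, and the prior is uniform. For $k = 2$ (a regular, quadratic minimum) the learning coefficient is $1/2$, and for $k = 4$ (a degenerate, flatter minimum) it is $1/4$, so the free energy should grow more slowly in the flatter basin.

```python
import math

def free_energy(n, power, grid_points=20001):
    """-log of the integral over [-1, 1] of exp(-n * |w|**power) * prior(w),
    with a uniform prior (density 1/2), computed by a plain Riemann sum."""
    h = 2.0 / (grid_points - 1)
    total = 0.0
    for i in range(grid_points):
        w = -1.0 + i * h
        total += math.exp(-n * abs(w) ** power)
    return -math.log(0.5 * h * total)

def fitted_log_n_coefficient(power, ns=(10**2, 10**3, 10**4, 10**5)):
    """Least-squares slope of the free energy against log n, which
    estimates the learning coefficient of the potential |w|**power."""
    xs = [math.log(n) for n in ns]
    ys = [free_energy(n, power) for n in ns]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

if __name__ == "__main__":
    print("quadratic minimum:", fitted_log_n_coefficient(2))
    print("quartic minimum:  ", fitted_log_n_coefficient(4))
```

Because the minimum value of the potential is zero, the $nL_n(u)$ term vanishes here, and in one dimension the multiplicity is 1, so the slope against $\log n$ isolates the learning coefficient. Estimating the LLC of an actual neural network is a much harder sampling problem that this sketch does not attempt.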