$$ \newcommand{\dint}{\mathrm{d}} \newcommand{\vphi}{\boldsymbol{\phi}} \newcommand{\vpi}{\boldsymbol{\pi}} \newcommand{\vpsi}{\boldsymbol{\psi}} \newcommand{\vomg}{\boldsymbol{\omega}} \newcommand{\vsigma}{\boldsymbol{\sigma}} \newcommand{\vzeta}{\boldsymbol{\zeta}} \renewcommand{\vx}{\mathbf{x}} \renewcommand{\vy}{\mathbf{y}} \renewcommand{\vz}{\mathbf{z}} \renewcommand{\vh}{\mathbf{h}} \renewcommand{\b}{\mathbf} \renewcommand{\vec}{\mathrm{vec}} \newcommand{\vecemph}{\mathrm{vec}} \newcommand{\mvn}{\mathcal{MN}} \newcommand{\G}{\mathcal{G}} \newcommand{\M}{\mathcal{M}} \newcommand{\N}{\mathcal{N}} \newcommand{\S}{\mathcal{S}} \newcommand{\I}{\mathcal{I}} \newcommand{\diag}[1]{\mathrm{diag}(#1)} \newcommand{\diagemph}[1]{\mathrm{diag}(#1)} \newcommand{\tr}[1]{\text{tr}(#1)} \renewcommand{\C}{\mathbb{C}} \renewcommand{\R}{\mathbb{R}} \renewcommand{\E}{\mathbb{E}} \newcommand{\D}{\mathcal{D}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\innerbig}[1]{\left \langle #1 \right \rangle} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\two}{\mathrm{II}} \newcommand{\GL}{\mathrm{GL}} \newcommand{\Id}{\mathrm{Id}} \newcommand{\grad}[1]{\mathrm{grad} \, #1} \newcommand{\gradat}[2]{\mathrm{grad} \, #1 \, \vert_{#2}} \newcommand{\Hess}[1]{\mathrm{Hess} \, #1} \newcommand{\T}{\text{T}} \newcommand{\dim}[1]{\mathrm{dim} \, #1} \newcommand{\partder}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\rank}[1]{\mathrm{rank} \, #1} \newcommand{\inv}1 \newcommand{\map}{\text{MAP}} \newcommand{\L}{\mathcal{L}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} $$

Volume Forms and Probability Density Functions Under Change of Variables

From elementary probability theory, it is well known that a probability density function (pdf) is not invariant under an arbitrary change of variables (reparametrization). In this article, we'll see that pdfs are actually invariant when viewed in their entirety, as volume forms and Radon-Nikodym derivatives in differential geometry.
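For reference, the familiar non-invariance the post starts from is the change-of-variables formula: if $\vy = g(\vx)$ for a diffeomorphism $g$ and $\vx \sim p_X$, then

$$
p_Y(\vy) = p_X\bigl(g^{-1}(\vy)\bigr) \left\lvert \det \partder{g^{-1}(\vy)}{\vy} \right\rvert ,
$$

i.e., the density alone picks up a Jacobian factor, while the pairing $p_Y(\vy) \, \dint\vy = p_X(\vx) \, \dint\vx$ is the invariant object the post focuses on.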

The Invariance of the Hessian and Its Eigenvalues, Determinant, and Trace

In deep learning, the Hessian and its downstream quantities are observed to be non-invariant under reparametrization. This makes the Hessian a poor proxy for flatness and makes Newton's method non-invariant. In this post, we shall see that the Hessian and the quantities derived from it are actually invariant under reparametrization.
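For orientation, the apparent non-invariance comes from the standard chain-rule identity (stated here as a starting point, not as the post's geometric argument): under a reparametrization $\theta = g(\psi)$ of a loss $\L$,

$$
\nabla^2_{\psi} (\L \circ g) = J^\T \bigl( \nabla^2_{\theta} \L \bigr) J + \sum_k \partder{\L}{\theta_k} \, \nabla^2_{\psi} g_k , \qquad J := \partder{g}{\psi} ,
$$

so the Hessian transforms like a tensor only at critical points, where the first-order term vanishes.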

Convolution of Gaussians and the Probit Integral

Gaussian distributions are very useful in Bayesian inference due to their (many!) convenient properties. In this post we take a look at two of them: the convolution of two Gaussian pdfs and the integral of the probit function w.r.t. a Gaussian measure.
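Concretely, the two results are the standard identities

$$
\int_\R \N(x \mid \mu_1, \sigma_1^2) \, \N(y - x \mid \mu_2, \sigma_2^2) \, \dint x = \N(y \mid \mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)
$$

and

$$
\int_\R \Phi(x) \, \N(x \mid \mu, \sigma^2) \, \dint x = \Phi\!\left( \frac{\mu}{\sqrt{1 + \sigma^2}} \right) ,
$$

where $\Phi$ is the standard Normal cdf, stated here without the derivations.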

The Last Mile of Creating Publication-Ready Plots

In machine learning papers, plots are often treated as an afterthought: authors simply use the default Matplotlib style, resulting in a look that feels out of place when the paper is viewed as a whole. In this post, I'm sharing how I make my publication-ready plots using TikZ.

Modern Arts of Laplace Approximations

The Laplace approximation (LA) is a simple yet powerful class of methods for approximating intractable posteriors. It is, however, largely forgotten in the Bayesian deep learning community. Here, we review the LA and highlight a recent software library for applying it to deep nets.
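As a reminder of the basic construction: the LA fits a Gaussian centered at a mode of the posterior, with covariance given by the inverse Hessian of the negative log-joint at that mode,

$$
p(\theta \mid \D) \approx \N\bigl(\theta \mid \theta_\map, \Sigma\bigr) , \qquad \Sigma = \bigl( -\nabla^2_\theta \log p(\theta, \D) \big\vert_{\theta_\map} \bigr)^{-1} ;
$$

the modern refinements and the software library are built on top of this simple recipe.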

Chentsov's Theorem

The Fisher information is often the default choice of Riemannian metric for manifolds of probability distributions. In this post, we study Chentsov's theorem, which justifies this choice: it says that the Fisher information is the unique Riemannian metric (up to a scaling constant) that is invariant under sufficient statistics. This fact makes the Fisher metric stand out from other choices.
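For reference, the metric in question is, for a parametric family $p(x \mid \theta)$,

$$
g_{ij}(\theta) = \E_{p(x \mid \theta)} \!\left[ \partder{\log p(x \mid \theta)}{\theta^i} \, \partder{\log p(x \mid \theta)}{\theta^j} \right] ,
$$

i.e., the Fisher information in its role as a Riemannian metric on the parameter space.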

The Curvature of the Manifold of Gaussian Distributions

The Gaussian probability distribution is central to statistics and machine learning. As it turns out, by equipping the set of all Gaussian pdfs with the Riemannian metric given by the Fisher information, we can see it as a Riemannian manifold. In this post, we will prove that this manifold can be covered by a single coordinate chart and has constant negative curvature.
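Concretely, in the coordinates $(\mu, \sigma)$ the Fisher metric of the univariate Gaussian family is

$$
g(\mu, \sigma) = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 2/\sigma^2 \end{pmatrix} ,
$$

which is, up to scaling, the hyperbolic metric of the upper half-plane; the constant negative curvature referred to above works out to $-1/2$ in this parametrization.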

Hessian and Curvatures in Machine Learning: A Differential-Geometric View

In machine learning, especially in the study of neural networks, the Hessian matrix is often treated as synonymous with curvature. But from calculus alone, it is not clear why one can say so. Here, we will view the loss landscape of a neural network as a hypersurface and apply a differential-geometric analysis to it.
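A standard hypersurface fact suggests why this identification is reasonable (stated here in the simplest setting; the post gives the full analysis): for the graph hypersurface $\{(\vx, \L(\vx))\}$ of a loss $\L$, the second fundamental form is

$$
\two_{ij} = \frac{\partial_i \partial_j \L}{\sqrt{1 + \norm{\nabla \L}^2}} ,
$$

so at a critical point, where $\nabla \L = 0$, the curvature of the loss landscape is measured exactly by the Hessian.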

Optimization and Gradient Descent on Riemannian Manifolds

One of the most ubiquitous applications of differential geometry is optimization. In this article, we will discuss the familiar optimization problem on Euclidean spaces, focusing on the gradient descent method, and then generalize it to Riemannian manifolds.
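For reference, the resulting update (a standard statement of Riemannian gradient descent) replaces the straight-line step by the exponential map and uses the Riemannian gradient, which involves the inverse metric:

$$
\vx_{t+1} = \mathrm{Exp}_{\vx_t}\!\bigl( -\alpha \, \gradat{f}{\vx_t} \bigr) , \qquad \grad{f} = G^{-1} \partder{f}{\vx} \ \text{in local coordinates} ,
$$

which reduces to ordinary gradient descent when the manifold is $\R^n$ with the Euclidean metric.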

Notes on Riemannian Geometry

This article is a collection of small notes on Riemannian geometry that I find useful as references. It is largely based on Lee's books on smooth and Riemannian manifolds.