$$\newcommand{\dint}{\mathrm{d}} \newcommand{\vphi}{\boldsymbol{\phi}} \newcommand{\vpi}{\boldsymbol{\pi}} \newcommand{\vpsi}{\boldsymbol{\psi}} \newcommand{\vomg}{\boldsymbol{\omega}} \newcommand{\vsigma}{\boldsymbol{\sigma}} \newcommand{\vzeta}{\boldsymbol{\zeta}} \renewcommand{\vx}{\mathbf{x}} \renewcommand{\vy}{\mathbf{y}} \renewcommand{\vz}{\mathbf{z}} \renewcommand{\vh}{\mathbf{h}} \renewcommand{\b}{\mathbf} \renewcommand{\vec}{\mathrm{vec}} \newcommand{\vecemph}{\mathrm{vec}} \newcommand{\mvn}{\mathcal{MN}} \newcommand{\G}{\mathcal{G}} \newcommand{\M}{\mathcal{M}} \newcommand{\N}{\mathcal{N}} \newcommand{\S}{\mathcal{S}} \newcommand{\diag}[1]{\mathrm{diag}(#1)} \newcommand{\diagemph}[1]{\mathrm{diag}(#1)} \newcommand{\tr}[1]{\text{tr}(#1)} \renewcommand{\C}{\mathbb{C}} \renewcommand{\R}{\mathbb{R}} \renewcommand{\E}{\mathbb{E}} \newcommand{\D}{\mathcal{D}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\innerbig}[1]{\left \langle #1 \right \rangle} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\two}{\mathrm{II}} \newcommand{\GL}{\mathrm{GL}} \newcommand{\Id}{\mathrm{Id}} \newcommand{\grad}[1]{\mathrm{grad} \, #1} \newcommand{\gradat}[2]{\mathrm{grad} \, #1 \, \vert_{#2}} \newcommand{\Hess}[1]{\mathrm{Hess} \, #1} \newcommand{\T}{\text{T}} \newcommand{\dim}[1]{\mathrm{dim} \, #1} \newcommand{\partder}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\rank}[1]{\mathrm{rank} \, #1}$$

# Variational Autoencoder (VAE) in PyTorch

This post should be quick, as it is just a port of the previous Keras code. For the intuition and derivation of the Variational Autoencoder (VAE), plus the Keras implementation, check this post. The full code is available in my GitHub repo: https://github.com/wiseodd/generative-models.

## The networks

Let’s begin by importing what we need.
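As a rough sketch, the imports and a few hyperparameters might look like the following. The names `mb_size`, `X_dim`, `h_dim`, `Z_dim`, and `lr`, and all of their values, are assumptions picked for illustration (for example, `X_dim = 784` assumes flattened 28x28 images).

```python
import torch
import torch.nn.functional as F
import torch.optim as optim

# Hypothetical hyperparameters, chosen for illustration only
mb_size = 64   # minibatch size
X_dim = 784    # flattened input size (assumes 28x28 images)
h_dim = 128    # hidden layer width
Z_dim = 100    # latent dimension
lr = 1e-3      # learning rate

torch.manual_seed(0)  # make runs repeatable
```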

Now, recall that a VAE has two networks: an encoder $Q(z \vert X)$ and a decoder $P(X \vert z)$. So, let’s build our $Q(z \vert X)$ first:
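One way the encoder could be written with raw tensors is sketched below. The layer sizes, the `weight` helper, and the parameter names are assumptions for illustration. I also assume the second output is the log-variance rather than the variance itself, which keeps it unconstrained.

```python
import torch
import torch.nn.functional as F

X_dim, h_dim, Z_dim = 784, 128, 100  # illustrative sizes (assumed)

def weight(n_in, n_out):
    """Scaled-normal initial weights that track gradients (hypothetical helper)."""
    return (torch.randn(n_in, n_out) / n_in ** 0.5).requires_grad_()

# Encoder parameters: input -> hidden -> (mean, log-variance)
Wxh, bxh = weight(X_dim, h_dim), torch.zeros(h_dim, requires_grad=True)
Whz_mu, bhz_mu = weight(h_dim, Z_dim), torch.zeros(Z_dim, requires_grad=True)
Whz_var, bhz_var = weight(h_dim, Z_dim), torch.zeros(Z_dim, requires_grad=True)

def Q(X):
    """Encoder Q(z|X): map a batch X to the mean and log-variance of z."""
    # Biases are tiled to the batch size explicitly via repeat
    h = F.relu(X @ Wxh + bxh.repeat(X.size(0), 1))
    z_mu = h @ Whz_mu + bhz_mu.repeat(h.size(0), 1)
    z_var = h @ Whz_var + bhz_var.repeat(h.size(0), 1)  # treated as log(sigma^2)
    return z_mu, z_var
```

Calling `Q` on a batch of shape `(N, X_dim)` returns two tensors of shape `(N, Z_dim)`.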

Our $Q(z \vert X)$ is a two-layer net that outputs $\mu$ and $\Sigma$, the parameters of the encoded distribution. So, let’s create a function to sample from it:
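A minimal sketch of such a sampling function, using the reparameterization trick $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim N(0, I)$ so that gradients can flow back into the encoder. It assumes a diagonal covariance and that the encoder's second output is $\log \sigma^2$; the name `sample_z` is my own choice here.

```python
import torch

def sample_z(mu, log_var):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = torch.randn_like(mu)                # noise with the same shape as mu
    return mu + torch.exp(log_var / 2) * eps  # exp(log_var / 2) is sigma
```

Because the randomness lives entirely in `eps`, the result is a differentiable function of `mu` and `log_var`.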

Let’s construct the decoder $P(X \vert z)$, which is also a two-layer net:
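The decoder can be sketched in the same style. As before, the sizes, the `weight` helper, and the parameter names are illustrative assumptions. The sigmoid output assumes inputs scaled to $[0, 1]$, such as grayscale pixel intensities.

```python
import torch
import torch.nn.functional as F

X_dim, h_dim, Z_dim = 784, 128, 100  # illustrative sizes (assumed)

def weight(n_in, n_out):
    """Scaled-normal initial weights that track gradients (hypothetical helper)."""
    return (torch.randn(n_in, n_out) / n_in ** 0.5).requires_grad_()

# Decoder parameters: latent -> hidden -> reconstruction
Wzh, bzh = weight(Z_dim, h_dim), torch.zeros(h_dim, requires_grad=True)
Whx, bhx = weight(h_dim, X_dim), torch.zeros(X_dim, requires_grad=True)

def P(z):
    """Decoder P(X|z): map latent codes z to values in (0, 1)."""
    h = F.relu(z @ Wzh + bzh.repeat(z.size(0), 1))
    return torch.sigmoid(h @ Whx + bhx.repeat(h.size(0), 1))
```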

Note that `b.repeat(X.size(0), 1)` is used because of this PyTorch issue.
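To see what that call does: `repeat` tiles a bias of shape `(d,)` into a matrix of shape `(N, d)`, one copy per row of the batch. On recent PyTorch versions, plain broadcasting gives the same result, as this small check with made-up shapes suggests.

```python
import torch

torch.manual_seed(0)
X = torch.randn(4, 3)  # a batch of 4 rows
b = torch.randn(3)     # one bias per column

explicit = X + b.repeat(X.size(0), 1)  # bias tiled to shape (4, 3)
broadcast = X + b                      # bias broadcast across the batch dimension

print(torch.equal(explicit, broadcast))
```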

## Training

Now, the interesting part: training the VAE model. As always, each training step consists of a forward pass, a loss computation, a backward pass, and a parameter update.

Now, the forward step:
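A compact, self-contained sketch of the forward pass follows. To keep it short, the networks are redefined here without biases, and the batch `X` is random stand-in data rather than real images. All sizes and names are assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
mb_size, X_dim, h_dim, Z_dim = 8, 784, 128, 100  # illustrative sizes (assumed)

def weight(n_in, n_out):
    return (torch.randn(n_in, n_out) / n_in ** 0.5).requires_grad_()

# Biases are left out here to keep the sketch short
Wxh, Whz_mu, Whz_var = weight(X_dim, h_dim), weight(h_dim, Z_dim), weight(h_dim, Z_dim)
Wzh, Whx = weight(Z_dim, h_dim), weight(h_dim, X_dim)

def Q(X):
    h = F.relu(X @ Wxh)
    return h @ Whz_mu, h @ Whz_var  # mean and log-variance

def sample_z(mu, log_var):
    return mu + torch.exp(log_var / 2) * torch.randn_like(mu)

def P(z):
    return torch.sigmoid(F.relu(z @ Wzh) @ Whx)

X = torch.rand(mb_size, X_dim)  # stand-in for a minibatch of flattened images

# Forward pass: encode, sample a latent code, decode
z_mu, z_var = Q(X)
z = sample_z(z_mu, z_var)
X_sample = P(z)
```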

That is it. We just call the functions we defined before. Let’s continue with the loss, which consists of two parts: the reconstruction loss and the KL divergence of the encoded distribution from the prior:
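Here is a sketch of the loss under some common assumptions: a Bernoulli decoder (so the reconstruction term is binary cross-entropy), a diagonal Gaussian encoder that outputs $\mu$ and $\log \sigma^2$, and a standard normal prior. In that case the KL term has the closed form $\frac{1}{2} \sum_j \left( \sigma_j^2 + \mu_j^2 - 1 - \log \sigma_j^2 \right)$. The function name `vae_loss` is hypothetical.

```python
import torch
import torch.nn.functional as F

def vae_loss(X_sample, X, z_mu, z_var):
    """Reconstruction loss plus KL divergence, averaged over the minibatch.

    z_var is assumed to hold log(sigma^2).
    """
    mb_size = X.size(0)
    # Sum over pixels, then average over the batch
    recon_loss = F.binary_cross_entropy(X_sample, X, reduction="sum") / mb_size
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions
    kl_per_example = 0.5 * torch.sum(torch.exp(z_var) + z_mu ** 2 - 1.0 - z_var, dim=1)
    return recon_loss + torch.mean(kl_per_example)
```

When `z_mu` and `z_var` are both zero, the encoded distribution equals the prior and the KL term vanishes, leaving only the reconstruction loss.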

The backward and update steps are as easy as calling a function, since we use PyTorch’s autograd:
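The mechanics can be sketched on a tiny stand-in problem. The parameters `W` and `b`, the quadratic loss, and the choice of Adam with `lr=1e-3` are all assumptions for illustration. In the VAE, `params` would hold the encoder and decoder weights, and `loss` would be the VAE loss.

```python
import torch
import torch.optim as optim

torch.manual_seed(0)

# Stand-in parameters and loss, only to show the calls involved
W = torch.randn(3, 2, requires_grad=True)
b = torch.zeros(2, requires_grad=True)
params = [W, b]
solver = optim.Adam(params, lr=1e-3)

X = torch.randn(4, 3)
loss = ((X @ W + b) ** 2).mean()

W_before = W.detach().clone()

loss.backward()     # backward: autograd fills .grad for every parameter
solver.step()       # update: the optimizer applies one step using those gradients
solver.zero_grad()  # reset gradients so they do not accumulate into the next step
```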

After that, we can inspect the loss, or visualize samples from $P(X \vert z)$ every now and then to check how training is progressing.
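Putting the pieces together, a complete loop with periodic inspection might look like the sketch below. It trains on random stand-in data so that it runs without any dataset. The sizes, the iteration count, the logging interval, and the 28x28 reshape are all assumptions, and biases are again omitted for brevity.

```python
import torch
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)
mb_size, X_dim, h_dim, Z_dim = 64, 784, 128, 100  # illustrative sizes (assumed)

def weight(n_in, n_out):
    return (torch.randn(n_in, n_out) / n_in ** 0.5).requires_grad_()

Wxh, Whz_mu, Whz_var = weight(X_dim, h_dim), weight(h_dim, Z_dim), weight(h_dim, Z_dim)
Wzh, Whx = weight(Z_dim, h_dim), weight(h_dim, X_dim)
params = [Wxh, Whz_mu, Whz_var, Wzh, Whx]
solver = optim.Adam(params, lr=1e-3)

def Q(X):
    h = F.relu(X @ Wxh)
    return h @ Whz_mu, h @ Whz_var  # mean and log-variance

def sample_z(mu, log_var):
    return mu + torch.exp(log_var / 2) * torch.randn_like(mu)

def P(z):
    return torch.sigmoid(F.relu(z @ Wzh) @ Whx)

losses = []
for it in range(200):
    X = 0.2 * torch.rand(mb_size, X_dim)  # stand-in batch; swap in real data here

    # Forward
    z_mu, z_var = Q(X)
    X_sample = P(sample_z(z_mu, z_var))

    # Loss: reconstruction + KL
    recon = F.binary_cross_entropy(X_sample, X, reduction="sum") / mb_size
    kl = torch.mean(0.5 * torch.sum(torch.exp(z_var) + z_mu ** 2 - 1.0 - z_var, dim=1))
    loss = recon + kl

    # Backward and update
    loss.backward()
    solver.step()
    solver.zero_grad()

    losses.append(loss.item())
    if it % 50 == 0:
        print(f"Iter {it}: loss {loss.item():.2f}")
        # Decode codes drawn from the prior to see what the decoder generates
        with torch.no_grad():
            samples = P(torch.randn(16, Z_dim)).view(-1, 28, 28)
        # `samples` could now be plotted, e.g. as a 4x4 grid of images
```

On real image data, plotting `samples` every few hundred iterations gives a quick visual check of whether the decoder is learning anything useful.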

The full code can be found here: https://github.com/wiseodd/generative-models.