Variational Autoencoder: Theory
Dive into the mathematics behind variational autoencoders.
In simple terms, a variational autoencoder is a probabilistic version of autoencoders.
Why?
Because we want to be able to sample from the latent vector ($z$) space to generate new data, which is not possible with vanilla autoencoders.
Each latent variable that is generated from the input will now represent a probability distribution (or what we call the posterior distribution, denoted as $p(z|x)$).
All we need to do is find the posterior or solve the inference problem.
In fact, the encoder will try to approximate the posterior by computing another distribution $q(z|x)$, known as the variational posterior.
Note that a probability distribution is fully characterized by its parameters. In the case of the Gaussian, these are the mean $\mu$ and the standard deviation $\sigma$.
So it is enough to pass the parameters (the mean $\mu$ and the standard deviation $\sigma$) of the normal probability distribution, denoted as $N(\mu, \sigma)$, to the decoder, instead of simply passing the latent vector $z$ as in the simple autoencoder.
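To make this concrete, here is a minimal sketch of an encoder that outputs the parameters of $q(z|x)$ instead of a single latent vector. PyTorch and the layer sizes (`input_dim`, `hidden_dim`, `latent_dim`) are assumptions chosen for illustration, not part of the lesson.

```python
# Minimal sketch of a Gaussian encoder (PyTorch assumed; sizes are illustrative).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=20):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        # Two heads: one for the mean and one for the log-variance of q(z|x).
        self.mu_head = nn.Linear(hidden_dim, latent_dim)
        self.logvar_head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        mu = self.mu_head(h)
        log_var = self.logvar_head(h)  # predict log(sigma^2) so sigma stays positive
        return mu, log_var
```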
Then, the decoder will receive the distribution parameters, draw a latent vector $z$ from that distribution, and try to reconstruct the input $x$. There is a catch, however: you cannot compute gradients through a stochastic sampling operation. In other words, you cannot backpropagate through sampling. This is exactly the heart of the problem when training variational autoencoders.
Let’s see how we can make it possible. (Hint: Check the reparameterization trick section below.)
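As a preview of that trick, here is a minimal sketch (PyTorch assumed, taking the `mu` and `log_var` outputs of an encoder like the one sketched above): the randomness is moved into an auxiliary noise variable $\epsilon \sim N(0, I)$, so gradients can flow through $\mu$ and $\sigma$.

```python
import torch

def reparameterize(mu, log_var):
    # z = mu + sigma * epsilon, with epsilon ~ N(0, I).
    # The stochasticity lives entirely in epsilon, which needs no gradient,
    # so backpropagation can flow through mu and sigma.
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + std * eps
```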
Train a variational autoencoder
First things first.
Since our goal is for the variational posterior to be as close as possible to the true posterior, the following loss function is used to train the model.
...
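A common concrete form of this objective is the sum of a reconstruction term and a KL-divergence term. The sketch below assumes PyTorch, inputs scaled to $[0, 1]$, and a standard normal prior, and it may differ in details from the exact expression derived in this lesson.

```python
import torch
import torch.nn.functional as F

def vae_loss(x_reconstructed, x, mu, log_var):
    # Reconstruction term: how well the decoder reproduces the input
    # (binary cross-entropy assumes inputs scaled to [0, 1]).
    recon = F.binary_cross_entropy(x_reconstructed, x, reduction="sum")
    # Closed-form KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I).
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```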