Variational autoencoders (VAEs) are one of the most commonly used models for content generation. In simple terms, a VAE is an autoencoder with regularization.
VAEs dive deep into the probabilistic domain: instead of getting a single output from the encoder, we get a probability distribution for each latent attribute in the latent space.
In the case of VAEs, the encoder is also known as the recognition network or the inference network, while the decoder is known as the generative network.
Autoencoders are suitable for encoding and decoding, but they don't perform well when used for content generation. To generate content, we would pick a point in the latent space and pass it through the decoder. Each latent feature represents one encoding dimension, and the decoder uses these values to recreate an actual input.
This model is prone to overfitting because it may simply learn the identity function, f(x) = x, that is, it learns to map each feature to itself.
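To make this concrete, here is a minimal sketch of a plain autoencoder in PyTorch; the layer sizes and the names input_dim and latent_dim are illustrative assumptions, not taken from the original text:

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """A plain autoencoder: the encoder maps an input to a single
    deterministic latent vector and the decoder maps it back."""
    def __init__(self, input_dim=784, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),           # one value per latent feature
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)        # deterministic encoding
        return self.decoder(z)     # reconstruction of the input

Because the encoding is a single deterministic point, nothing encourages nearby latent points to decode to meaningful outputs, which is why this architecture struggles with generation.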
In VAEs, the recognition network (the encoder) outputs a probability distribution rather than a single point. Specifically, it outputs the mean and variance parameters for each latent attribute, which together define the required probability distribution. A latent vector is then sampled from this distribution and passed to the decoder, which reconstructs the original input from the sample.
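A minimal sketch of this idea in PyTorch follows; the layer sizes, the log-variance parameterization, and the names mu and log_var are assumptions for illustration, not details from the original text:

import torch
import torch.nn as nn

class VAE(nn.Module):
    """A VAE sketch: the recognition network outputs a mean and a
    log-variance per latent attribute, a latent vector is sampled from
    that distribution, and the decoder reconstructs the input from it."""
    def __init__(self, input_dim=784, latent_dim=2):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)        # mean of each latent attribute
        self.log_var = nn.Linear(128, latent_dim)   # log-variance of each latent attribute
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.hidden(x)
        mu, log_var = self.mu(h), self.log_var(h)
        std = torch.exp(0.5 * log_var)
        z = mu + std * torch.randn_like(std)        # sample via the reparameterization trick
        return self.decoder(z), mu, log_var

Sampling through the reparameterization trick (mean plus scaled noise) keeps the sampling step differentiable, so the whole model can be trained with backpropagation.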
In statistical terms, the generative model can be defined as p(x) = ∫ p(x|z) p(z) dz. Here, x is the input, z is the latent variable, p(z) is the prior over the latent space (typically a standard normal distribution), and p(x|z) is the distribution modeled by the decoder.
VAEs are better than plain autoencoders because sampling from the normal prior lets them generate outputs the encoder has never seen. This gives us a smooth latent space, which we can leverage to generate entirely new content by interpolating between points.
Suppose we have two inputs, x1 and x2, whose latent representations are z1 and z2. We can form a blended latent vector z = α·z1 + (1 − α)·z2, where α is a value between 0 and 1. Now we can decode z to obtain a new output that smoothly blends the two inputs. This interpolation technique is called latent space interpolation.
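As a rough sketch of the idea, the following helper assumes the VAE class from the earlier sketch and two preprocessed input tensors x1 and x2; the function name, the number of steps, and the use of the encoder means as latent codes are all illustrative assumptions:

import torch

def interpolate(model, x1, x2, steps=8):
    """Latent space interpolation: encode two inputs, blend their latent
    codes, and decode each blend into a new output."""
    model.eval()
    with torch.no_grad():
        h1, h2 = model.hidden(x1), model.hidden(x2)
        z1, z2 = model.mu(h1), model.mu(h2)           # use the means as latent codes
        outputs = []
        for alpha in torch.linspace(0.0, 1.0, steps):
            z = alpha * z1 + (1.0 - alpha) * z2       # blended latent vector
            outputs.append(model.decoder(z))          # decode the blend
    return outputs

Because the latent space is smooth, each intermediate decode tends to look like a plausible mixture of the two original inputs rather than noise.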