The Variational Objective
Learn about the elements that allow us to create effective encodings so that we can sample new images from a space of random numerical vectors.
Let’s examine how to optimally compress information into numerical vectors using neural networks. To do so, each element of the vector should encode information distinct from the others, a property we can achieve using a variational objective. This variational objective is the building block for creating VAE networks.
Creating efficient encodings
Let’s start by quantifying more rigorously what makes such an encoding “good” and allows us to recreate images well. We'll need to maximize the posterior:

$$p(z|x) = \frac{p(x|z)\,p(z)}{p(x)}$$

A problem occurs when the probability of $x$ is intractable to calculate:

$$p(x) = \int p(x|z)\,p(z)\,dz$$

This is an exhaustive integral over all possible configurations of the latent vector $z$, which in general has no closed form.
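To see why this integral is a problem in practice, here is a minimal NumPy sketch of a naive Monte Carlo estimate of $p(x)$; the toy decoder, its weights, and the dimensions are illustrative assumptions rather than anything from the text. Averaging the likelihood over prior samples works for a tiny latent space, but becomes hopeless as the dimensionality of $z$ grows, since almost all samples contribute nearly zero.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Hypothetical generative model: a standard normal prior p(z) and a
# "decoder" that maps z to the mean of a Gaussian likelihood p(x|z).
latent_dim, data_dim = 2, 4
W = rng.normal(size=(latent_dim, data_dim))  # illustrative decoder weights

def likelihood(x, z):
    """p(x|z): Gaussian centered on a linear decoding of z (a toy stand-in)."""
    return multivariate_normal.pdf(x, mean=z @ W, cov=np.eye(data_dim))

# Naive Monte Carlo estimate of p(x) = E_{z ~ p(z)}[p(x|z)].
x = rng.normal(size=data_dim)                      # one observed data point
z_samples = rng.normal(size=(20_000, latent_dim))  # draws from the prior p(z)
p_x = np.mean([likelihood(x, z) for z in z_samples])
print(f"Monte Carlo estimate of p(x): {p_x:.3e}")
```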
In some cases, such as models composed of simple binary units, we can compute an approximation such as contrastive divergence, which allows us to still compute a gradient even if we can’t calculate a closed form. However, this can also be challenging for very large datasets, where we would need to make many passes over the data to compute an average gradient using contrastive divergence.
If we can’t calculate the distribution of our encoder $p(z|x)$ exactly, perhaps we can compute an approximation that is “close enough”; let’s call it $q(z|x)$. To judge whether it is close, we need a way to compare two distributions, and a natural building block is the information of an observation: under a distribution $q$, the information of observing $x$ is $-\log q(x)$.
Consider why this is a good measure: as $q(x)$ approaches 1, an outcome becomes certain and carries no “surprise,” so its information $-\log q(x)$ falls to 0; as $q(x)$ approaches 0, the outcome becomes rare and its information grows without bound.
Therefore, if we want to measure the difference between the information encoded in two distributions, $p$ and $q$, at a particular value of $x$, we can take the difference of their information: $-\log p(x) - \left(-\log q(x)\right) = \log \frac{q(x)}{p(x)}$.
Finally, if we want to find the expected difference in information between the distributions for all elements of $x$, we can take the average with respect to $q(x)$:

$$\sum_x q(x)\left[\log q(x) - \log p(x)\right] = \sum_x q(x)\,\log \frac{q(x)}{p(x)}$$
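As a quick numerical check of this derivation, the sketch below (with two made-up discrete distributions; nothing here comes from the original text) computes the per-outcome information $-\log q(x)$ and then the expected information difference, which is exactly the sum above.

```python
import numpy as np

# Two made-up discrete distributions over the same four outcomes.
q = np.array([0.50, 0.25, 0.15, 0.10])
p = np.array([0.40, 0.30, 0.20, 0.10])

# Information of each outcome under q: -log q(x). Outcomes with
# probability near 1 carry little information; rare outcomes carry a lot.
info_q = -np.log(q)
print("information under q:", np.round(info_q, 3))

# Expected difference in information, averaged with respect to q --
# the sum derived above.
kl_qp = np.sum(q * (np.log(q) - np.log(p)))
print(f"sum_x q(x) [log q(x) - log p(x)] = {kl_qp:.4f}")  # non-negative
```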
This quantity is known as the Kullback-Leibler (KL) divergence. It has a few interesting properties:
It is not symmetric: $KL(p \,\|\, q)$ does not, in general, equal $KL(q \,\|\, p)$, so the “closeness” of $q$ to $p$ is not the same as the “closeness” of $p$ to $q$; the numerical sketch after this list demonstrates the point.
...
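To illustrate the asymmetry numerically, here is a short sketch using scipy.stats.entropy, which returns the KL divergence when given two distributions; the distributions themselves are arbitrary examples, not taken from the text.

```python
import numpy as np
from scipy.stats import entropy

# Two arbitrary discrete distributions over three outcomes.
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])

# entropy(a, b) computes KL(a || b) = sum(a * log(a / b)).
print(f"KL(p || q) = {entropy(p, q):.4f}")  # ~0.396
print(f"KL(q || p) = {entropy(q, p):.4f}")  # ~0.365 -- a different value
```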