The Variational Objective

Learn about the elements that allow us to create effective encodings to sample new images from a space of random numerical vectors.

Let’s examine how to optimally compress information in numerical vectors using neural networks. To do so, each element of the vector should encode distinct information from the others, a property we can achieve using a variational objective. This variational objective is the building block for creating VAE networks.

Creating efficient encodings

Let’s start by quantifying more rigorously what makes such an encoding “good” and allows us to recreate images well. We'll need to maximize the posterior probability of the latent code z given the data x, which Bayes' rule expresses as:

p(z|x) = \frac{p(x|z)\,p(z)}{p(x)}

A problem occurs when the data x is extremely high dimensional. As you saw, this can occur even in simple data such as binary MNIST digits, where we have 2^{\text{number of pixels}} possible configurations to integrate over (in the mathematical sense of integrating over a probability distribution) to get a measure of the probability of an individual image. In other words, the density p(x) is intractable, making the posterior p(z|x), which depends on p(x), likewise intractable.
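To get a sense of the scale, here is a minimal back-of-the-envelope sketch, assuming the standard 28×28 MNIST resolution, of how many binary configurations we would have to sum over:

```python
# Number of possible binary images at MNIST's 28x28 resolution
num_pixels = 28 * 28
num_configurations = 2 ** num_pixels  # exact integer arithmetic in Python

# Roughly 10^236 configurations -- far too many to enumerate directly
print(f"2^{num_pixels} has {len(str(num_configurations))} digits")
```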

In some cases, such as models with binary units, we can compute an approximation like contrastive divergence (CD), which allows us to compute a gradient even when we can’t calculate a closed form. However, this becomes challenging for very large datasets, where we would need to make many passes over the data to compute an average gradient using CD (Kingma, Diederik, and Max Welling. 2014. “Auto-Encoding Variational Bayes.” https://arxiv.org/pdf/1312.6114.pdf).

If we can’t calculate the distribution of our encoder p(z|x) directly, maybe we could optimize an approximation that is close enough; let’s call this q(z|x). Then, we need a measure to determine if the two distributions are close. One useful measure of closeness is whether they encode similar information; we can quantify information using the Shannon information equation:

I(p(x)) = -\log(p(x))

Consider why this is a good measure: as p(x) decreases, an event becomes rarer; therefore, observation of the event communicates more information about the system or dataset, leading to a larger positive value of -\log(p(x)). Conversely, as the probability of an event nears 1, that event encodes less information about the dataset, and the value of -\log(p(x)) approaches 0:

[Figure: Shannon information]
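As a quick numerical illustration of this behavior, here is a minimal sketch that evaluates -\log(p(x)) for a few arbitrary example probabilities (chosen for illustration, not taken from the lesson):

```python
import numpy as np

# Shannon information of an event with probability p: I(p) = -log(p)
def shannon_information(p):
    return -np.log(p)

# Rare events carry more information; near-certain events carry almost none
for p in [0.001, 0.1, 0.5, 0.9, 0.999]:
    print(f"p(x) = {p:>5}: -log(p(x)) = {shannon_information(p):.4f}")
```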

Therefore, if we want to measure the difference between the information encoded in two distributions, p and q, we can use the difference in their information:

\log(p(x)) - \log(q(x)) = \log\left(\frac{p(x)}{q(x)}\right)

Finally, if we want to find the expected difference in information between the distributions for all elements of x, we can take the average over p(x):

\mathbb{E}_{p(x)}\left[\log\left(\frac{p(x)}{q(x)}\right)\right] = \int p(x)\log\left(\frac{p(x)}{q(x)}\right)dx

This quantity is known as the Kullback-Leibler (KL) divergence. It has a few interesting properties, illustrated numerically in the sketch after this list:

  • It is not symmetric:  ...
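As a minimal sketch, using two arbitrary discrete distributions chosen purely for illustration (not taken from the lesson), we can compute the KL divergence directly from its definition and confirm the asymmetry noted above:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum over x of p(x) * log(p(x) / q(x)) for discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

# Two example distributions over the same three outcomes
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

print(f"KL(p || q) = {kl_divergence(p, q):.4f}")  # ~0.1838
print(f"KL(q || p) = {kl_divergence(q, p):.4f}")  # ~0.1920 -- not equal, hence asymmetric
```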