What are variational autoencoders (VAEs)?

Introduction

Variational autoencoders (VAEs) are among the most commonly used models for content generation. In simple terms, a VAE is an autoencoder with a regularized latent space.

VAEs take a probabilistic view of encoding: instead of getting a single vector from the encoder, we get a probability distribution for each latent attribute in the latent space. (When we compress the data, similar data points are placed close together in a space of reduced dimensions; this space is called the latent space.) This distribution is used in combination with the decoder to generate new content.

In the case of VAEs, the encoder is also known as the recognition network or the inference network, while the decoder is known as the generative network.

Need for VAEs

Autoencoders are suitable for encoding and decoding, but they don't perform well when used for content generation. To generate content, we pick a point in the latent space and pass it through the decoder. Here, each latent attribute is a single value, one per encoding dimension, and the decoder uses this vector to recreate an actual input.

An autoencoder model

This model is prone to overfitting since it may learn the identity function, that is, it may simply map each input feature to itself:

Content generation using the trained decoder of an autoencoder
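To make the setup concrete, here is a minimal autoencoder sketch in PyTorch. The 784-dimensional input (for example, a flattened 28x28 image), the 2-dimensional latent space, and the layer sizes are illustrative choices, not details from the text above.

```python
import torch
import torch.nn as nn

# A minimal autoencoder: the encoder compresses the input into a
# low-dimensional latent vector, and the decoder reconstructs the input.
class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)      # a single latent vector per input
        return self.decoder(z)   # reconstruction of the input

# "Generating" with a plain autoencoder: pick a latent vector and decode it.
model = Autoencoder()
z = torch.randn(1, 2)            # an arbitrary point in latent space
generated = model.decoder(z)
print(generated.shape)           # torch.Size([1, 784])
```

Because the latent space of a plain autoencoder is not regularized, decoding an arbitrary latent vector like this often produces unrealistic output, which is exactly the limitation that motivates VAEs.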

Variational autoencoders (VAEs)

In VAEs, the recognition network (the encoder) produces a probability distribution rather than a single point. It outputs the mean and variance parameters for each latent attribute, which together define the required probability distribution. A latent vector is then sampled from this distribution, and the decoder reconstructs the original input from this sample.
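The following is a minimal sketch of this idea in PyTorch, keeping the same illustrative 784-dimensional input and 2-dimensional latent space as above. The fc_mu and fc_logvar heads and the reparameterization step (sampling z = mu + sigma * eps so the sampling stays differentiable) are standard implementation choices, not details stated in the text.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.fc_mu = nn.Linear(128, latent_dim)      # mean of each latent attribute
        self.fc_logvar = nn.Linear(128, latent_dim)  # log-variance of each latent attribute
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z from N(mu, sigma^2)
        # while keeping the sampling step differentiable.
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)
        return self.decoder(z), mu, logvar

x = torch.rand(8, 784)                      # a dummy batch of inputs
recon, mu, logvar = VAE()(x)
print(recon.shape, mu.shape, logvar.shape)  # reconstruction plus distribution parameters
```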

In statistical terms, the generative model can be defined as the following:

$$p_\theta(x, z) = p_\theta(z)\, p_\theta(x \mid z)$$

Here, $x$ represents the input and $z$ represents the latent space embeddings. The prior $p_\theta(z)$ and the likelihood $p_\theta(x \mid z)$ are usually a normal distribution and an exponential-family distribution (for example, Gaussian or Bernoulli), respectively. The recognition model performs approximate posterior inference, that is, it approximates the uncertainty in our estimate of the latent variables, and can be defined as the following:

$$q_\phi(z \mid x) \approx p_\theta(z \mid x)$$

VAEs are better suited to generation than plain autoencoders because we can sample from the normal distribution and decode inputs that the encoder has never seen. The regularization also gives us a smooth latent space, which we can leverage to generate entirely new content by interpolating.
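As a quick illustration of sampling from the normal prior, the sketch below uses a stand-in decoder with the same illustrative dimensions as above (2 latent dimensions, 784 output features); in practice this would be the decoder of a trained VAE.

```python
import torch
import torch.nn as nn

# A stand-in decoder shaped like the VAE decoder sketched above;
# in practice, use the decoder of a trained VAE.
decoder = nn.Sequential(
    nn.Linear(2, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),
)

with torch.no_grad():
    # Sample latent vectors from the standard normal prior p(z) = N(0, I) ...
    z = torch.randn(16, 2)
    # ... and decode them into brand-new samples the encoder never produced.
    new_samples = decoder(z)

print(new_samples.shape)  # torch.Size([16, 784])
```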

Latent space interpolation

Suppose we have two inputs, $x_1$ and $x_2$. Let $z_1 = E_{q(z \mid x_1)}[z]$ and $z_2 = E_{q(z \mid x_2)}[z]$ be the corresponding latent attributes. We can now generate new content by the following interpolation:

$$z = \lambda z_1 + (1 - \lambda) z_2$$

where $0 \le \lambda \le 1$.

Now we can decode by computing $p_\theta(x \mid z)$, that is, by passing the interpolated vector $z$ through the decoder. This interpolation technique is called latent space interpolation.
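A minimal sketch of this procedure, assuming the illustrative VAE class from the earlier sketch is in scope (and, in practice, trained). The posterior means are used as the latent codes $z_1$ and $z_2$, matching the interpolation formula above, and the two inputs here are random stand-ins for real data.

```python
import torch

vae = VAE()   # the illustrative VAE sketched earlier, ideally trained
vae.eval()

x1 = torch.rand(1, 784)  # two dummy inputs standing in for real data
x2 = torch.rand(1, 784)

with torch.no_grad():
    # Use the posterior means as the latent codes z1 and z2.
    z1 = vae.fc_mu(vae.encoder(x1))
    z2 = vae.fc_mu(vae.encoder(x2))

    # Interpolate: z = lambda * z1 + (1 - lambda) * z2, for 0 <= lambda <= 1.
    for lam in torch.linspace(0.0, 1.0, steps=5):
        z = lam * z1 + (1 - lam) * z2
        interpolated = vae.decoder(z)  # decode the blended latent vector
        print(float(lam), interpolated.shape)
```

Sweeping $\lambda$ from 0 to 1 like this produces a sequence of outputs that morph smoothly from one input toward the other, which is what a smooth, regularized latent space makes possible.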

The workflow of a VAE
