Variational Autoencoders

Learn about variational autoencoders and their essential components.

Background

Autoencoders are a type of neural network architecture that excels at unsupervised learning. They compress high-dimensional input data into a lower-dimensional representation (the latent space) and then reconstruct the original input from it, making them valuable for tasks like dimensionality reduction (data compression) and feature extraction.
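
As a rough illustration, the following is a minimal autoencoder sketch. It assumes PyTorch; the layer sizes and the 784-dimensional input (an MNIST-style flattened image) are illustrative choices, not part of the original lesson.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: compress the input into a low-dimensional latent vector
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: reconstruct the original input from the latent vector
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)      # latent representation
        return self.decoder(z)   # reconstruction of x
```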

In the latent space, similar inputs tend to cluster together. For example, in the MNIST dataset, the encodings of each digit form their own group. However, parts of the latent space do not belong to any digit; if we feed a point from such a region to the decoder, it produces an output that does not resemble any of our digits. This makes the latent space unsuitable for generating data, because after training there is no way to tell whether a randomly chosen point lies in a valid or an invalid region. In other words, the latent space is not regularized. In conclusion, plain autoencoders are unsuitable for data generation and are mainly useful for compression. The following section shows how this architecture is modified to generate data.

Variational autoencoders

The variational autoencoder (VAE) tackles the problem of the autoencoder's unregularized latent space and extends generative capability across the entire space. While the autoencoder's encoder produces a single latent vector per input, the VAE's encoder outputs the parameters of a predetermined distribution in the latent space for each input. A constraint is enforced so that this distribution stays close to a normal distribution, thereby regularizing the latent space.
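
One common way this constraint is enforced is by adding a KL-divergence term to the reconstruction loss. Below is a minimal sketch, assuming PyTorch and a binary cross-entropy reconstruction term as is typical for MNIST-style data; the function name and arguments are illustrative.

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, log_var):
    # Reconstruction term: how well the decoder rebuilds the input
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL term: pushes each latent distribution N(mu, sigma^2) toward N(0, 1),
    # which regularizes the latent space
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```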

Major components of VAE

VAEs consist of three key components:

  1. Encoder: It maps the input data X to a compressed form and outputs the mean μ and standard deviation σ of the predetermined distribution in the latent space for each input value.

  2. Latent vector: Z is sampled using μ and σ and fed to the decoder for reconstruction.

  3. Decoder: It reconstructs the data X' from samples drawn from the latent space, capturing the generative process. It maps latent variable samples back to the data space, as sketched in the code after this list.
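
The sketch below puts the three components together. It assumes PyTorch, and samples Z with the reparameterization trick (Z = μ + σ·ε) so that gradients can flow through the sampling step; layer sizes are again illustrative.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # 1. Encoder: maps input X to the parameters (mu, log variance)
        #    of a normal distribution in the latent space
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_log_var = nn.Linear(256, latent_dim)
        # 3. Decoder: maps a latent sample Z back to the data space (X')
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def reparameterize(self, mu, log_var):
        # 2. Latent vector: sample Z = mu + sigma * epsilon, keeping the
        #    sampling step differentiable with respect to mu and sigma
        std = torch.exp(0.5 * log_var)
        eps = torch.randn_like(std)
        return mu + std * eps

    def forward(self, x):
        h = self.encoder(x)
        mu, log_var = self.fc_mu(h), self.fc_log_var(h)
        z = self.reparameterize(mu, log_var)
        x_recon = self.decoder(z)
        return x_recon, mu, log_var
```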
