Variational Autoencoders
Learn about variational autoencoders and their essential components.
Background
Autoencoders are a special type of neural network architecture that excels at unsupervised learning. They transform high-dimensional input data into a lower-dimensional representation (the latent space) and then reconstruct the original input from it, making them valuable for tasks like dimensionality reduction (data compression) and feature extraction. A minimal sketch of this architecture appears below.
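The following is a minimal PyTorch sketch of a plain autoencoder, assuming flattened 28×28 MNIST-style inputs; the layer sizes and the 2-dimensional latent space are illustrative choices, not a reference implementation from this lesson.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=2):
        super().__init__()
        # Encoder: compresses the input into a latent vector
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstructs the input from the latent vector
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)        # latent representation
        return self.decoder(z)     # reconstruction

model = Autoencoder()
x = torch.rand(16, 784)                        # dummy batch of flattened images
recon = model(x)
loss = nn.functional.mse_loss(recon, x)        # reconstruction loss
```

Training minimizes only the reconstruction loss, which is exactly why nothing constrains how the latent space is organized, as the next paragraph explains.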
In the latent space, similar inputs tend to cluster together. For example, on the MNIST dataset, encodings of the same digit form one group, and so on. However, parts of the latent space do not belong to any digit; when we feed a point from such a region to the decoder, it produces an output that doesn't resemble any of our digits. This makes the latent space unsuitable for generating data: after training, there is no way to know whether a random input falls in a valid or invalid region. In other words, the latent space is not regularized. In conclusion, plain autoencoders are unsuitable for data generation and can only be used for compression. We modify this architecture to generate data in the following manner.
Variational autoencoders
The variational autoencoder (VAE) tackles the problem of the unregularized latent space in the autoencoder and extends generative capability across the entire space. While the encoder in a plain autoencoder produces latent vectors directly, the VAE's encoder generates the parameters of a predetermined distribution in the latent space for each input, and a constraint forces these distributions to stay close to a normal distribution, thereby regularizing the latent space.
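To make this constraint concrete: in the standard VAE formulation (the evidence lower bound, or ELBO), training maximizes a reconstruction term plus a Kullback-Leibler (KL) divergence penalty that pulls each per-input distribution toward the standard normal prior. This is the usual textbook objective, not a formula specific to this lesson:

$$\mathcal{L}(x) = \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big] - D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,\mathcal{N}(0, I)\big)$$

Here $q(z \mid x) = \mathcal{N}(\mu(x), \sigma^2(x))$ is the distribution produced by the encoder; the first term rewards faithful reconstruction, and the second regularizes the latent space.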
Major components of VAE
VAEs consist of three key components:
Encoder: It maps input data x to a compressed form and outputs the mean μ and standard deviation σ of the predetermined distribution in the latent space for each input value.

Latent vector: z is sampled from the distribution N(μ, σ²) defined by the encoder's outputs and is fed to the decoder for reconstruction.

Decoder: It reconstructs data from samples z drawn from the latent space, capturing the generative process. It maps latent variable samples back to the data space. A code sketch combining these components appears after this list.
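Below is a minimal PyTorch sketch of how the three components fit together, assuming the same flattened MNIST-style inputs as before. Sampling z uses the standard reparameterization trick (z = μ + σ·ε with ε ~ N(0, I)) so that gradients can flow through the sampling step; the layer sizes are again illustrative.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=2):
        super().__init__()
        # Encoder: shared backbone, then separate heads for mean and log-variance
        self.backbone = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.fc_mu = nn.Linear(128, latent_dim)       # mean of q(z|x)
        self.fc_logvar = nn.Linear(128, latent_dim)   # log-variance of q(z|x)
        # Decoder: maps latent samples back to the data space
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps, with eps ~ N(0, I); keeps sampling differentiable
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps

    def forward(self, x):
        h = self.backbone(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)           # latent vector
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term + KL divergence to the standard normal prior
    recon_loss = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

model = VAE()
x = torch.rand(16, 784)                 # dummy batch in [0, 1]
recon, mu, logvar = model(x)
loss = vae_loss(recon, x, mu, logvar)
```

After training, new data can be generated by sampling z from the standard normal prior and passing it through the decoder alone, which is exactly what the regularized latent space makes possible.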