Today, generative adversarial networks (GANs) are among the state-of-the-art deep learning models. GANs are generative models that fall under the unsupervised learning category. Generative models are a class of machine learning models designed to generate new data that resembles a given training dataset.
Let's further break down the term generative adversarial networks to fully grasp it. In the context of GANs, generative refers to the ability of the model to create new data that resembles a given training dataset. The adversarial aspect of GANs refers to the competitive nature of the training process between two neural networks: the generator and the discriminator. Lastly, networks refer to neural network models.
The two major components of GANs, the generator and the discriminator, can best be understood by examples that simplify the architecture of GANs.
The generator functions similarly to a robber: it generates fake samples based on the original samples in order to mislead the discriminator into accepting them as real. The discriminator, on the other hand, is similar to a police officer: its responsibility is to find irregularities in the generator's samples and label each one as real or fake. This rivalry between the two components continues until the generator becomes good enough to fool the discriminator with its phoney data.
Now let's describe the two main components of the GAN architecture.
Generator: The generator is trained in an unsupervised fashion. It produces phoney (fake) data modelled on the original (real) data. It is a neural network with hidden layers, activation functions, and a loss function.
The loss function for the generator (in the original minimax formulation) is as follows:

L_G = E_{z ~ p_z(z)} [log(1 − D(G(z)))]

where z is a noise vector drawn from the prior p_z, G(z) is the generated sample, and D(·) is the discriminator's estimate of the probability that a sample is real. The generator minimizes this loss; in practice, the equivalent non-saturating form −log D(G(z)) is often minimized instead.
Discriminator: The discriminator is a supervised method: it is a straightforward binary classifier that predicts whether a sample is authentic (real) or generated (fake). It is trained on both real data and the generator's output, and its feedback is passed back to the generator.
The loss function for the discriminator is as follows:

L_D = −E_{x ~ p_data(x)} [log D(x)] − E_{z ~ p_z(z)} [log(1 − D(G(z)))]

where x is a real sample drawn from the data distribution p_data. Minimizing this loss means the discriminator maximizes log D(x) on real samples and log(1 − D(G(z))) on fake ones.
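As a quick numeric sanity check, both losses can be evaluated directly for a hypothetical batch of discriminator outputs (a minimal sketch; the probability values below are made up purely for illustration):

```python
import numpy as np

# Hypothetical discriminator outputs for one batch:
# D(x) on real samples, D(G(z)) on fake samples.
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])

# Discriminator loss: -E[log D(x)] - E[log(1 - D(G(z)))]
d_loss = -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

# Generator loss (non-saturating form): -E[log D(G(z))]
g_loss = -np.mean(np.log(d_fake))

print(round(d_loss, 3), round(g_loss, 3))  # → 0.253 2.303
```

Note that with a confident discriminator (D close to 1 on real data and 0 on fakes), the discriminator loss is small while the generator loss is large, which is exactly the pressure that drives the generator to improve.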
The goal of the generator is to construct fake images based on the discriminator's feedback and deceive the discriminator until it can no longer tell fake images from real ones. When the generator reliably fools the discriminator, training terminates, and we may claim that a generalized GAN model has been constructed.
Let's visualize this in the workflow diagram of GANs below:
Note: In an ideal scenario for GANs, the generator always makes flawless copies from the input domain, and the discriminator cannot tell the difference, predicting "unsure" (i.e., a probability of 50% for both real and fake) in every case.
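The adversarial workflow described above can be sketched end to end with a toy one-dimensional GAN, in which the generator and discriminator are each a single affine map trained by alternating gradient steps (an illustrative sketch with hand-derived gradients, not a production implementation):

```python
import numpy as np

rng = np.random.default_rng(42)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-np.clip(t, -30, 30)))

# Toy setup (all choices here are illustrative):
# real data ~ N(2, 0.5); generator G(z) = a*z + b with z ~ N(0, 1);
# discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0    # generator parameters
w, c = 0.0, 0.0    # discriminator parameters
lr = 0.05

for step in range(2000):
    x_real = rng.normal(2.0, 0.5, 32)
    z = rng.standard_normal(32)
    x_fake = a * z + b

    # Discriminator update: push D(real) toward 1, D(fake) toward 0.
    grad_real = sigmoid(w * x_real + c) - 1.0
    grad_fake = sigmoid(w * x_fake + c)
    w -= lr * np.mean(grad_real * x_real + grad_fake * x_fake)
    c -= lr * np.mean(grad_real + grad_fake)

    # Generator update (non-saturating loss -log D(G(z))):
    # move the fakes toward whatever the discriminator calls "real".
    g = sigmoid(w * x_fake + c) - 1.0
    a -= lr * np.mean(g * w * z)
    b -= lr * np.mean(g * w)

print(round(b, 2))  # b should drift toward the real mean (≈ 2)
```

As training proceeds, the generated distribution's mean (b) moves toward the real data's mean, and the discriminator's advantage shrinks, mirroring the equilibrium described in the note above.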
One of the major advancements in deep learning domains such as computer vision is a technique called data augmentation.
Data augmentation increases model performance by increasing model skill and providing a regularising effect, thereby minimising generalisation error. It works by creating new, fictional but plausible examples from the input problem domain on which the model is trained.
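Classical data augmentation of this kind can be sketched in a few lines (a minimal illustration using flips and shifts; real pipelines use richer transforms such as crops, rotations, and colour jitter):

```python
import numpy as np

# Stand-in for a real training image (values are arbitrary).
image = np.arange(16).reshape(4, 4)

flipped = np.fliplr(image)                 # mirror left-right
shifted = np.roll(image, shift=1, axis=1)  # shift one pixel right

# Each variant is a "new" plausible training example with the same
# label, which regularises the model and reduces generalisation error.
augmented_batch = np.stack([image, flipped, shifted])
print(augmented_batch.shape)  # → (3, 4, 4)
```

GANs extend this idea: instead of transforming existing examples, they synthesise entirely new ones from the learned data distribution.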
In complex domains, or domains with insufficient data, generative modelling provides a path to further model training. GANs have also driven significant progress in disciplines such as deep reinforcement learning.
Recent advancements in GANs have led to many types of GANs; let's explore some of these types.
Vanilla GANs: This is the most basic GAN; its algorithm uses stochastic gradient descent to optimise the minimax objective. Both the generator and the discriminator are simple multi-layer perceptrons, which handle the generation and classification of images respectively.
Conditional GANs (cGANs): cGANs incorporate additional conditioning information to generate samples. By providing both the noise input and auxiliary information, such as class labels, cGANs can generate samples conditioned on specific attributes. This enables more targeted generation. For instance, during GAN training, the network receives the images with their actual labels, such as "rose", "sunflower", or "tulip", to help it learn how to distinguish between them.
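The conditioning mechanism can be sketched as follows (hypothetical dimensions; the key step is concatenating a one-hot label with the noise vector before it enters the generator):

```python
import numpy as np

# Hypothetical sizes for illustration.
num_classes, noise_dim, batch = 3, 8, 4
rng = np.random.default_rng(0)

labels = rng.integers(0, num_classes, batch)  # e.g. rose/sunflower/tulip
one_hot = np.eye(num_classes)[labels]         # shape (4, 3)
z = rng.standard_normal((batch, noise_dim))   # shape (4, 8)

# A cGAN generator receives noise and label together, so the same
# noise vector can be steered toward different classes.
gen_input = np.concatenate([z, one_hot], axis=1)
print(gen_input.shape)  # → (4, 11)
```

The discriminator is conditioned the same way, receiving the sample together with its label, so it judges not only "real vs. fake" but "real vs. fake for this class".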
Deep convolutional GANs (DCGANs): DCGANs leverage deep convolutional neural networks as generator and discriminator architectures. They are specifically designed for image synthesis tasks and have shown improved stability and generation quality compared to traditional GAN architectures.
GANs are used to generate a wide range of data types, including images, music, and text. The following are popular real-world applications of GANs:
Image synthesis: GANs can generate realistic and high-resolution images, enabling applications such as virtual scenery creation.
Video generation: GANs can generate realistic video sequences, facilitating applications like video synthesis and video completion.
Text-to-image synthesis: GANs can generate images based on textual descriptions, allowing for applications such as generating images from textual prompts to aid content creation.
Note: GANs are state-of-the-art models for virtual reality and gaming, where they are used to generate lifelike characters, textures, and landscapes, enhancing the immersion and realism of virtual reality experiences and video games.
In conclusion, GANs have emerged as a powerful framework for generating realistic and diverse synthetic data. Their ability to generate images, videos, and other types of data offers exciting applications, including image synthesis, image-to-image translation, data augmentation, and anomaly detection. However, working with GANs also presents challenges. GAN training can be unstable and sensitive to hyperparameter tuning. Mode collapse, where the generator fails to capture the full diversity of the target distribution, is a common issue. Evaluation metrics for GANs are still an active area of research, making it challenging to objectively assess their performance.
GANs Basics
Which component of GANs is responsible for distinguishing between real and fake samples?
Generator
Discriminator
Encoder
Decoder