Replacement Using Autoencoders
Learn how to train an autoencoder to generate swapped fake output images.
The aim of this exercise is to develop a face swapping setup. In this setup, we'll focus on transforming Nicolas Cage (a Hollywood actor) into Donald J. Trump (former US president).
Disclaimer: This demonstration uses deepfake technology to show how images and videos can be manipulated using AI. It is intended purely for educational purposes to explore the capabilities of the technology, not to deceive or mislead. Ethical considerations are crucial when using this technology, and unauthorized use without consent is not supported.
Autoencoder architecture
We have prepared our datasets for both Donald Trump and Nicolas Cage using the tools presented in the previous lesson. Let’s now work toward a model architecture that learns the task of face swapping.
We presented a few common architectures in earlier sections of the course. The encoder-decoder setup is one such architecture, and it is widely used for deepfake tasks. For our current task of face swapping, we will develop an autoencoder setup to learn and swap faces. As has been the norm, we'll use TensorFlow and Keras to prepare the required models.
Before we get to the actual architecture code, let's briefly recap how this setup works. A typical autoencoder has two components, an encoder and a decoder. The encoder takes an image as input and compresses it down to a lower-dimensional space. This compressed representation is called an embedding, or the bottleneck features. The decoder works in the reverse manner: it takes the embedding vector as input and tries to reconstruct the image as output. In short, an autoencoder can be described as follows: the encoder takes an input image x and maps it to an embedding z = E(x), while the decoder maps z back to a reconstruction x̂ = D(z); the network is trained to minimize the reconstruction error between x and x̂.
With this brief overview of the autoencoder architecture, let’s get started with developing the required functions for both encoders and decoders. The following snippet shows a function that creates downsampling blocks for the encoder part:
```python
from tensorflow.keras.layers import (Conv2D, Dense, Flatten, Input,
                                     LeakyReLU, Reshape, UpSampling2D)
from tensorflow.keras.models import Model

def conv(x, filters):
    # Strided 5x5 convolution halves the spatial dimensions,
    # followed by a LeakyReLU activation
    x = Conv2D(filters, kernel_size=5, strides=2, padding='same')(x)
    x = LeakyReLU(0.1)(x)
    return x
```
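As a quick sanity check, we can inspect how a single downsampling block transforms its input. The 64×64×3 input shape below is an illustrative assumption, not a requirement of the block:

```python
# Hypothetical input size, chosen for illustration
inp = Input(shape=(64, 64, 3))
out = conv(inp, 128)
# Stride 2 halves the height and width: (None, 32, 32, 128)
print(Model(inp, out).output_shape)
```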
The downsampling block uses a two-dimensional convolutional Conv2D layer followed by a LeakyReLU activation. The encoder uses multiple such repeating blocks, followed by fully connected and reshaping layers. We finally use an upsampling block to transform the output into an 8×8×512 volume, the shape the decoder expects as input. The following snippet shows the function that creates the upsampling blocks:
```python
def upscale(x, filters):
    # 3x3 convolution expands the channels to filters * 4
    x = Conv2D(filters * 4, kernel_size=3, padding='same')(x)
    x = LeakyReLU(0.1)(x)
    # Nearest-neighbor upsampling doubles the spatial dimensions
    x = UpSampling2D()(x)
    return x
```
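Note that UpSampling2D doubles the spatial dimensions by simple repetition, so the block's output keeps the filters * 4 channels produced by the convolution. Here is a quick check using the 4×4×1024 shape that, as we'll see shortly, the encoder produces just before its final upsampling step:

```python
# Shape at the end of the encoder's convolutional stack
inp = Input(shape=(4, 4, 1024))
out = upscale(inp, 128)
# 128 * 4 = 512 channels at doubled resolution: (None, 8, 8, 512)
print(Model(inp, out).output_shape)
```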
The upsampling block is composed of a two-dimensional convolution, a LeakyReLU activation, and finally an UpSampling2D layer. We use both the downsampling and upsampling blocks to create the encoder architecture, presented in the following snippet:
```python
def Encoder(input_shape, encoder_dim):
    input_ = Input(shape=input_shape)
    x = input_
    # Four downsampling blocks progressively compress the image
    x = conv(x, 128)
    x = conv(x, 256)
    x = conv(x, 512)
    x = conv(x, 1024)
    # Flatten and pass through two dense layers: the first produces
    # the embedding of size encoder_dim, the second expands it back
    # to 4 * 4 * 1024 units
    x = Dense(encoder_dim)(Flatten()(x))
    x = Dense(4 * 4 * 1024)(x)
    # Reshape the dense output into a 4x4x1024 feature map
    x = Reshape((4, 4, 1024))(x)
    # A single upsampling block yields the 8x8x512 encoder output
    x = upscale(x, 128)
    return Model(input_, x)
```
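Assuming 64×64×3 inputs and an embedding size of 1024 (both illustrative values rather than mandated settings), the shapes flow through the encoder as follows:

```python
encoder = Encoder(input_shape=(64, 64, 3), encoder_dim=1024)
encoder.summary()
# 64x64x3 -> 32x32x128 -> 16x16x256 -> 8x8x512 -> 4x4x1024  (conv blocks)
# -> Flatten -> Dense(1024) embedding -> Dense(16384)
# -> Reshape to 4x4x1024 -> upscale -> 8x8x512 output
```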
The decoder, on the other hand, has a simpler setup. We use a few upsampling blocks followed by a convolutional layer to reconstruct the input image as its output. The following snippet shows the function for the decoder:
```python
def Decoder(input_shape=(8, 8, 512)):
    input_ = Input(shape=input_shape)
    x = input_
    # Three upsampling blocks grow the feature map back to image size
    x = upscale(x, 256)
    x = upscale(x, 128)
    x = upscale(x, 64)
    # The final convolution maps to 3 channels; sigmoid keeps pixel
    # values in [0, 1]
    x = Conv2D(3, kernel_size=5, padding='same', activation='sigmoid')(x)
    return Model(input_, x)
```
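With the default 8×8×512 input, the three upsampling blocks double the spatial dimensions back up to 64×64, and the final sigmoid-activated convolution produces a three-channel image:

```python
decoder = Decoder()  # default input shape (8, 8, 512)
# 8x8x512 -> 16x16x1024 -> 32x32x512 -> 64x64x256 -> 64x64x3
print(decoder.output_shape)  # (None, 64, 64, 3)
```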
For our task of face swapping, we develop two autoencoders, one for each identity; in other words, one for Donald Trump and one for Nicolas Cage. The only trick is that both autoencoders share the same encoder, while each identity gets its own decoder.
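As a minimal sketch of how this might be wired together, the following builds one shared encoder and two identity-specific decoders, then composes them into two trainable autoencoders. The image size, embedding size, optimizer, and mean absolute error loss are illustrative assumptions here, not the course's exact training configuration:

```python
from tensorflow.keras.optimizers import Adam

IMAGE_SHAPE = (64, 64, 3)  # assumed input size for this sketch
ENCODER_DIM = 1024         # assumed embedding size

# One shared encoder...
encoder = Encoder(IMAGE_SHAPE, ENCODER_DIM)
# ...and one decoder per identity
decoder_a = Decoder()  # e.g., Nicolas Cage
decoder_b = Decoder()  # e.g., Donald Trump

inp = Input(shape=IMAGE_SHAPE)
autoencoder_a = Model(inp, decoder_a(encoder(inp)))
autoencoder_b = Model(inp, decoder_b(encoder(inp)))

# Each autoencoder learns to reconstruct its own identity; mean
# absolute error is a common reconstruction loss for this setup
autoencoder_a.compile(optimizer=Adam(1e-4), loss='mae')
autoencoder_b.compile(optimizer=Adam(1e-4), loss='mae')
```

Because the encoder is shared across both identities, it is pushed to learn identity-agnostic facial features; the swap is then performed at inference time by routing one identity's faces through the other identity's decoder.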