Dataset Preparation
Learn how to prepare a dataset that the autoencoder can use to generate swapped deepfake output images.
In deepfake terminology, face swapping is a replacement-mode operation. Before we can train models to generate swapped fake output images, we need to prepare the data.
Dataset preparation
Since the aim is to develop a face swapper for Nicolas Cage and Donald Trump, we need datasets containing images of each of them. Data collection itself can be time-consuming and challenging for several reasons. First, photographs may be restricted by licensing and privacy concerns. Second, good-quality datasets that are publicly available are hard to find. Finally, identifying a specific face is difficult when a photograph contains multiple faces belonging to different people.
For copyright reasons, we cannot publish the training datasets used to obtain the exact output in this chapter, as they were scraped from a variety of online sources. However, websites that might prove useful for obtaining similar datasets include Deepfakes FaceSwap, Deepfakes dataset, and Deepfake Detection Challenge.
Assuming we already have the raw datasets collected, we can proceed to the next set of tasks: face detection and identification.
The first task is to define an entity class to hold face-related objects. We need such a class because images, extracted faces, face landmarks, and transformations all have to be passed through the pipeline. We define a class, DetectedFace, as shown in the following code snippet:
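A minimal sketch of such an entity class might look like the following. The field names (image, x, y, w, h, landmarks) and the two helper methods are assumptions made for illustration, not necessarily the chapter's exact definition, and transformations are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Point = Tuple[int, int]


@dataclass
class DetectedFace:
    """Holds one detected face as it moves through the pipeline (illustrative)."""

    image: Any  # cropped face pixels, e.g. a NumPy array (assumed type)
    x: int      # left edge of the bounding box in the source photograph
    y: int      # top edge of the bounding box
    w: int      # bounding-box width
    h: int      # bounding-box height
    landmarks: List[Point] = field(default_factory=list)  # (x, y) landmark points

    def bounding_box(self) -> Tuple[int, int, int, int]:
        """Return the box as (left, top, right, bottom)."""
        return (self.x, self.y, self.x + self.w, self.y + self.h)

    def landmarks_relative(self) -> List[Point]:
        """Return landmarks shifted so the box's top-left corner is the origin."""
        return [(px - self.x, py - self.y) for px, py in self.landmarks]


# Hypothetical values; a real pipeline would fill these from a face detector.
face = DetectedFace(image=None, x=10, y=20, w=100, h=120,
                    landmarks=[(30, 50), (80, 50)])
print(face.bounding_box())        # (10, 20, 110, 140)
print(face.landmarks_relative())  # [(20, 30), (70, 30)]
```

Keeping the bounding box and landmarks together with the image means later stages, such as alignment or cropping, can work from a single object instead of several loosely related values.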