CelebA Dataset
Learn about the CelebA Dataset.
A real challenge for training GANs is ensuring there is enough training data. We can’t train a GAN to generate human faces with just tens, or even hundreds, of faces. Luckily, there is a popular dataset we can use: the CelebA dataset, which contains 202,599 photos of celebrity faces, aligned and cropped so the eyes and mouth are roughly centered.
You can read more about the CelebA dataset and see sample images on its home page.
⚠️ The dataset is intended to be used only for non-commercial research and educational use.
We won’t reproduce any images directly from the dataset itself and will only show GAN-generated images. This is a good excuse to draw cartoon faces and show those instead!
Have a look at some of the images from the VGGFace2 dataset instead to get a sense of what such face images look like.
Hierarchical data format
The CelebA dataset contains more than 200,000 separate image files in JPEG format. We could unpack them into a folder and have our GAN code open and close each image separately as it works through the training data. This would work, but it would be very slow because opening and closing that many image files individually is not efficient. The problem is made much worse if we use a mounted Google Drive because the path between our code and the data itself is even less direct.
To help ease these performance challenges, we can package the data into a format that is designed to support this kind of repeated access more ...
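To make the packaging idea concrete, here is a minimal sketch that stores several images inside a single HDF5 file and reads one back by name. It assumes the `h5py` and `numpy` packages are installed. The helper names `pack_images` and `load_image`, the group name `img_align_celeba`, and the output filename are hypothetical choices for illustration, and random arrays stand in for decoded JPEGs so the sketch runs without downloading the dataset.

```python
import os
import tempfile

import h5py
import numpy as np


def pack_images(images, hdf5_path, group_name="img_align_celeba"):
    """Store each (name, array) pair as one dataset inside a single HDF5 file."""
    with h5py.File(hdf5_path, "w") as hf:
        group = hf.create_group(group_name)
        for name, pixels in images:
            # gzip compression keeps the packaged file smaller on disk
            group.create_dataset(name, data=pixels, compression="gzip")


def load_image(hdf5_path, name, group_name="img_align_celeba"):
    """Read one image back by name as a NumPy array."""
    with h5py.File(hdf5_path, "r") as hf:
        return hf[group_name][name][()]


# Synthetic stand-ins for decoded JPEGs (assumption: 218 x 178 RGB,
# the size of the aligned and cropped CelebA images).
rng = np.random.default_rng(0)
fake_images = [
    (f"{i:06d}.jpg", rng.integers(0, 256, size=(218, 178, 3), dtype=np.uint8))
    for i in range(1, 6)
]

# Hypothetical output location in a temporary folder
hdf5_path = os.path.join(tempfile.mkdtemp(), "celeba_sample.h5py")
pack_images(fake_images, hdf5_path)

restored = load_image(hdf5_path, "000003.jpg")
print(restored.shape, restored.dtype)
```

With the real dataset, each JPEG would first be decoded into an array with an image library before being passed to `pack_images`. The training loop then opens one file once instead of opening and closing a separate file for every image.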