CelebA CNN

Learn to create a convolutional GAN using the CelebA dataset: we’ll design convolutional architectures for the discriminator and the generator, then train the GAN.

Now that we’ve practiced using convolution layers to make a classifier, let’s use them for a GAN.

We’ll start with the code we developed earlier for the CelebA GAN.

The CelebA images are rectangular, 218 by 178 pixels in size. To keep our convolutions simple, we’ll work with square 128 by 128 images. This means that we’ll need to crop the training images to this size.

Helper function

The following code is a helper function for cropping a NumPy image array to a given size, with the crop centered on the middle of the image.

def crop_centre(img, new_width, new_height):
    height, width, _ = img.shape
    startx = width//2 - new_width//2
    starty = height//2 - new_height//2
    return img[starty:starty + new_height, startx:startx + new_width, :]

To create a square 128 by 128 image from the center of a larger image img, we’d use crop_centre(img, 128, 128).
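
As a quick sanity check, here’s a minimal sketch using a dummy NumPy array in place of a real CelebA image; the array contents don’t matter, only the shape:

import numpy

# dummy 218 by 178 RGB image standing in for a CelebA photo
img = numpy.zeros((218, 178, 3))

cropped = crop_centre(img, 128, 128)
print(cropped.shape)  # (128, 128, 3)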

The Dataset class

We’ll need to move our helper functions above the Dataset class in our notebook because we’ll need to use crop_centre() in its definition. The following code updates the __getitem__() and plot_image() methods. In both cases, an image is retrieved from the HDF5 dataset and then cropped to a 128 by 128 square.

def __getitem__(self, index):
    if (index >= len(self.dataset)):
        raise IndexError()
    img = numpy.array(self.dataset[str(index)+'.jpg'])
    # crop to 128x128 square
    img = crop_centre(img, 128, 128)
    return torch.cuda.FloatTensor(img).permute(2,0,1).view(1,3,128,128) / 255.0

def plot_image(self, index):
    img = numpy.array(self.dataset[str(index)+'.jpg'])
    # crop to 128x128 square
    img = crop_centre(img, 128, 128)
    plt.imshow(img, interpolation='nearest')
    pass

The __getitem__() method needs to return a tensor, and we know it now needs to be a 4-dimensional tensor of the form (batch size, channels, height, width).

The NumPy arrays are 3-dimensional of the form (height, width, 3). The permute(2,0,1) reorders the NumPy array to (3, height, width), and view(1,3,128,128) adds an additional dimension for batch size, set to 1.
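
To make the reshaping concrete, here’s a small sketch with a dummy array; a plain CPU FloatTensor is used here, whereas the notebook uses torch.cuda.FloatTensor:

import numpy
import torch

img = numpy.random.rand(128, 128, 3)   # (height, width, 3)

t = torch.FloatTensor(img)             # shape (128, 128, 3)
t = t.permute(2, 0, 1)                 # shape (3, 128, 128)
t = t.view(1, 3, 128, 128)             # shape (1, 3, 128, 128)
print(t.shape)                         # torch.Size([1, 3, 128, 128])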

Let’s check that the Dataset class crops the images correctly.
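
A sketch of the check, assuming the class is called CelebADataset as earlier in the notebook, and with a hypothetical HDF5 file path that you should replace with your own:

# hypothetical path; point this at your own HDF5 file
celeba_dataset = CelebADataset('celeba_dataset/celeba_aligned_small.h5py')

# plot a couple of images to inspect the crop
celeba_dataset.plot_image(43)
celeba_dataset.plot_image(44)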

We can see the images are indeed cropped to a smaller 128 by 128 square.

💡 How do we work out the size of a convolution layer’s output?

Unlike fully connected layers, the size of what emerges from a convolution layer isn’t immediately obvious.

  • Pencil and paper are really helpful when designing convolutional networks. Some people like to work out sizes by drawing sketches of the input tensors, kernels, and strides, like we did earlier for very small tensors.

  • Some will just experiment with the code, using the error messages to guide how they adjust the kernel and stride sizes.

  • Some will use formulae, like those in Appendix C or on the PyTorch reference page for nn.Conv2d(), to directly calculate the size of the output; a small sketch of this approach follows below.
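
For example, here’s a minimal sketch of the output-size formula from the PyTorch nn.Conv2d documentation; the helper name is our own:

def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # output-size formula from the PyTorch nn.Conv2d documentation
    return (size + 2*padding - dilation*(kernel_size - 1) - 1) // stride + 1

# a 128-pixel input passed through an 8x8 kernel with stride 2
print(conv2d_output_size(128, kernel_size=8, stride=2))  # 61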

💡 How many layers should our network have? How many kernels should we have in the middle layers?

There is no simple answer to this question.

We should try to build the smallest possible network to make training easier, but not so small that it lacks the capacity to learn the given task. We should also be mindful of whether our network is trying to reduce data, like a classifier, or expand data, like a generator, as this will guide the shape of our networks; the short sketch below illustrates the difference.
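
As an illustration (these layer parameters are just examples, not our final architecture), a strided convolution shrinks feature maps while a transposed convolution expands them:

import torch
import torch.nn as nn

x = torch.zeros(1, 3, 128, 128)

# a classifier-style layer reduces spatial size: 128 -> 61
conv = nn.Conv2d(3, 16, kernel_size=8, stride=2)
print(conv(x).shape)    # torch.Size([1, 16, 61, 61])

# a generator-style layer expands spatial size: 128 -> 262
deconv = nn.ConvTranspose2d(3, 16, kernel_size=8, stride=2)
print(deconv(x).shape)  # torch.Size([1, 16, 262, 262])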

The discriminator

Let’s begin by defining the network architecture of the discriminator.

The discriminator network architecture

The following network has 3 convolutional layers and a final fully connected layer.

  • The first convolutional layer takes the 3-channel color images and applies 256 kernels to output 256 feature maps. Because the kernels ...