CNN Architectures

Explore the most innovative CNN architectures (available from 2012 until 2021) and their principles.

We'll cover the following...

- AlexNet
- VGG
- InceptionNet/GoogleNet

In this section, we will discuss CNN architectures that have stood the test of time. Even though not all of them are still used in recent top-performing models, it is important to study them and understand the intuitions behind them.

AlexNet

AlexNet is made up of 5 conv layers, the first of which uses an 11x11 kernel. It was among the first architectures to combine max-pooling layers, ReLU activation functions, and dropout for its 3 enormous linear layers. The network was used for image classification with 1000 possible classes, which was remarkably ambitious when it was introduced in 2012. Now you can implement it in about 35 lines of PyTorch code.

It was the first convolutional model that was successfully trained on ImageNet, a dataset with more than 1M training images across 1000 classes.

Python

import torch
import torch.nn as nn


class AlexNet(nn.Module):
    def __init__(self, num_classes: int = 1000) -> None:
        super().__init__()
        # Feature extractor: 5 conv layers, each followed by ReLU,
        # with max-pooling after the 1st, 2nd, and 5th conv layer.
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        # Pool to a fixed 6x6 grid so the classifier accepts any input resolution.
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        # Classifier: 3 linear layers, with dropout before the first two.
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest
        x = self.classifier(x)
        return x


# Sanity check: a 10-class model applied to one random 128x128 RGB image.
model = AlexNet(num_classes=10)
inp = torch.rand(1, 3, 128, 128)
print(model(inp).shape)  # torch.Size([1, 10])
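
To get a feel for why the three linear layers are described as enormous, the following sketch counts the learnable parameters of the convolutional and linear parts separately. It rebuilds only the layers that carry weights, with the same shapes as the model above and the default of 1000 classes. The helper name `count_params` is introduced here purely for illustration.

```python
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    """Return the total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


# Only layers with weights are listed; ReLU, pooling, and dropout have none.
conv_layers = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),
    nn.Conv2d(256, 256, kernel_size=3, padding=1),
)
fc_layers = nn.Sequential(
    nn.Linear(256 * 6 * 6, 4096),
    nn.Linear(4096, 4096),
    nn.Linear(4096, 1000),
)

conv_params = count_params(conv_layers)
fc_params = count_params(fc_layers)
total = conv_params + fc_params

print(f"conv layers:   {conv_params:,}")
print(f"linear layers: {fc_params:,}")
print(f"linear share:  {fc_params / total:.1%}")
```

Under these assumptions, the linear layers account for about 96% of all parameters, even though the conv layers do most of the feature extraction.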

VGG

The famous paper “Very Deep Convolutional Networks for Large-Scale Image Recognition” popularized the term “deep”. It provided some of the first convincing evidence that simply adding more layers increases performance, although this holds only up to a certain depth. In contrast to AlexNet, the authors used only 3x3 kernels. The architecture was trained on 224 × 224 RGB images.

The main principle is that a stack of three $3 \times 3$ conv layers covers the same receptive field as a single $7 \times 7$ layer. The stack may even work better, because it applies three non-linear activations instead of one, which makes the decision function more discriminative.
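
The receptive-field claim can be checked with a small calculation. The sketch below uses the standard recurrence for stacked convolutions: each layer widens the receptive field by `(kernel - 1)` times the product of all earlier strides. The function name `receptive_field` is an illustrative choice, and the sketch assumes no dilation.

```python
from typing import Optional, Sequence


def receptive_field(kernel_sizes: Sequence[int],
                    strides: Optional[Sequence[int]] = None) -> int:
    """Receptive field (along one axis) of a stack of conv layers."""
    if strides is None:
        strides = [1] * len(kernel_sizes)  # assume stride 1 everywhere
    rf = 1    # a single output unit sees one input position to start with
    jump = 1  # distance in input pixels between neighbouring units
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


stacked = receptive_field([3, 3, 3])  # three 3x3 layers, stride 1
single = receptive_field([7])         # one 7x7 layer
print(stacked, single)  # 7 7
```

By the same calculation, two stacked $3 \times 3$ layers cover a $5 \times 5$ region.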

Secondly, this design decreases the number of parameters. Specifically, you need $3 \times (3^2) C^2 = 27 C^2$ ...
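
The count for the stack can be reproduced with a few lines of arithmetic. The sketch below assumes every layer has $C$ input and $C$ output channels and ignores biases. Under the same assumptions, a single $7 \times 7$ layer needs $7^2 C^2 = 49 C^2$ weights. The function name `conv_weights` and the choice of `C = 64` are illustrative.

```python
def conv_weights(kernel_size: int, channels: int) -> int:
    """Weights in a square conv layer with `channels` inputs and outputs (no bias)."""
    return kernel_size * kernel_size * channels * channels


C = 64  # illustrative channel count; the ratio does not depend on it

stacked_3x3 = 3 * conv_weights(3, C)  # three 3x3 layers: 27 * C^2
single_7x7 = conv_weights(7, C)       # one 7x7 layer:    49 * C^2

print(stacked_3x3)  # 110592
print(single_7x7)   # 200704
print(f"{1 - stacked_3x3 / single_7x7:.0%} fewer weights")  # 45% fewer weights
```

The saving of about 45% holds for any $C$, since both counts scale with $C^2$.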
