AlexNet (2012)

Learn the fundamentals of the AlexNet image classification architecture.

General structure

AlexNet is the image classification architecture that won the ILSVRC competition in 2012.

The general structure of AlexNet is as follows:

  • AlexNet contains eight trainable layers: five convolutional layers at the beginning and three fully connected layers at the end.

  • Three max pooling layers are spread between the convolutional layers.

  • The ReLU activation function is used after every trainable layer except the last one.

  • For the last layer, a softmax activation function is used to obtain predictions as probabilities.

  • A dropout mechanism is used with a rate of 0.5.

  • The weights are initialized from a zero-mean Gaussian (normal) distribution with a standard deviation of 0.01.

  • The biases are initialized with a constant value of 1.

  • The learning rate is initialized to 0.01 and divided by 10 each time the validation error rate stops improving.

  • Stochastic gradient descent is used with a momentum of 0.9 and a batch size of 128.

  • L2 regularization is used.

  • The model is designed for 227x227 RGB images and the 1,000 classes of the ImageNet dataset, and it contains roughly 60 million parameters. A simple view of the architecture is as follows:

AlexNet architecture with output feature map sizes

The above figure illustrates the AlexNet architecture. We have an input image of 227x227x3 (the model submitted to the competition was designed for 227x227x3 input images). We see the convolution and max pooling operations with their kernel size, padding, and stride. Additionally, above each arrow, we see the output feature map dimensions as width, height, and number of channels. At the end, we have three fully connected layers with their dropout rates.
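To make the layer settings above concrete, here is a minimal sketch of the architecture and training setup in PyTorch. The filter counts (96, 256, 384, 384, 256) and the 4096-unit fully connected layers follow the standard AlexNet configuration; the class name AlexNet, the helper init_weights, and the weight decay value 5e-4 are illustrative assumptions, not part of any official implementation.

import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        # Five convolutional layers, with max pooling after the 1st, 2nd, and 5th
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),    # 227x227x3 -> 55x55x96
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # -> 27x27x96
            nn.Conv2d(96, 256, kernel_size=5, padding=2),   # -> 27x27x256
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # -> 13x13x256
            nn.Conv2d(256, 384, kernel_size=3, padding=1),  # -> 13x13x384
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),  # -> 13x13x384
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),  # -> 13x13x256
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # -> 6x6x256
        )
        # Three fully connected layers; dropout (rate 0.5) before the first two
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # softmax is applied to these outputs
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

def init_weights(module):
    # Zero-mean Gaussian weights (std 0.01) and constant biases of 1, as listed above
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        nn.init.constant_(module.bias, 1.0)

model = AlexNet()
model.apply(init_weights)

# SGD with momentum 0.9 and L2 regularization via weight decay (value assumed here);
# the batch size of 128 would be set in the data loader
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
# Divide the learning rate by 10 when the validation metric stops improving
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1)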

Softmax activation function

Even though the different types of activation functions and their advantages and disadvantages are not the subject of this course, we will discuss softmax. The softmax activation function is the most common choice for the last layer of classification architectures, and it's essential to understand how it works.

Softmax is a particular activation function that guarantees the output nodes sum to 1 while each output value stays in the range [0, 1]. Therefore, softmax converts the raw outputs into probabilities, which is very useful for classification problems when it is used as the activation function of the last layer.

Softmax activation converting the outputs to probabilities

After the softmax output, the class with the maximum probability is chosen as the final prediction.
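As a quick illustration, here is a minimal NumPy sketch of the softmax computation followed by picking the most probable class; the three example logits are made-up values for demonstration.

import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; this does not change the result
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])   # raw outputs of the last layer (made-up values)
probs = softmax(logits)

print(probs)             # approximately [0.659 0.242 0.099]; in [0, 1] and summing to 1
print(np.argmax(probs))  # index of the predicted class: 0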

Dropout mechanism

The dropout mechanism is a regularization technique in which a proportion of the nodes is randomly ignored during each training iteration. For example, if we have a dropout rate of 0.3 and 10 nodes, then in every iteration the model randomly chooses three nodes to ignore and performs the forward calculation without them. In the next iteration, a different random set of three nodes is ignored.
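To show the mechanics, here is a minimal NumPy sketch of dropout applied during training. The ten activations and the rate of 0.3 mirror the example above; the scaling by 1/(1 - rate) (so-called inverted dropout) is a common convention assumed here, and the number of dropped nodes is three on average rather than exactly three.

import numpy as np

def dropout(activations, rate=0.3):
    # Randomly zero out roughly a fraction `rate` of the nodes for this iteration
    keep_mask = np.random.rand(activations.shape[0]) >= rate
    # Inverted dropout: scale the kept nodes so the expected activation stays the same
    return activations * keep_mask / (1.0 - rate)

activations = np.ones(10)      # ten nodes, all with activation 1 (made-up values)
print(dropout(activations))    # about three of the ten values are zeroed on each call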

It is an important regularization technique because it helps address the overfitting problem in neural networks. Overfitting occurs when the model performs well on the training data but fails to generalize to unseen data.

The ...
