VGG16 (2014)

Learn the fundamentals of the VGG image classification architecture with a brief explanation of the model size and FLOP calculation.

General structure

The general structure of VGG16 is as follows:

  • It contains 16 trainable layers: 13 convolutional and 3 fully connected.

  • Similar to AlexNet, it uses ReLU for the hidden layers and softmax for the output layer. Training uses SGD with momentum = 0.9, an initial learning rate of 0.01 (decreased in the same way as in AlexNet), Gaussian weight initialization, L2 regularization with coefficient = 5x10^-4, and dropout regularization with a ratio of 0.5 for the first two fully connected layers.

  • Contrary to AlexNet, the same kernel size is used in every convolutional layer: 3x3 with stride = 1 and padding = 1. Max pooling layers are also used, but with 2x2 kernels and stride = 2 rather than the 3x3 overlapping pooling of AlexNet. The biases are initialized to 0, not 1, the input image size is 224x224, not 227x227, and the batch size is 256, not 128.

  • It has ~138 million parameters, more than double the size of AlexNet (a layer-by-layer sketch follows this list).
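
The following is a minimal PyTorch-style sketch of the layout described above, assuming the standard VGG16 channel configuration; the module and variable names are illustrative, while the kernel sizes, strides, dropout ratio, and optimizer settings follow the values listed here.

    import torch
    import torch.nn as nn

    # 13 convolutional layers; numbers are output channels, "M" marks a 2x2 max pool.
    cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
           512, 512, 512, "M", 512, 512, 512, "M"]

    def make_features():
        layers, in_ch = [], 3
        for v in cfg:
            if v == "M":
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            else:
                # Every convolution is 3x3 with stride 1 and padding 1, followed by ReLU.
                layers += [nn.Conv2d(in_ch, v, kernel_size=3, stride=1, padding=1),
                           nn.ReLU(inplace=True)]
                in_ch = v
        return nn.Sequential(*layers)

    class VGG16(nn.Module):
        def __init__(self, num_classes=1000):
            super().__init__()
            self.features = make_features()
            # 3 fully connected layers; dropout with ratio 0.5 on the first two.
            self.classifier = nn.Sequential(
                nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
                nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
                nn.Linear(4096, num_classes),
            )

        def forward(self, x):
            x = self.features(x)      # 224x224 input -> 7x7 feature maps after five pools
            x = torch.flatten(x, 1)
            return self.classifier(x)

    model = VGG16()
    print(sum(p.numel() for p in model.parameters()))  # ~138 million parameters

    # Training setup from the list above: SGD, momentum 0.9, lr 0.01, L2 coefficient 5e-4.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                momentum=0.9, weight_decay=5e-4)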

Comparison with AlexNet

The overall structure of VGG and AlexNet is quite similar, apart from a few small changes. The main difference is that VGG is a deeper network than AlexNet. But what is the performance difference between these two models on ImageNet?

VGG vs. AlexNet

  Model     Year    Top-5 Accuracy    Parameters    FLOPs
  AlexNet   2012    84.70%            62M           1.5B
  VGGNet    2014    92.30%            138M          19.6B

As the table shows, with some additional layers and small changes, VGG reaches roughly 8 percentage points higher top-5 accuracy. However, the model size and computational cost increase significantly (1.5B = 1.5 billion).

Calculating model parameters

Different layer types in a neural network have different formulas for their number of parameters.

Convolution layers: ((m × n × d) + 1) × k, where m × n is the kernel size, d is the number of input channels, and k is the number of filters (output channels).

We add 1 to account for the bias term of each filter.
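
For example, the first convolutional layer of VGG16 applies k = 64 filters of size 3x3 to a d = 3 channel RGB input, giving ((3 × 3 × 3) + 1) × 64 = 1,792 parameters. Below is a minimal check of the formula in Python, assuming the filter counts of the first two VGG16 convolutional layers:

    def conv_params(m, n, d, k):
        # Parameters of a convolutional layer: ((m * n * d) + 1) * k, where +1 is the bias.
        return ((m * n * d) + 1) * k

    print(conv_params(3, 3, 3, 64))    # first conv layer: 1,792 parameters
    print(conv_params(3, 3, 64, 64))   # second conv layer: 36,928 parameters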
