VGG16 (2014)

Learn the fundamentals of the VGG16 image classification architecture, with a brief explanation of its model size and FLOP calculation.

General structure

The general structure of VGG16 is as follows:

  • It contains 16 trainable layers: 13 convolutional and 3 fully connected.

  • Similar to AlexNet, it uses ReLU for the hidden layers and softmax for the output layer. Training uses SGD with momentum = 0.9 and an initial learning rate of 0.01, decreased by a factor of 10 when the validation accuracy stops improving (the same schedule as AlexNet). Weights are initialized from a Gaussian distribution, L2 regularization is applied with coefficient 5x10^-4, and dropout with ratio 0.5 is used on the first two fully connected layers (see the training sketch after this list).

  • Contrary to AlexNet, every convolutional layer uses the same kernel size: 3x3 with stride = 1 and padding = 1, which preserves the spatial dimensions (e.g., (224 - 3 + 2x1)/1 + 1 = 224). The max pooling layers keep the same stride = 2 as AlexNet but use 2x2 kernels rather than AlexNet's overlapping 3x3 kernels. The biases are initialized to 0, not 1; the input image size is 224x224, not 227x227; and the batch size is 256, not 128.

  • It has ~138 million parameters, more than double AlexNet's ~60 million (see the calculation below).
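The ~138 million figure and the cost of a forward pass both follow directly from the layer configuration above. A minimal sketch in pure Python (the layer list encodes the standard VGG16 channel progression; MACs counts multiply-accumulate operations, a common proxy for FLOPs):

```python
# Parameter and multiply-accumulate (MAC) count for VGG16, derived from the
# layer configuration described above. No framework needed.

# (in_channels, out_channels, feature-map side) for the 13 conv layers.
# 3x3 kernels with stride 1 and padding 1 preserve the spatial size; each
# 2x2, stride-2 max pool halves it: 224 -> 112 -> 56 -> 28 -> 14 -> 7.
convs = [
    (3, 64, 224), (64, 64, 224),                       # block 1
    (64, 128, 112), (128, 128, 112),                   # block 2
    (128, 256, 56), (256, 256, 56), (256, 256, 56),    # block 3
    (256, 512, 28), (512, 512, 28), (512, 512, 28),    # block 4
    (512, 512, 14), (512, 512, 14), (512, 512, 14),    # block 5
]
fcs = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]  # 3 fully connected

params = macs = 0
for cin, cout, side in convs:
    params += (3 * 3 * cin + 1) * cout          # weights + biases
    macs += (3 * 3 * cin) * cout * side * side  # one MAC per weight per output pixel
for fin, fout in fcs:
    params += (fin + 1) * fout
    macs += fin * fout

print(f"parameters: {params:,}")  # 138,357,544  (~138 million)
print(f"MACs/image: {macs:,}")    # ~15.5 billion multiply-accumulates
```

Note how lopsided the totals are: the first fully connected layer alone (25088 x 4096) contributes over 100 million parameters, while the 13 convolutional layers together hold only ~15 million yet account for almost all of the compute.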
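The training recipe in the bullets above maps directly onto a modern framework. A minimal sketch, assuming PyTorch and torchvision (the use of ReduceLROnPlateau to approximate the paper's divide-by-10 schedule is an illustrative choice, not the original implementation):

```python
import torch
import torchvision

# torchvision's VGG16 has the 13 conv + 3 FC layout and already applies
# dropout (p = 0.5) after the first two fully connected layers.
model = torchvision.models.vgg16(weights=None)

# SGD with momentum 0.9, initial learning rate 0.01, and L2 regularization
# (weight decay) of 5e-4, matching the hyperparameters listed above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)

# The paper divides the learning rate by 10 when validation accuracy stops
# improving; ReduceLROnPlateau mimics that schedule.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1)

# In the training loop, after each epoch's validation pass:
#     scheduler.step(val_accuracy)
```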
