VGG16 (2014)
Learn the fundamentals of the VGG image-classification architecture, with a brief explanation of the model size and FLOP calculation.
General structure
The general structure of VGG16 is as follows:
- It contains 16 trainable layers: 13 convolutional and 3 fully connected.
- Similar to AlexNet, it uses ReLU for the hidden layers and softmax for the output layer. Training uses SGD with momentum = 0.9, an initial learning rate of 0.01 (decreased by a factor of 10 when validation accuracy stops improving, as in AlexNet), weight initialization from a Gaussian distribution, L2 regularization with coefficient 5×10⁻⁴, and dropout regularization with a ratio of 0.5 for the first two fully connected layers (see the training sketch after this list).
- Contrary to AlexNet, all convolutional layers use the same kernel size: 3×3 with stride = 1 and padding = 1. The max pooling layers use 2×2 kernels with stride = 2, rather than the overlapping 3×3 kernels of AlexNet. The biases are initialized to 0, not 1; the input image size is 224×224, not 227×227; and the batch size is 256, not 128.
- It has ~138 million parameters, more than double the ~60 million of AlexNet, as the calculation below verifies.
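As a rough illustration of the training setup described above, here is how the same hyperparameters could be expressed in PyTorch. This is a sketch under stated assumptions: the original model was not trained with PyTorch, and `torchvision`'s `vgg16` stands in for any VGG16 implementation.

```python
import torch
import torchvision

# torchvision's VGG16 already applies dropout of 0.5 after FC1 and FC2.
model = torchvision.models.vgg16()

# SGD with momentum 0.9; weight_decay implements the L2 term (5e-4).
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4
)

# Divide the learning rate by 10 when validation accuracy plateaus,
# mirroring the AlexNet/VGG schedule.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1
)
```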
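The ~138 million figure can be verified with a short, dependency-free calculation. The layer shapes below follow the standard VGG16 (configuration D) architecture; the helper names `conv_params` and `fc_params` are just for illustration:

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights (k * k * in_ch * out_ch) plus one bias per output channel."""
    return k * k * in_ch * out_ch + out_ch

def fc_params(in_features, out_features):
    """Dense weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features

# 13 convolutional layers in 5 blocks, separated by 2x2 max pooling.
conv_channels = [
    (3, 64), (64, 64),                    # block 1
    (64, 128), (128, 128),                # block 2
    (128, 256), (256, 256), (256, 256),   # block 3
    (256, 512), (512, 512), (512, 512),   # block 4
    (512, 512), (512, 512), (512, 512),   # block 5
]
total = sum(conv_params(i, o) for i, o in conv_channels)

# After 5 poolings, the 224x224 input is reduced to 7x7x512 = 25088 features.
total += fc_params(512 * 7 * 7, 4096)   # FC1
total += fc_params(4096, 4096)          # FC2
total += fc_params(4096, 1000)          # FC3 (ImageNet's 1000 classes)

print(f"{total:,} parameters")  # 138,357,544 -> ~138 million
```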