Fire Module
Learn about the central component of SqueezeNet, the fire module.
Chapter Goals:
- Learn strategies for decreasing the number of parameters in a model
- Understand how the fire module works and why it's effective
- Write your own fire module function
A. Decreasing parameters
To make a smaller model, we need to decrease the number of weights per convolution layer. There are three ways to do this, each corresponding to one factor in the layer's parameter count (see the sketch after this list):
- Decrease the kernel size
- Decrease the number of filters used
- Decrease the number of input channels
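Each of these options shrinks one factor in a convolution layer's parameter count. The sketch below spells that count out, assuming the usual accounting of kernel weights plus one bias per filter; the function name and example numbers are just for illustration.

```python
def conv_params(kernel_height, kernel_width, in_channels, filters):
    # One weight per kernel position, per input channel, per filter,
    # plus one bias per filter.
    weights = kernel_height * kernel_width * in_channels * filters
    biases = filters
    return weights + biases

# Shrinking any of the three factors shrinks the product:
#   kernel_height * kernel_width  -> smaller kernels
#   filters                       -> fewer filters
#   in_channels                   -> fewer input channels
print(conv_params(3, 3, 64, 128))  # 73,856 parameters
```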
B. Kernel size
The size of a kernel represents the amount of spatial information it can capture. For example, a 1x1 kernel will only capture the channel information for individual pixels, while a 3x3 kernel will aggregate the information between adjacent pixels within each 3x3 square of the input data.
Although larger kernels can capture more information, they come at the cost of additional parameters. A convolution layer that uses 3x3 kernels will use roughly 9x as many parameters as a layer that uses 1x1 kernels. A good strategy for balancing performance and parameter count is to use a mix of larger and smaller kernel sizes.
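As a quick check of that 9x figure, and of what mixing kernel sizes buys us, here is a short sketch; the channel and filter counts are arbitrary illustration values, and biases are left out since they are the same for either kernel size.

```python
in_channels, filters = 64, 128

# Kernel weights only; biases don't depend on kernel size
weights_3x3 = 3 * 3 * in_channels * filters   # 73,728
weights_1x1 = 1 * 1 * in_channels * filters   #  8,192
print(weights_3x3 // weights_1x1)             # 9

# Mixing: split the same filter budget evenly between 1x1 and 3x3 kernels
half = filters // 2
weights_mixed = 1 * 1 * in_channels * half + 3 * 3 * in_channels * half
print(weights_mixed)  # 40,960 -- just over half the all-3x3 cost
```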
C. Intermediate layer
The way we decrease the number of input channels is by adding an intermediate convolution layer. Though this may seem counterintuitive, since the extra layer introduces its own kernel weights, it can drastically decrease the total number of parameters. Consider a convolution layer with 100 filters and 3x3 kernels. If the input has 50 channels, we can use the equation from chapter 1 to calculate the number of parameters in the layer:
...
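To see the size of the savings, here is a minimal sketch of the comparison set up above, assuming the chapter 1 equation counts kernel weights plus one bias per filter; the 16 filters chosen for the intermediate 1x1 layer are an illustrative assumption, not a value from the text.

```python
def conv_params(kernel_size, in_channels, filters):
    # Kernel weights plus one bias per filter
    return kernel_size * kernel_size * in_channels * filters + filters

# Direct layer: 50 input channels -> 100 filters with 3x3 kernels
print(conv_params(3, 50, 100))  # 45,100 parameters

# Same output, but with an intermediate 1x1 layer that first shrinks
# the input to 16 channels (16 is an arbitrary choice for illustration)
intermediate = conv_params(1, 50, 16)   # 816 parameters
main_layer = conv_params(3, 16, 100)    # 14,500 parameters
print(intermediate + main_layer)        # 816 + 14,500 = 15,316 parameters
```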