Fire Module
Learn about the central component of SqueezeNet, the fire module.
Chapter Goals:
- Learn strategies for decreasing the number of parameters in a model
- Understand how the fire module works and why it's effective
- Write your own fire module function
A. Decreasing parameters
To make a smaller model, we need to decrease the number of weights per convolution layer. There are three ways to do this, each corresponding to one factor in the layer's parameter count (see the sketch after this list):
- Decrease the kernel size
- Decrease the number of filters used
- Decrease the number of input channels
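Each of these options shrinks one factor in a convolution layer's parameter count. The sketch below spells that count out, assuming the usual accounting of kernel weights plus one bias per filter; the function name and example numbers are just for illustration.

```python
def conv_params(kernel_height, kernel_width, in_channels, filters):
    # One weight per kernel position, per input channel, per filter,
    # plus one bias per filter.
    weights = kernel_height * kernel_width * in_channels * filters
    biases = filters
    return weights + biases

# Shrinking any of the three factors shrinks the product:
#   kernel_height * kernel_width  -> smaller kernels
#   filters                       -> fewer filters
#   in_channels                   -> fewer input channels
print(conv_params(3, 3, 64, 128))  # 73,856 parameters
```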
B. Kernel size
The size of a kernel represents the amount of spatial information it can capture. For example, a 1x1 kernel will only capture the channel information for individual pixels, while a 3x3 kernel will aggregate the information between adjacent pixels within each 3x3 square of the input data.
Although larger kernels can capture more information, they come at the cost of additional parameters. A convolution layer that uses 3x3 kernels will use roughly 9x as many parameters as a layer that uses 1x1 kernels. A good strategy for balancing performance and parameter count is to use a mix of larger and smaller kernel sizes.
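As a quick check of that 9x figure, and of what mixing kernel sizes buys us, here is a short sketch; the channel and filter counts are arbitrary illustration values, and biases are left out since they are the same for either kernel size.

```python
in_channels, filters = 64, 128

# Kernel weights only; biases don't depend on kernel size
weights_3x3 = 3 * 3 * in_channels * filters   # 73,728
weights_1x1 = 1 * 1 * in_channels * filters   #  8,192
print(weights_3x3 // weights_1x1)             # 9

# Mixing: split the same filter budget evenly between 1x1 and 3x3 kernels
half = filters // 2
weights_mixed = 1 * 1 * in_channels * half + 3 * 3 * in_channels * half
print(weights_mixed)  # 40,960 -- just over half the all-3x3 cost
```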
C. Intermediate layer
The way we decrease the number of input channels is by adding an intermediate convolution layer. Though this may seem counterintuitive, since the extra layer introduces its own kernel weights, it can drastically decrease the total number of parameters. Consider a convolution layer with 100 filters and 3x3 kernels. If the input has 50 channels, we can use the equation from chapter 1 to calculate the number of parameters in the layer:
...
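To see the size of the savings, here is a minimal sketch of the comparison set up above, assuming the chapter 1 equation counts kernel weights plus one bias per filter; the 16 filters chosen for the intermediate 1x1 layer are an illustrative assumption, not a value from the text.

```python
def conv_params(kernel_size, in_channels, filters):
    # Kernel weights plus one bias per filter
    return kernel_size * kernel_size * in_channels * filters + filters

# Direct layer: 50 input channels -> 100 filters with 3x3 kernels
print(conv_params(3, 50, 100))  # 45,100 parameters

# Same output, but with an intermediate 1x1 layer that first shrinks
# the input to 16 channels (16 is an arbitrary choice for illustration)
intermediate = conv_params(1, 50, 16)   # 816 parameters
main_layer = conv_params(3, 16, 100)    # 14,500 parameters
print(intermediate + main_layer)        # 816 + 14,500 = 15,316 parameters
```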