Shortcut

Understand how shortcuts can improve performance in large models.

Chapter Goals:

  • Learn about mapping functions and identity mapping
  • Understand the purpose of a shortcut for residual learning
  • Implement a wrapper for the pre-activation function that also returns the shortcut

A. Mapping functions

To understand the intuition behind ResNet, we need to discuss its building blocks. Each ResNet building block takes in an input, $\mathbf{x}$, and produces some output $\mathcal{H}(\mathbf{x})$, where $\mathcal{H}$ represents the block's mapping function. The mapping function, $\mathcal{H}$, is a mathematical representation of the block itself; it takes in an input and, using the weights within the block, produces an output.

[Image: block $B$ mapping input $\mathbf{x}$ to output $\mathcal{H}(\mathbf{x})$]

The image above shows the input and output for block $B$ with mapping function $\mathcal{H}$.

For now, it suffices to know that a block is just a stack of convolution layers (we’ll discuss the inner workings of a block in the next two chapters).
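As a rough sketch of this idea, the code below treats a block as a callable mapping function: it takes an input tensor $\mathbf{x}$ and returns $\mathcal{H}(\mathbf{x})$ by passing it through a small stack of convolution layers. The function name, filter counts, and use of TensorFlow/Keras layers are illustrative assumptions here, not the course's actual block implementation.

```python
import tensorflow as tf

def block_mapping(x, filters=64, kernel_size=3):
    """A hypothetical block B: applies a small stack of convolution
    layers to the input x and returns the block's output H(x)."""
    out = tf.keras.layers.Conv2D(filters, kernel_size,
                                 padding='same', activation='relu')(x)
    out = tf.keras.layers.Conv2D(filters, kernel_size,
                                 padding='same')(out)
    return out  # H(x)

# Example: one 32x32 RGB image passed through the block
x = tf.random.normal([1, 32, 32, 3])
h_x = block_mapping(x)
print(h_x.shape)  # (1, 32, 32, 64)
```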

B. Identity mapping

We've previously discussed the issue of degradation, where a model's performance decreases once layers are added beyond a certain depth. Despite how readily it shows up in practice, degradation is actually a pretty counter-intuitive problem.

Let's say our model performs well at 40 convolution layers, so we want to add 20 more convolution layers (60 layers total). There is a simple way for the larger model to achieve the same performance as the smaller model: use the same weights as the smaller model for the first 40 layers, then make the last 20 convolution layers an identity mapping.

An identity mapping just means the output for a layer (or set of layers) is equal to the input. In other words, the last 20 layers would compute $\mathcal{H}(\mathbf{x}) = \mathbf{x}$, so stacking them on top of the smaller model leaves its output unchanged.
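To make the idea concrete, here is a minimal sketch (the function name and tensor shapes are illustrative assumptions) of an identity mapping: the extra layers simply return their input unchanged, so the 60-layer model's output would match the 40-layer model's output exactly.

```python
import tensorflow as tf

def identity_mapping(x):
    """An identity mapping: the layer (or set of layers) returns its
    input unchanged, i.e. H(x) = x."""
    return x

# If the 20 added layers behaved as identity mappings, the larger model
# would reproduce the smaller model's output exactly.
x = tf.random.normal([1, 32, 32, 64])
out = identity_mapping(x)
print(bool(tf.reduce_all(tf.equal(out, x))))  # True: output equals input
```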
