Pre-activation

Learn why batch normalization and pre-activation are important in ResNet.

Chapter Goals:

  • Learn about internal covariate shift and batch normalization
  • Understand the difference between pre-activation and post-activation
  • Implement the pre-activation function

A. Internal covariate shift

When training models with many weight layers, such as ResNet, a problem known as internal covariate shift can occur. To understand this problem, we first define the related problem of covariate shift.

A covariate shift occurs when the distribution of the input data changes and the model cannot handle the change properly. For example, suppose a model is trained to classify dog breeds using a training set containing only images of brown dogs. If we then test the model on images of yellow dogs, its performance may be worse than expected.

In this case, the model's original input distribution was limited to brown dogs, and changing the input distribution to dogs of a different color introduced covariate shift.

An internal covariate shift is essentially just a covariate shift that happens between layers of a model. Since the input of one layer is the output of the previous layer, the input distribution for a layer is the same as the output distribution of the previous layer.

Because the output distribution of a layer depends on its weights, and the weights of a model are constantly being updated, each layer's output distribution constantly changes (though by incremental amounts). In a model with only a few layers, these incremental changes in layer distributions don't have much impact. However, in models with many weight layers, the small shifts compound from layer to layer, forcing each layer to continually adapt to a moving input distribution, which can slow down and destabilize training. Batch normalization addresses this by normalizing each layer's inputs, keeping their distribution roughly consistent across training batches.
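As a minimal sketch of the computation batch normalization performs for a fully connected layer (the function name, shapes, and example values below are illustrative, not taken from the course code):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a batch of activations, then scale and shift.

    x: activations of shape (batch_size, num_features)
    gamma, beta: learned scale and shift, each of shape (num_features,)
    """
    # Per-feature statistics computed across the batch dimension
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    # Normalize to roughly zero mean and unit variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Learned affine transform lets the layer undo the
    # normalization if that is what training favors
    return gamma * x_hat + beta

# Example: a batch of 4 samples with 3 features each
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 2.0],
              [3.0, 5.0, 7.0]])
gamma = np.ones(3)   # initial scale
beta = np.zeros(3)   # initial shift
print(batch_norm(x, gamma, beta))
```

During training, batch normalization uses per-batch statistics like these; at inference time, frameworks substitute running averages of the mean and variance collected during training.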
