Dropout

Learn about dropout and how it can reduce overfitting in large neural networks.

Chapter Goals:

  • Understand why we use dropout in neural networks
  • Apply dropout to a fully-connected layer

A. Co-adaptation

Co-adaptation occurs when multiple neurons in a layer extract the same, or very similar, hidden features from the input data. This can happen when the connection weights for two different neurons are nearly identical.

An example of co-adaptation between neurons A and B. Due to identical weights, A and B will pass the same value into C.

When a fully-connected layer has a large number of neurons, co-adaptation is more likely to occur. This is a problem for two reasons. First, it wastes computation to have redundant neurons producing the same output. Second, if many neurons extract the same features, the model places extra significance on those features. This leads to overfitting if the duplicated features are specific only to the training set.
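To make the picture above concrete, here is a minimal NumPy sketch of the A/B/C example. The input and weight values are hypothetical; the point is that identical weight vectors force the two neurons to compute identical outputs:

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])    # hypothetical input vector

w_a = np.array([0.4, 0.1, -0.3])  # weights for neuron A
w_b = np.array([0.4, 0.1, -0.3])  # identical weights for neuron B (co-adapted)

a = np.dot(w_a, x)  # neuron A's output
b = np.dot(w_b, x)  # neuron B's output

print(a == b)  # True: A and B pass the same value into the next neuron, C
```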

B. Dropout

The way we minimize co-adaptation in fully-connected layers with many neurons is by applying dropout during training. With dropout, we randomly shut down some fraction of a layer's neurons at each training step by zeroing out their values. The fraction of neurons that is zeroed out is known as the dropout rate. At evaluation or prediction time, dropout is turned off, so every neuron's value is kept.
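The excerpt does not specify a framework, so here is a minimal framework-free sketch of how dropout could be applied to a fully-connected layer's output. The function name `dropout` and the array values are illustrative; this uses the common "inverted dropout" variant, which rescales the surviving values during training so that nothing needs to change at inference time:

```python
import numpy as np

def dropout(layer_output, rate, training=True):
    """Inverted dropout: zero out a `rate` fraction of values during training.

    Scaling the survivors by 1 / (1 - rate) keeps the expected output
    unchanged, so the layer can be used as-is at inference time.
    """
    if not training or rate == 0.0:
        return layer_output  # dropout is only applied during training
    keep_prob = 1.0 - rate
    mask = np.random.rand(*layer_output.shape) < keep_prob  # keep each value with prob keep_prob
    return layer_output * mask / keep_prob

# Hypothetical fully-connected layer output for a batch of 2 examples.
fc_output = np.array([[0.5, -1.2, 3.0, 0.8],
                      [1.1,  0.4, -0.6, 2.2]])

print(dropout(fc_output, rate=0.5))                   # training: ~half the values zeroed
print(dropout(fc_output, rate=0.5, training=False))   # inference: unchanged
```

Because the mask is drawn fresh at every training step, a different subset of neurons is silenced each time, which prevents any two neurons from relying on always being active together.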