Using Dropout to Combat Overfitting

Explore how dropout techniques enhance neural network models.

A major shortcoming of the baseline model was overfitting. Overfitting in large models is commonly caused by a phenomenon called coadaptation, which can be addressed with dropout. Both the coadaptation issue and its resolution with dropout are explained below.

What’s coadaptation?

If all the weights in a deep learning network are learned together, it's common for some nodes to end up with more predictive capability than others.

In such a scenario, because the network is trained iteratively, these powerful nodes start to suppress the weaker ones. They usually constitute only a small fraction of all the nodes in the network. Yet, over many iterations, only these powerful nodes keep getting trained, and the rest stop participating.

This phenomenon is called coadaptation. It's difficult to prevent with traditional $\mathcal{L}_1$ and $\mathcal{L}_2$ regularization because these penalties also act on weights according to the predictive capability of the nodes. As a result, the traditional methods are close to deterministic in choosing and rejecting weights: a strong node gets stronger, and a weak node gets weaker.
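For reference, the penalties these methods add to the training loss take the standard textbook form (here, $\lambda$ is the regularization strength and $w_i$ are the network weights; neither symbol appears elsewhere in this lesson):

$\mathcal{L}_1$ penalty: $\lambda \sum_i |w_i|$, and $\mathcal{L}_2$ penalty: $\lambda \sum_i w_i^2$.

Both penalties are deterministic functions of the current weight values, which is why they cannot break the strong-gets-stronger dynamic the way a randomized method can.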

A major fallout of coadaptation is that expanding the size of the neural network does not help.

This had been a severe issue in deep learning for a long time. Then, around 2012, dropout, a new regularization approach, emerged.

Dropout resolved coadaptation, which revolutionized deep learning. With dropout, deeper and broader networks became possible.

What is dropout?

Dropout changed the approach to learning weights. Instead of learning all the network weights collectively, dropout trains only a random subset of them in each batch training iteration.
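To make the mechanism concrete, here is a minimal sketch of an inverted-dropout forward pass in NumPy. The function name, array shapes, and drop rate of 0.5 are illustrative assumptions, not details taken from the baseline model in this lesson:

```python
import numpy as np

def dropout_forward(activations, drop_rate=0.5, training=True):
    """Inverted dropout: randomly zero out activations during training.

    Each unit is kept with probability (1 - drop_rate); surviving units are
    scaled up so the expected activation stays the same at inference time.
    """
    if not training or drop_rate == 0.0:
        return activations  # dropout is disabled at inference time
    keep_prob = 1.0 - drop_rate
    # A fresh random binary mask is drawn for every batch, so a different
    # subset of units is dropped in each training iteration.
    mask = (np.random.rand(*activations.shape) < keep_prob) / keep_prob
    return activations * mask

# Example: a batch of 4 samples with 8 hidden units each.
hidden = np.random.randn(4, 8)
print(dropout_forward(hidden, drop_rate=0.5, training=True))
```

Because a new mask is sampled for every batch, each iteration updates only the weights feeding the surviving units, which is the subset training described above.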
