Skip Connections

Learn about skip connections and the problem they solve.

If you were trying to train a neural network back in 2014, you would almost certainly have run into the so-called vanishing gradient problem. In simple terms: you sit behind the screen watching your network train, and all you see is that the training loss stops decreasing while it is still far from the desired value. You spend the whole night combing through your code for a bug and find nothing.

The update rule and the vanishing gradient problem

Let’s remind ourselves of the update rule of gradient descent without momentum, given $L$ to be the loss function and $\lambda$ the learning rate:

$$w_{i}' = w_{i} + \Delta w_{i},$$

where $\Delta w_{i} = - \lambda \frac{\partial L}{\partial w_{i}}$ ...
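
To make the update rule concrete, here is a minimal sketch in plain Python. The toy one-dimensional quadratic loss, its gradient, the initial weight, and the learning rate are all illustrative assumptions, not part of the original text; the only thing taken from the text is the update rule itself.

```python
# Illustrative (hypothetical) loss: L(w) = (w - 3)^2, minimized at w = 3.
def loss_grad(w):
    # dL/dw for L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0    # initial weight w_i (arbitrary choice)
lr = 0.1   # learning rate lambda (arbitrary choice)

for step in range(25):
    delta_w = -lr * loss_grad(w)   # Δw_i = -λ ∂L/∂w_i
    w = w + delta_w                # w_i' = w_i + Δw_i

print(w)  # approaches the minimizer w = 3
```

Note that the size of each step is proportional to the gradient: if the gradient that reaches a weight is close to zero, $\Delta w_i$ is close to zero and that weight effectively stops learning.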