Backpropagation Algorithm

Take a look at the mathematics of the backpropagation algorithm.

Neural Networks (NN) are non-linear classifiers that can be formulated as a series of matrix multiplications. Just like linear classifiers, they can be trained using the same principles we followed before, namely the gradient descent algorithm. The difficulty arises in computing the gradients.
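
As a quick illustration of that formulation, here is a minimal NumPy sketch: each layer is a matrix multiplication followed by a non-linear activation. The layer sizes and the sigmoid non-linearity are made-up choices for illustration, not part of the lesson's running example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 inputs, 4 hidden units, 1 output.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = rng.normal(size=3)      # one input example
h = sigmoid(W1 @ x + b1)    # layer 1: matrix multiplication + non-linearity
y = sigmoid(W2 @ h + b2)    # layer 2: matrix multiplication + non-linearity
print(y)
```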

But first things first.

Let’s start with a straightforward example of a two-layered NN, with each layer containing just one neuron.

Notations

  • The superscript defines the layer that we are in.
  • o^L denotes the activation of layer L.
  • w^L is the scalar weight of layer L.
  • b^L is the bias term of layer L.
  • C is the cost function, t is our target class, and f is the activation function.

Forward pass

Our lovely model would look something like this in a simple sketch:

We can write the output of a neuron at layer L as:

o^L = f(w^L o^{L-1} + b^L)

To simplify things, let’s define:

z^L = w^L o^{L-1} + b^L

so that our basic equation will become:

o^L = f(z^L)

We also know that our loss function is:

C = (o^L - t)^2

This is the so-called forward pass. We take some input and pass it through the network. From the output of the network, we can compute the loss C.
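
To make the forward pass concrete, here is a self-contained Python sketch of the two-layer, one-neuron-per-layer model. The scalar weights, biases, input, and target are made-up values, and we assume a sigmoid for the activation f:

```python
import math

def f(z):
    # Assumed activation function: sigmoid.
    return 1.0 / (1.0 + math.exp(-z))

# Made-up scalar parameters and data.
w1, b1 = 0.5, 0.1    # layer 1 weight and bias
w2, b2 = -0.3, 0.2   # layer 2 weight and bias
x, t = 1.5, 0.0      # input (o^0) and target

# Forward pass: z^L = w^L * o^{L-1} + b^L and o^L = f(z^L).
z1 = w1 * x + b1
o1 = f(z1)
z2 = w2 * o1 + b2
o2 = f(z2)

# Loss: C = (o^L - t)^2.
C = (o2 - t) ** 2
print(C)
```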

Backward pass

The backward pass is the process of adjusting the weights w in all the layers to minimize the loss C.

To adjust the weights based on the training example, we can use our known update rule:

w^{L}_{t} = w^{L}_{t-1} - \lambda \frac{\partial C}{\partial w^L}

where λ is the learning rate.
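
To make the update rule concrete, here is a self-contained Python sketch of a single gradient-descent step on the last layer's weight. It reuses the made-up values and the sigmoid assumption from the forward-pass sketch, and it expands the gradient ∂C/∂w^L for the last layer with the chain rule:

```python
import math

def f(z):
    # Assumed activation function: sigmoid, as in the forward-pass sketch.
    return 1.0 / (1.0 + math.exp(-z))

# Same made-up values as in the forward-pass sketch.
w1, b1 = 0.5, 0.1
w2, b2 = -0.3, 0.2
x, t = 1.5, 0.0
lam = 0.1  # learning rate (lambda)

# Forward pass.
o1 = f(w1 * x + b1)
o2 = f(w2 * o1 + b2)

# Chain rule for the last layer's weight:
# dC/dw^2 = dC/do^2 * do^2/dz^2 * dz^2/dw^2
dC_do2 = 2.0 * (o2 - t)      # from C = (o^2 - t)^2
do2_dz2 = o2 * (1.0 - o2)    # sigmoid derivative: f'(z) = f(z) * (1 - f(z))
dz2_dw2 = o1                 # from z^2 = w^2 * o^1 + b^2

# Update rule: w_t = w_{t-1} - lambda * dC/dw.
w2 = w2 - lam * dC_do2 * do2_dz2 * dz2_dw2
print(w2)
```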
