Initialize the Weights

Discover why it is important to initialize weights carefully and avoid dead neurons while building a neural network.

In Part I of this course (“From ‘How Machine Learning Works’ to ‘The Perceptron’”), weight initialization was a quick job: we set all the weights to 0. By contrast, weight initialization in a neural network comes with a hard-to-spot pitfall. Let’s discuss that pitfall and see how to handle it.

Fearful symmetry

Here is one rule to keep in mind: never initialize all the weights in a neural network with the same value. The reason for that recommendation is subtle, and it comes from the matrix multiplications in the network. For example, let’s look at the matrix multiplication below:
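(The original figure is not reproduced here; the multiplication below is an illustrative stand-in with the same structure: a first matrix of distinct numbers times a second matrix whose cells all hold the same value.)

$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}
=
\begin{bmatrix} 1.5 & 1.5 \\ 3.5 & 3.5 \end{bmatrix}
$$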

We don’t need to remember the details of matrix multiplication (we can review them in the Multiplying matrices section). The interesting detail in this example is that even though the numbers in the first matrix are all different, the result has two identical columns, because every cell of the second matrix holds the same value. In general, if the second matrix in a multiplication has the same value in every cell, all the columns of the result will be identical: each row of the result contains a single repeated value. A quick check of that claim appears below.
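Here is a quick NumPy check of that claim (the matrix shapes and values are arbitrary, chosen just for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# A first matrix full of different numbers...
a = rng.normal(size=(4, 3))

# ...times a second matrix with the same value in every cell.
b = np.full((3, 5), 0.7)

result = a @ b

# All columns of the result are identical, so each row
# holds a single repeated value.
print(np.allclose(result, result[:, [0]]))  # True
```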

Now imagine that the first and second matrices are, respectively, $x$ (the inputs) and $w_1$ (the first layer’s weights) of a neural network. Once the multiplication is done, the resulting matrix passes through a sigmoid, yielding the hidden layer $h$. Since $h$ has the same value repeated across each row, all the hidden nodes of the network have the same value. So when we initialize all the weights with the same value, we force this symmetry on the network: the hidden nodes all compute the same output, receive the same gradient updates, and stay identical as the network trains, so they never learn to detect different features.
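To see the pitfall concretely, here is a minimal sketch of that first layer (the shapes, the 0.1 starting value, and the random scale are assumptions for illustration, not the course’s actual code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0.2, 0.7, -1.3]])   # one example with 3 input features

# Symmetric initialization: every weight holds the same value.
w1 = np.full((3, 4), 0.1)          # first layer: 3 inputs -> 4 hidden nodes
print(sigmoid(x @ w1))             # all 4 hidden nodes output the same value

# Breaking the symmetry: small random weights give each node its own value.
rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.01, size=(3, 4))
print(sigmoid(x @ w1))             # the hidden nodes now differ
```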
