Initialize the Weights

Discover why it is important to initialize weights carefully and avoid dead neurons while building a neural network.

In Part I of this course (“From ‘How Machine Learning Works’ to ‘The Perceptron’”), weight initialization was a quick job: we set all the weights to 0. By contrast, weight initialization in a neural network comes with a hard-to-spot pitfall. Let’s discuss that pitfall and see how to handle it.

Fearful symmetry

Here is one rule to keep in mind: never initialize all the weights in a neural network with the same value. The reason for that recommendation is subtle, and it comes from the matrix multiplications in the network. For example, let’s look at the matrix multiplication below:
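(The original figure is not reproduced here; the multiplication below is an illustrative stand-in with the same structure: a first matrix of distinct numbers times a second matrix whose cells all hold the same value.)

$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}
=
\begin{bmatrix} 1.5 & 1.5 \\ 3.5 & 3.5 \end{bmatrix}
$$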

We don’t need to remember the details of matrix multiplication (we can review them in the Multiplying matrices section). The interesting detail in this example is that even though the numbers in the first matrix are all different, the result has two identical columns, because every cell of the second matrix holds the same value. In general, if the second matrix in a multiplication has the same value in every cell, all the columns of the result will be identical: each row of the result contains a single repeated value. A quick check of that claim appears below.
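Here is a quick NumPy check of that claim (the matrix shapes and values are arbitrary, chosen just for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# A first matrix full of different numbers...
a = rng.normal(size=(4, 3))

# ...times a second matrix with the same value in every cell.
b = np.full((3, 5), 0.7)

result = a @ b

# All columns of the result are identical, so each row
# holds a single repeated value.
print(np.allclose(result, result[:, [0]]))  # True
```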

Now imagine that the first and second matrices are, respectively, $x$ (the inputs) and $w_1$ (the first layer’s weights) of a neural network. Once the multiplication is done, the resulting matrix passes through a sigmoid, yielding the hidden layer $h$. Since $h$ has the same value repeated across each row, all the hidden nodes of the network have the same value. So when we initialize all the weights with the same value, we force this symmetry on the network: the hidden nodes all compute the same output, receive the same gradient updates, and stay identical as the network trains, so they never learn to detect different features.
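To see the pitfall concretely, here is a minimal sketch of that first layer (the shapes, the 0.1 starting value, and the random scale are assumptions for illustration, not the course’s actual code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0.2, 0.7, -1.3]])   # one example with 3 input features

# Symmetric initialization: every weight holds the same value.
w1 = np.full((3, 4), 0.1)          # first layer: 3 inputs -> 4 hidden nodes
print(sigmoid(x @ w1))             # all 4 hidden nodes output the same value

# Breaking the symmetry: small random weights give each node its own value.
rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.01, size=(3, 4))
print(sigmoid(x @ w1))             # the hidden nodes now differ
```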
