Prepare Data: Random Initial Weights
Learn how to initialize the weights randomly and how to avoid the problems that arise from poor initial weight selection.
Random initialization of weights
The same argument applies here as with the inputs and outputs. We should avoid large initial weights because they push large signals into the activation function, leading to the saturation we just talked about and a reduced ability to learn better weights.
We could choose the initial weights randomly and uniformly from a small range, say -1.0 to +1.0. That would be a much better idea than using a very large range, say -1000 to +1000.
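As a quick illustration, here is a minimal sketch of drawing such weights with NumPy. The layer sizes are made-up values chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical layer sizes, purely for illustration
input_nodes = 3
hidden_nodes = 4

# Weights drawn uniformly from -1.0 to +1.0:
# one row per hidden node, one column per input node
weights_input_hidden = rng.uniform(-1.0, 1.0, size=(hidden_nodes, input_nodes))
print(weights_input_hidden)
```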
Mathematicians and computer scientists have done the math to work out rules of thumb for setting the random initial weights for specific network shapes and activation functions.
We won't go into the details of that, but the core idea is this: if many signals feed into a node, as they do in a neural network, and those signals are already well behaved, not too large and sensibly distributed, then the weights should help keep them well behaved as they are combined and passed through the activation function. A sketch of one such rule of thumb follows.
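One widely used rule of this kind, which the text above does not spell out, is to sample each weight from a zero-centred normal distribution whose standard deviation is the inverse of the square root of the number of incoming links to a node. A minimal sketch, assuming NumPy and made-up layer sizes:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical layer sizes, purely for illustration
input_nodes = 3
hidden_nodes = 4

# Rule of thumb: standard deviation of 1 / sqrt(number of incoming links),
# so the combined signal into each node stays in a comfortable range
# for the activation function.
std_dev = 1.0 / np.sqrt(input_nodes)
weights_input_hidden = rng.normal(0.0, std_dev, size=(hidden_nodes, input_nodes))
print(weights_input_hidden)
```

The more links feed into a node, the smaller each individual weight becomes, which keeps the summed signal from growing large enough to saturate the activation function.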