Parameter and Loss Function
Understand the model training cheat sheet and its different parameters.
Designing a training strategy is just as important as, if not more important than, designing the model itself. Sometimes, a good training strategy can make a poorly designed model shine. Here, we will talk about the following topics:
Parameter initialization
Adjusting the loss function
Parameter initialization
One of the most frustrating things about learning an optimization method from a book or paper and then implementing it in code is that the initial state of the machine learning system (the initial values of the parameters) can have a great impact on the model's final performance. Knowledge of parameter initialization is especially important when we're dealing with deep networks. A good parameter initialization also means that we won't have to rely solely on batch normalization to keep our parameters in line during training. To quote from the PyTorch documentation:
“A PyTorch Tensor is basically the same as a NumPy array: it does not know anything about deep learning or computational graphs or gradients and is just a generic n-dimensional array to be used for arbitrary numeric computation.”
Because a tensor is just a generic array of numbers, an initialization method is simply a rule for filling that array, which is why there are so many methods and there will probably be more in the future. We won't go into great detail about some of the methods since they are rather self-explanatory. Note that uniform distributions are often used for fully connected layers, and normal distributions are often used for convolutional layers. Let's go over some of these now (a short sketch applying them follows the list):
Uniform (nn.init.uniform_(tensor, a, b)): It initializes tensor with values drawn from the uniform distribution U(a, b).
Normal (nn.init.normal_(tensor, mean, std)): It initializes tensor with values drawn from the normal distribution with the given mean and standard deviation.
Xavier-uniform (nn.init.xavier_uniform_(tensor)): It initializes tensor with values drawn from the uniform distribution U(-a, a), where we have the following equation (gain defaults to 1):

$$a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}$$
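The sketch below is a minimal illustration rather than a prescribed recipe: it applies these initializers to the weights of a few hypothetical layers, using a uniform initialization for a fully connected layer, a normal initialization for a convolutional layer, and Xavier-uniform for another linear layer. The layer sizes, bounds, and standard deviation are arbitrary values chosen only for demonstration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # for reproducible example output

# A fully connected layer: uniform initialization is a common choice here.
fc = nn.Linear(128, 64)
nn.init.uniform_(fc.weight, a=-0.05, b=0.05)  # fills fc.weight in place with U(-0.05, 0.05)
nn.init.zeros_(fc.bias)                       # biases are often simply zeroed

# A convolutional layer: normal initialization is a common choice here.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
nn.init.normal_(conv.weight, mean=0.0, std=0.01)

# Xavier-uniform balances the variance between fan_in and fan_out.
out = nn.Linear(64, 10)
nn.init.xavier_uniform_(out.weight)

# The Xavier bound a = gain * sqrt(6 / (fan_in + fan_out)) can be checked directly:
fan_in, fan_out = 64, 10  # for out.weight of shape (10, 64)
bound = (6.0 / (fan_in + fan_out)) ** 0.5
print(out.weight.abs().max() <= bound)  # tensor(True)
```

Note that all nn.init functions modify the tensor in place (hence the trailing underscore). In practice, an entire model can be initialized in one pass by writing a function that dispatches on the layer type and passing it to model.apply(...).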