Parameter and Loss Function

Understand the model training cheat sheet and its different parameters.

Designing a training strategy is just as important as, if not more important than, designing the model itself. Sometimes, a good training strategy can make a poorly designed model shine. Here, we will cover the following topics:

  • Parameter initialization

  • Adjusting the loss function

Parameter initialization

One of the most frustrating things about learning an optimization method from a book or paper and then implementing it in code is that the initial state of the machine learning system (the initial values of its parameters) can have a great impact on the model's final performance. It is therefore important to understand parameter initialization, especially when dealing with deep networks. Good parameter initialization also means that we won't have to rely solely on batch normalization to keep our parameters in line during training. To quote the PyTorch documentation:

“A PyTorch Tensor is basically the same as a NumPy array: it does not know anything about deep learning or computational graphs or gradients and is just a generic n-dimensional array to be used for arbitrary numeric computation.”

Because a tensor carries no built-in notion of how it should be initialized, many initialization methods exist, and more will probably appear in the future. We won't go into great detail about some of the popular methods since they are rather self-explanatory. Note that uniform distributions are often used for fully connected layers, while normal distributions are often used for convolution layers. Let's go over some of these now (a short PyTorch sketch applying them follows the list):

  • Uniform (nn.init.uniform_(tensor, a, b)): It initializes the tensor with the uniform distribution $\mathcal{U}(a, b)$.

  • Normal (nn.init.normal_(tensor, a, b)): It initializes the tensor with the normal distribution $\mathcal{N}(a, b^2)$, where $a$ is the mean and $b$ is the standard deviation.

  • Xavier-uniform (nn.init.xavier_uniform_(tensor)): It initializes the tensor with the uniform distribution $\mathcal{U}(-a, a)$, where the bound $a$ is computed as follows:

    $a = \text{gain} \times \sqrt{\dfrac{6}{\text{fan\_in} + \text{fan\_out}}}$
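
Below is a minimal sketch of how these initializers can be applied to concrete layers in PyTorch. The layer sizes, the uniform bounds, the standard deviation, and the gain value are arbitrary choices for illustration, not recommendations.

```python
import torch
import torch.nn as nn

# Example layers (sizes chosen only for illustration).
fc = nn.Linear(128, 64)                  # fully connected layer
conv = nn.Conv2d(3, 16, kernel_size=3)   # convolution layer

# Uniform U(a, b): often used for fully connected layers.
nn.init.uniform_(fc.weight, a=-0.1, b=0.1)
nn.init.zeros_(fc.bias)

# Normal N(mean, std^2): often used for convolution layers.
nn.init.normal_(conv.weight, mean=0.0, std=0.02)
nn.init.zeros_(conv.bias)

# Xavier-uniform U(-a, a), with a = gain * sqrt(6 / (fan_in + fan_out)).
nn.init.xavier_uniform_(fc.weight, gain=1.0)
```

In practice, the initialization logic is often wrapped in a function that checks the layer type and is then applied to a whole model with `model.apply(init_fn)`, which calls the function recursively on every submodule.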
