Learning of the Model

Learn how a model learns, and how to distinguish between the loss and the accuracy of a neural network.

Recap of neural networks

  • By now, we are familiar with gradient descent (GD). This chapter introduces a souped-up variant of GD called mini-batch gradient descent.

  • Mini-batch gradient descent is slightly more complicated than plain vanilla GD, but as we are about to see, it also tends to converge faster. In simpler terms, mini-batch GD is faster at approaching the minimum loss, speeding up the network’s training. As a bonus, it takes less memory, and sometimes it even finds a better loss than regular GD. In fact, after this chapter, we might never use regular GD again!

  • One might wonder why we are focusing on training speed when we have more pressing concerns. In particular, the accuracy of our neural network is still disappointing: better than a perceptron, but well below our target of 99% on MNIST. Should we not make the network more accurate first, and faster later? After all, as Donald Knuth has said, “premature optimization is the root of all evil.”

  • However, there’s a reason to speed up training straight away. Within a couple of chapters, we’ll move towards that 99% goal by tuning the network iteratively. We’ll tweak the hyperparameters, train the network, and then do it all over again until we get the desired result. Each of those iterations can take hours. We need to find a way to speed them up; otherwise, the tuning process might take days.

  • One might also wonder why we have to look for an alternative algorithm instead of speeding up the algorithm we already have. The answer is that, soon enough, we’ll stop writing our own backpropagation code and switch to highly optimized ML libraries. Instead of optimizing code that we’ll throw away soon, we should focus on techniques that will stay valid even after we switch to libraries.

Mini-batch gradient descent is one of those techniques. Let’s see what it’s about. But first, let’s review what happens when we train a neural network.

Visualize the learning process

To accelerate training, we need to understand how it works. Let’s take a deeper look at how the neural network’s loss and accuracy change during training.

We know that the loss goes down during training, and the accuracy goes up. Staring at those numbers, however, does not tell the whole story. To visualize the loss and accuracy, we hack the neural network to return two lists: the histories of the loss and the accuracy, stored at each iteration. We train the network to collect the two histories and then plot them, as illustrated in the following graphs.
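The idea of recording the two histories can be sketched in a few lines. This is a minimal illustration, not the network from this course: to stay self-contained, it swaps in a tiny logistic regression trained with plain GD on a made-up two-dimensional dataset instead of MNIST. All names here (`make_toy_data`, `train`, `plot_histories`, `loss_history`, `accuracy_history`) are hypothetical, chosen only for this example.

```python
import math
import random


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_data(n=200, seed=0):
    # Assumption: a toy stand-in for MNIST. Points in [-1, 1]^2,
    # labeled 1 when x0 + x1 > 0 and 0 otherwise.
    rng = random.Random(seed)
    X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    Y = [1 if x0 + x1 > 0 else 0 for x0, x1 in X]
    return X, Y


def forward(X, w, b):
    # Predicted probability of class 1 for each example
    return [sigmoid(w[0] * x0 + w[1] * x1 + b) for x0, x1 in X]


def log_loss(Y, Y_hat):
    eps = 1e-12  # guards against log(0)
    total = sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(Y, Y_hat))
    return -total / len(Y)


def accuracy(Y, Y_hat):
    # Fraction of examples whose thresholded prediction matches the label
    correct = sum((p >= 0.5) == (y == 1) for y, p in zip(Y, Y_hat))
    return correct / len(Y)


def train(X, Y, iterations=100, lr=0.5):
    w, b = [0.0, 0.0], 0.0
    loss_history, accuracy_history = [], []
    n = len(X)
    for _ in range(iterations):
        Y_hat = forward(X, w, b)
        # Store both metrics at every iteration, before updating the weights
        loss_history.append(log_loss(Y, Y_hat))
        accuracy_history.append(accuracy(Y, Y_hat))
        # Gradient of the log loss, averaged over the whole dataset (plain GD)
        errors = [p - y for p, y in zip(Y_hat, Y)]
        w[0] -= lr * sum(e * x[0] for e, x in zip(errors, X)) / n
        w[1] -= lr * sum(e * x[1] for e, x in zip(errors, X)) / n
        b -= lr * sum(errors) / n
    return w, b, loss_history, accuracy_history


def plot_histories(loss_history, accuracy_history):
    # Optional: needs matplotlib, imported here so training runs without it
    import matplotlib.pyplot as plt
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(loss_history)
    ax_loss.set_xlabel("Iteration")
    ax_loss.set_ylabel("Loss")
    ax_acc.plot(accuracy_history)
    ax_acc.set_xlabel("Iteration")
    ax_acc.set_ylabel("Accuracy")
    plt.show()


X, Y = make_toy_data()
w, b, loss_history, accuracy_history = train(X, Y)
print("first/last loss:", loss_history[0], loss_history[-1])
print("first/last accuracy:", accuracy_history[0], accuracy_history[-1])
```

Calling `plot_histories(loss_history, accuracy_history)` afterwards would draw the two curves side by side. Keeping the histories as plain lists returned by `train` means the plotting code stays separate from the training loop, so the same pattern should carry over to a larger network.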
