Learning of the Model

Learn how a model learns, and how to distinguish between the loss and the accuracy of a neural network.
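Before diving in, it helps to pin down the two numbers we track during training. As a minimal sketch (assuming a binary classifier that outputs probabilities; the function names are illustrative, not from a specific library):

```python
import numpy as np

def loss(y_hat, y):
    # Mean log loss (cross-entropy): penalizes confident wrong predictions
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def accuracy(y_hat, y):
    # Fraction of predictions that round to the correct label
    return np.mean(np.round(y_hat) == y)

y = np.array([1, 0, 1, 1])            # ground-truth labels
y_hat = np.array([0.9, 0.4, 0.6, 0.3])  # predicted probabilities
# Three of four predictions round to the right label, so accuracy is 0.75;
# the loss, by contrast, also reflects *how confident* each prediction is.
```

Loss is what gradient descent actually minimizes; accuracy is the headline number we care about, but it only changes when a prediction crosses the 0.5 threshold.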

Recap of neural networks

  • By now, we are familiar with gradient descent. This chapter introduces a souped-up variant of GD called mini-batch gradient descent.

  • Mini-batch gradient descent is slightly more complicated than plain vanilla GD, but as we are about to see, it also tends to converge faster. In simpler terms, mini-batch GD is faster at approaching the minimum loss, speeding up the network’s training. As a bonus, it takes less memory, and sometimes it even finds a better loss than regular GD. In fact, after this chapter, we might never use regular GD again!

  • One might wonder why we should focus on training speed when we have more pressing concerns. In particular, the accuracy of our neural network is still disappointing: better than a perceptron’s, but well below our target of 99% on MNIST. Shouldn’t we make the network more accurate first, and faster later? After all, as Donald Knuth said, “premature optimization is the root of all evil.”

  • However, there’s a reason to speed up training straight away. Within a couple of chapters, we’ll move toward that 99% goal by tuning the network iteratively. We’ll tweak the hyperparameters, train the network, and then ...
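The idea behind mini-batch GD described above can be sketched in a few lines. This is an illustrative implementation, not the book’s final code: the function names, the zero initialization, and the `gradient(X_batch, Y_batch, w)` callback are assumptions for the sake of the example.

```python
import numpy as np

def minibatch_gd(X, Y, gradient, lr=0.01, epochs=10, batch_size=32, seed=0):
    """Mini-batch gradient descent sketch.

    Instead of computing the gradient over the whole training set
    (as plain GD does), take one small random batch per weight update,
    so the network learns from many cheap, noisy steps per epoch.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros((X.shape[1], 1))
    for _ in range(epochs):
        order = rng.permutation(len(X))          # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            w -= lr * gradient(X[batch], Y[batch], w)
    return w
```

Note the memory benefit the bullet points mention: only `batch_size` examples need to be in play per update, and one epoch now performs many updates instead of one.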