
Learning of the Model

Explore how mini-batch gradient descent enhances neural network training by speeding up convergence and reducing memory use. Understand the relationship between loss and accuracy during training, and why tuning hyperparameters iteratively is essential for improving model performance. This lesson prepares you to apply practical optimizations in supervised learning.

Recap of neural networks

  • By now, we are familiar with gradient descent. This chapter introduces a souped-up variant of GD called mini-batch gradient descent.

  • Mini-batch gradient descent is slightly more complicated than plain vanilla GD, but as we are about to see, it also tends to converge faster. In simpler terms, mini-batch GD is faster at approaching the minimum loss, speeding up the network’s training. As a bonus, it takes less memory, and sometimes it even finds a better loss than regular GD. In fact, after this chapter, we might never use regular GD again! (A minimal code sketch of the idea follows at the end of this recap.)

  • One might wonder why we should focus on training speed when we have more pressing concerns. In particular, the accuracy of our neural network is still disappointing: better than a perceptron, but well below our target of 99% on MNIST. Should we not make the network more accurate first, and faster later? After all, as Donald Knuth said, “premature optimization is the root of all evil.”

  • However, there’s a reason to speed up training straight away. Within a couple of chapters, we’ll move towards that 99% goal by tuning the network iteratively. We’ll tweak the hyperparameters, train the network and then ...
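Before we get into the details, here is a minimal sketch of the batching idea in Python. This is not the lesson’s code: it assumes a plain linear model trained with mean squared error on synthetic data, and the names (gradient, train, batch_size, lr) are placeholders. The point is only the inner loop, where each weight update uses a small random slice of the training set instead of the whole thing.

import numpy as np

def gradient(X, Y, w):
    # Gradient of the mean squared error of the linear model X @ w
    return 2 * X.T @ (X @ w - Y) / X.shape[0]

def train(X, Y, epochs, batch_size, lr):
    w = np.zeros((X.shape[1], 1))
    for epoch in range(epochs):
        # Shuffle once per epoch so every mini-batch sees a different slice
        order = np.random.permutation(X.shape[0])
        X_shuffled, Y_shuffled = X[order], Y[order]
        for start in range(0, X.shape[0], batch_size):
            X_batch = X_shuffled[start:start + batch_size]
            Y_batch = Y_shuffled[start:start + batch_size]
            # One update per mini-batch, so many cheap steps per epoch
            w -= lr * gradient(X_batch, Y_batch, w)
    return w

# Tiny usage example on synthetic data (hypothetical numbers)
np.random.seed(0)
X = np.random.rand(1000, 3)
true_w = np.array([[2.0], [-1.0], [0.5]])
Y = X @ true_w + 0.01 * np.random.randn(1000, 1)
w = train(X, Y, epochs=50, batch_size=32, lr=0.1)
print(w.ravel())  # should land close to [2.0, -1.0, 0.5]

Because each step looks at only batch_size examples, the weights are updated many times per pass over the data, and only a small slice of the training set has to be in play at once. That is where the faster convergence and lower memory use described above come from.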