Learning of the Model
Learn how the model learns, and how to distinguish between the loss and the accuracy of a neural network.
Recap of neural networks
By now, we are familiar with gradient descent. This chapter introduces a souped-up variant of GD called mini-batch gradient descent.
Mini-batch gradient descent is slightly more complicated than plain vanilla GD, but as we are about to see, it also tends to converge faster. In simpler terms, mini-batch GD is faster at approaching the minimum loss, speeding up the network’s training. As a bonus, it takes less memory, and sometimes it even finds a better loss than regular GD. In fact, after this chapter, we might never use regular GD again!
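To make the difference concrete, here is a minimal NumPy sketch of the idea (not the code we will build in this chapter): plain GD computes the gradient over the entire training set before each weight update, while mini-batch GD shuffles the data and updates the weights after every small batch. The function names, the simple linear model with mean squared error, and the hyperparameter values below are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of mini-batch gradient descent on a linear model
# with mean squared error. Names and hyperparameters are placeholders,
# not the book's own training code.

def mse_gradient(X_batch, Y_batch, w):
    # Gradient of the mean squared error of a linear model,
    # averaged over the examples in the batch.
    return 2 * X_batch.T @ (X_batch @ w - Y_batch) / X_batch.shape[0]

def minibatch_gd(X, Y, lr=0.1, epochs=100, batch_size=32, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros((X.shape[1], 1))
    for _ in range(epochs):
        # Shuffle once per epoch so the batches differ from epoch to epoch.
        order = rng.permutation(X.shape[0])
        X_shuffled, Y_shuffled = X[order], Y[order]
        # Update the weights after every small batch, instead of once
        # per pass over the entire training set as plain GD does.
        for start in range(0, X.shape[0], batch_size):
            X_b = X_shuffled[start:start + batch_size]
            Y_b = Y_shuffled[start:start + batch_size]
            w -= lr * mse_gradient(X_b, Y_b, w)
    return w

# Tiny synthetic example: learn y = 3x from slightly noisy data.
X = np.linspace(0, 1, 200).reshape(-1, 1)
Y = 3 * X + np.random.default_rng(1).normal(0, 0.05, X.shape)
print(minibatch_gd(X, Y))   # the learned weight should land close to 3
```

Because the weights move after every batch, the network takes many small, noisy steps per epoch instead of one large one, which is what tends to speed up convergence, and memory usage is bounded by the batch size rather than the whole training set.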
One might wonder why we should focus on training speed when we have more pressing concerns. In particular, the accuracy of our neural network is still disappointing: better than a perceptron, but well below our target of 99% on MNIST. Should we not make the network more accurate first, and faster later? After all, as Donald Knuth said, “premature optimization is the root of all evil.”
However, there’s a reason to speed up training straight away. Within a couple of chapters, we’ll move towards that 99% goal by tuning the network iteratively: we’ll tweak the hyperparameters, train the network, and then ...