
Learning of the Model

Explore how mini-batch gradient descent enhances neural network training by speeding up convergence and reducing memory use. Understand the relationship between loss and accuracy during training, and why tuning hyperparameters iteratively is essential for improving model performance. This lesson prepares you to apply practical optimizations in supervised learning.

Recap of neural networks

  • By now, we are familiar with gradient descent. This chapter introduces a souped-up variant of GD called mini-batch gradient descent.

  • Mini-batch gradient descent is slightly more complicated than plain vanilla GD, but as we are about to see, it also tends to converge faster. In simpler terms, mini-batch GD is faster at approaching the minimum loss, speeding up the network’s training. As a bonus, it takes less memory, and sometimes it even finds a better loss than regular GD. In fact, after this chapter, we might never use regular GD again! (A minimal code sketch of the idea follows at the end of this recap.)

  • One might wonder why we should focus on training speed when we have more pressing concerns. In particular, the accuracy of our neural network is still disappointing: better than a perceptron, but well below our target of 99% on MNIST. Should we not make the network more accurate first, and faster later? After all, as Donald Knuth said, “premature optimization is the root of all evil.”

  • However, there’s a reason to speed up training straight away. Within a couple of chapters, we’ll move towards that 99% goal by tuning the network iteratively. We’ll tweak the hyperparameters, train the network and then ...
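Before we get into the details, here is a minimal sketch of the batching idea in Python. This is not the lesson’s code: it assumes a plain linear model trained with mean squared error on synthetic data, and the names (gradient, train, batch_size, lr) are placeholders. The point is only the inner loop, where each weight update uses a small random slice of the training set instead of the whole thing.

import numpy as np

def gradient(X, Y, w):
    # Gradient of the mean squared error of the linear model X @ w
    return 2 * X.T @ (X @ w - Y) / X.shape[0]

def train(X, Y, epochs, batch_size, lr):
    w = np.zeros((X.shape[1], 1))
    for epoch in range(epochs):
        # Shuffle once per epoch so every mini-batch sees a different slice
        order = np.random.permutation(X.shape[0])
        X_shuffled, Y_shuffled = X[order], Y[order]
        for start in range(0, X.shape[0], batch_size):
            X_batch = X_shuffled[start:start + batch_size]
            Y_batch = Y_shuffled[start:start + batch_size]
            # One update per mini-batch, so many cheap steps per epoch
            w -= lr * gradient(X_batch, Y_batch, w)
    return w

# Tiny usage example on synthetic data (hypothetical numbers)
np.random.seed(0)
X = np.random.rand(1000, 3)
true_w = np.array([[2.0], [-1.0], [0.5]])
Y = X @ true_w + 0.01 * np.random.randn(1000, 1)
w = train(X, Y, epochs=50, batch_size=32, lr=0.1)
print(w.ravel())  # should land close to [2.0, -1.0, 0.5]

Because each step looks at only batch_size examples, the weights are updated many times per pass over the data, and only a small slice of the training set has to be in play at once. That is where the faster convergence and lower memory use described above come from.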