Scaling the Dataset
Learn how scaling the dataset can have a meaningful impact on gradient descent.
Overview of learning rate results
The conclusion we drew from the results of the different learning rates was that gradient descent works best when all the curves are equally steep, because then the learning rate is closer to optimal for all of them!
Achieving equally steep curves
How do we then achieve equally steep curves? The short answer: you have to “correctly” scale your dataset. Let us now go into depth about how scaling your dataset helps to achieve equally steep curves.
“Bad” feature
First, let us take a look at a slightly modified example, which we will call the “bad” dataset:
- Here, we multiplied our feature (x) by 10, so it is now in the range [0, 10], and renamed it bad_x.
- Since we do not want the labels (y) to change, we also divided the original true_w parameter by 10 and renamed it bad_w. This way, both bad_w * bad_x and true_w * x yield the same results.
```python
true_b = 1
true_w = 2
N = 100

# Data generation
np.random.seed(42)

# We divide w by 10
bad_w = true_w / 10

# And multiply x by 10
bad_x = np.random.rand(N, 1) * 10

# So, the net effect on y is zero - it is still
# the same as before
y = true_b + bad_w * bad_x + (.1 * np.random.randn(N, 1))

# Displaying the bad_w parameter along with the bad_x values (first five)
print("bad_w: {} \n\nbad_x: {}".format(bad_w, bad_x[:5]))
```
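As a quick sanity check, we can verify that claim numerically. This is a minimal sketch (assuming NumPy and the same seeded data generation as above) showing that the rescaled parameter and feature reproduce the original product:

```python
import numpy as np

np.random.seed(42)
x = np.random.rand(100, 1)   # the original feature, in [0, 1)
true_w = 2
bad_w = true_w / 10          # w divided by 10
bad_x = x * 10               # feature multiplied by 10

# Both products should match (up to floating-point error),
# so the labels y are unaffected by the rescaling
print(np.allclose(bad_w * bad_x, true_w * x))  # prints True
```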
Then, we performed the same split as before for both the original and bad datasets, and plotted the training sets side by side, as seen below:
```python
# Generates train and validation sets
# It uses the same train_idx and val_idx as before,
# but it applies to bad_x
bad_x_train, y_train = bad_x[train_idx], y[train_idx]
bad_x_val, y_val = bad_x[val_idx], y[val_idx]

# Displaying the training and validation data (first five)
print("bad_x_train: {} \n\nbad_x_val: {}".format(bad_x_train[:5], bad_x_val[:5]))
```
The following figure shows the difference between the original training dataset and the bad training dataset:
The only difference between the two plots is the scale of feature x. Its range was [0, 1] before, but now it is [0, 10]. The label y has not changed, and we also did not touch true_b.
Does this simple scaling have any meaningful impact on our gradient descent? Well, if it did not, we would not be asking the question, right?
Let us compute a new loss surface, and compare it to the one we had before:
Looking at the contour values in the figure above, the dark blue line was at 4.0 before, and now it is at 50.0. For the same range of parameter values, the loss values are much bigger.
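We can reproduce this effect numerically. The sketch below (assuming the same synthetic data as in the snippets above) evaluates the MSE loss at one and the same candidate parameter pair against both feature scales:

```python
import numpy as np

np.random.seed(42)
N = 100
true_b, true_w = 1, 2
x = np.random.rand(N, 1)
bad_x = x * 10
y = true_b + true_w * x + (.1 * np.random.randn(N, 1))

def mse(b, w, feature):
    # Mean squared error of the linear model b + w * feature
    return ((b + w * feature - y) ** 2).mean()

# Same candidate parameters (b=0, w=1), two feature scales
print(mse(0., 1., x))      # small loss
print(mse(0., 1., bad_x))  # much bigger loss
```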
Let us look at the cross-sections before and after we multiplied feature x by 10:
What happened here? The red curve became much steeper (bigger gradient), and thus we must use a smaller learning rate to safely descend along it.
More importantly, the difference in steepness between the red and the black curves increased.
This is exactly what we need to avoid!
Do you remember why?
Because the size of the learning rate is limited by the steepest curve!
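The steepness claim can be checked directly: for the MSE loss, the gradient with respect to w is proportional to the feature values, so scaling the feature up scales the gradient up too. A minimal sketch under the same assumptions as above, comparing the gradient at the same distance from each dataset's own minimum:

```python
import numpy as np

np.random.seed(42)
N = 100
true_b, true_w = 1, 2
x = np.random.rand(N, 1)
bad_w, bad_x = true_w / 10, x * 10
y = true_b + true_w * x + (.1 * np.random.randn(N, 1))

def grad_w(b, w, feature):
    # Gradient of the MSE loss with respect to w
    return (2 * feature * (b + w * feature - y)).mean()

# Evaluate the gradient the same distance (1.0) away from each minimum
g_original = grad_w(true_b, true_w - 1.0, x)
g_bad = grad_w(true_b, bad_w - 1.0, bad_x)

# Roughly 100x steeper for the 10x-scaled feature
print(abs(g_bad) / abs(g_original))
```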
How can we fix it? Well, we ruined it by scaling the feature 10x bigger; perhaps we can fix it by scaling it in a different way.
Scaling / standardizing / normalizing
Different how? There is this beautiful thing called the StandardScaler, which transforms a feature in such a way that it ends up with zero mean and unit standard deviation.
How does it achieve that? First, it computes the mean and the standard deviation of a given feature (x) using the training set (N points):
...
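In code, the standardization step looks like the sketch below. The 80/20 split index used here is only an illustration (it is not the train_idx/val_idx split from the earlier snippets); the key point is that the mean and standard deviation come from the training portion alone, which is what scikit-learn's StandardScaler does with fit (on the training set) followed by transform (on both sets):

```python
import numpy as np

np.random.seed(42)
bad_x = np.random.rand(100, 1) * 10

# Illustrative 80/20 split: statistics are computed on the
# training portion only, never on the validation portion
train, val = bad_x[:80], bad_x[80:]
mu, sigma = train.mean(), train.std()

# Standardize both sets using the *training* statistics
scaled_train = (train - mu) / sigma
scaled_val = (val - mu) / sigma

# Training portion: mean ~0, std ~1 by construction
print(scaled_train.mean(), scaled_train.std())
# Validation portion: close to, but not exactly, 0 and 1
print(scaled_val.mean(), scaled_val.std())
```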