Scaling the Dataset

Learn how scaling the dataset can have a meaningful impact on gradient descent.

Overview of learning rate results

The conclusion we drew from the results of the different learning rates is that, ideally, all the loss curves should be equally steep, so that a single learning rate is close to optimal for all of them!

Achieving equally steep curves

How do we then achieve equally steep curves? The short answer: you have to “correctly” scale your dataset. Let us now go into depth about how scaling your dataset helps to achieve equally steep curves.
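Before diving into the example, here is a minimal sketch of what "correctly" scaling a feature usually means: standardizing it to zero mean and unit standard deviation. The feature values below are assumptions made up for illustration, not the dataset used in this lesson.

```python
import numpy as np

# Hypothetical feature stored as an (N, 1) array, in the range [0, 10]
np.random.seed(42)
x = np.random.rand(100, 1) * 10

# Standardize: subtract the mean, divide by the standard deviation
mu = x.mean()
sigma = x.std()
scaled_x = (x - mu) / sigma

# The scaled feature has (approximately) zero mean and unit std
print(scaled_x.mean(), scaled_x.std())
```

After this transformation, every feature lives on a comparable scale, which is what makes the loss curves comparably steep.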

“Bad” feature

First, let us take a look at a slightly modified example, which we will call the “bad” dataset:

  • Here, we multiply our feature (x) by 10, so it is now in the range [0, 10], and rename it bad_x.

  • But since we do not want the labels (y) to change, we also divide the original true_w parameter by 10 and rename it bad_w. This way, both bad_w * bad_x and true_w * x yield the same results.

import numpy as np

true_b = 1
true_w = 2
N = 100
# Data generation
np.random.seed(42)
# We divide w by 10
bad_w = true_w / 10
# And multiply x by 10
bad_x = np.random.rand(N, 1) * 10
# So, the net effect on y is zero - it is still
# the same as before
y = true_b + bad_w * bad_x + (.1 * np.random.randn(N, 1))
# Displaying the bad_w parameter along with the bad_x values (first five)
print("bad_w: {} \n\nbad_x: {}".format(bad_w, bad_x[:5]))

Then, we perform the same split as before for both the original and bad datasets and plot the training sets side by side, as seen below:

# Generates train and validation sets
# It uses the same train_idx and val_idx as before,
# but it applies to bad_x
bad_x_train, y_train = bad_x[train_idx], y[train_idx]
bad_x_val, y_val = bad_x[val_idx], y[val_idx]
# Displaying the training and validation data (first five)
print("bad_x_train: {} \n\nbad_x_val: {}".format(bad_x_train[:5], bad_x_val[:5]))
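One detail worth keeping in mind once the data is split: scaling statistics (mean and standard deviation) should be computed on the training set only and then reused for the validation set, to avoid leaking validation information. The sketch below uses a hypothetical 80/20 split, not the actual train_idx and val_idx from earlier:

```python
import numpy as np

# Hypothetical "bad" feature and an 80/20 train/validation split
np.random.seed(42)
bad_x = np.random.rand(100, 1) * 10
idx = np.random.permutation(100)
train_idx, val_idx = idx[:80], idx[80:]
bad_x_train, bad_x_val = bad_x[train_idx], bad_x[val_idx]

# Compute scaling statistics on the TRAINING set only
mu = bad_x_train.mean()
sigma = bad_x_train.std()

# Apply the same training statistics to both splits
scaled_train = (bad_x_train - mu) / sigma
scaled_val = (bad_x_val - mu) / sigma
```

The scaled training set has zero mean and unit standard deviation by construction; the scaled validation set only approximately so, which is expected.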

The following figure shows the difference between the original training dataset and the bad training dataset:

The only difference between the two plots is the scale of feature x ...
