Upgrade the Learner

Prepare data by adding more dimensions and upgrading the algorithm according to the updated dataset.

After a mathematical detour, we can return to the work at hand. We want to upgrade our learning program to deal with multiple input variables. Let’s make a plan of action so that we do not get lost in the process:

  1. First, we’ll load and prepare the multidimensional data so that we can feed it to the learning algorithm.
  2. After preparing the data, we’ll upgrade all the functions in our code to use the new model. We’ll switch from a line to a more generic weighted sum, as mentioned in Adding More Dimensions.
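As a rough preview of step 2, here is a minimal sketch of a weighted sum over three input variables, computed for every example at once with a matrix multiplication. The function name predict, the weight matrix w, and the toy numbers are illustrative assumptions, not the chapter's final code:

```python
import numpy as np

# Made-up toy data: 2 examples (rows) x 3 input variables (columns).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
# Hypothetical weights: one per input variable, as a (3, 1) column matrix.
w = np.array([[0.5], [1.0], [2.0]])

def predict(X, w):
    # Weighted sum of each row: x1*w1 + x2*w2 + x3*w3.
    # np.matmul computes it for all rows in one shot, returning a (2, 1) matrix.
    return np.matmul(X, w)

print(predict(X, w))
```

Each row of the result is one example's weighted sum: 1*0.5 + 2*1.0 + 3*2.0 = 8.5 for the first row.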

Prepare data

ML is supposedly all about building amazing AIs. In reality, a large part of the job is preparing data for the learning algorithm. To do that, let’s start from the file that contains our dataset, pizza_3_vars.txt.

In the previous chapters, our dataset file had two columns, which we loaded into two arrays with NumPy’s loadtxt(). Now that we have multiple input variables, the input columns of pizza_3_vars.txt need to go into a matrix X instead.

Each row in X is an example, and each column is an input variable.
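To make the row/column convention concrete, here is a small sketch on a made-up matrix (the values are invented, not taken from pizza_3_vars.txt). The first number in the shape counts examples, the second counts input variables:

```python
import numpy as np

# Made-up toy values, not the real dataset:
# 4 examples (rows) x 3 input variables (columns).
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 200.0],
              [3.0, 30.0, 300.0],
              [4.0, 40.0, 400.0]])

num_examples, num_variables = X.shape  # (rows, columns)
first_example = X[0]       # one row: every input variable of example 0
second_variable = X[:, 1]  # one column: input variable 1 across all examples

print(num_examples, num_variables)  # 4 3
```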

If we load the file with loadtxt(), as we did before, we’ll get a NumPy array for each column:

import numpy as np
# skiprows=1 skips the header line; unpack=True returns one array per column
x1, x2, x3, y = np.loadtxt("pizza_3_vars.txt", skiprows=1, unpack=True)
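If you don't have pizza_3_vars.txt at hand, this sketch shows the same loadtxt() call on an in-memory stand-in. The header names and the numbers in fake_file are invented for illustration; only the layout (a header line followed by whitespace-separated columns) mirrors the real file:

```python
import io
import numpy as np

# Hypothetical stand-in for pizza_3_vars.txt: a header line, then
# three input columns and one label column. Names and values are made up.
fake_file = io.StringIO(
    "A B C Label\n"
    "1 2 3 10\n"
    "4 5 6 20\n"
)

# skiprows=1 skips the header; unpack=True returns each column
# as its own 1-D array.
x1, x2, x3, y = np.loadtxt(fake_file, skiprows=1, unpack=True)
print(x1)  # [1. 4.]

# Without unpack=True, loadtxt returns a single 2-D array, one row per line.
data = np.loadtxt(io.StringIO("A B C Label\n1 2 3 10\n4 5 6 20\n"), skiprows=1)
print(data.shape)  # (2, 4)
```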

Arrays are NumPy’s distinctive feature. They are very flexible objects that can represent anything from a scalar (a single number) to a multidimensional structure. However, that same flexibility makes arrays somewhat hard to grasp at first. Let’s see how to mold those four arrays into the X and Y variables. We recommend keeping NumPy’s documentation handy while doing this.
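Here is a quick illustration of that flexibility: the same np.array() constructor builds a 0-dimensional scalar, a 1-dimensional vector, and a 2-dimensional matrix. The values are arbitrary:

```python
import numpy as np

scalar = np.array(5.0)              # 0 dimensions
vector = np.array([1.0, 2.0, 3.0])  # 1 dimension
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])     # 2 dimensions

for a in (scalar, vector, matrix):
    # ndim counts the dimensions; shape gives the size along each one.
    print(a.ndim, a.shape)
# 0 ()
# 1 (3,)
# 2 (2, 2)
```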

To find the dimensions of an array, we can read its shape attribute:

x1.shape # => (30,)

All four arrays have 30 elements, one for each example in pizza_3_vars.txt. The dangling comma is how Python writes a one-element tuple, and it tells us that these arrays have just one dimension.
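To see the difference between a one-dimensional array and a one-column matrix, compare their shape and ndim attributes. The arrays below are just zero-filled placeholders with the same number of elements as our dataset:

```python
import numpy as np

v = np.zeros(30)       # 1-D array with 30 elements
m = np.zeros((30, 1))  # 2-D matrix: 30 rows, 1 column

print(v.shape, v.ndim)  # (30,) 1
print(m.shape, m.ndim)  # (30, 1) 2

# shape is a plain Python tuple, so its length equals the number of dimensions.
print(len(v.shape), len(m.shape))  # 1 2
```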

Let’s build the X matrix by joining the first three arrays together:

X = np.column_stack((x1, x2, x3))
X.shape # => (30, 3)
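On small made-up arrays, it's easier to see what column_stack() does: each 1-D array becomes one column of the result. The sketch also shows two equivalent alternatives, np.stack() along axis 1 and a transpose of the rows, which are swapped-in techniques rather than anything the text relies on:

```python
import numpy as np

# Made-up toy columns, 3 examples each.
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 5.0, 6.0])
x3 = np.array([7.0, 8.0, 9.0])

# column_stack turns each 1-D array into one column of a 2-D matrix.
X = np.column_stack((x1, x2, x3))
print(X)
# [[1. 4. 7.]
#  [2. 5. 8.]
#  [3. 6. 9.]]

# Two equivalent ways to build the same matrix:
X_stack = np.stack((x1, x2, x3), axis=1)  # stack along a new column axis
X_transposed = np.array([x1, x2, x3]).T   # stack as rows, then transpose
```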

Here are the first two rows of X:

X[:2] # => array([[13., 26., 9.], [2., 14., 6.]])

NumPy’s indexes are powerful, and sometimes confusing. The notation [:2] in this code is a shortcut for [0:2], which means the rows with indexes from 0 to 1 (2 excluded), that is, the first two rows.
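A few more slicing patterns on a made-up 4x3 matrix help pin down the rules: an omitted start means 0, the end index is excluded, negative indexes count from the end, and a comma separates the row selection from the column selection:

```python
import numpy as np

# Made-up 4 x 3 matrix with values 0..11: rows are examples, columns are variables.
X = np.arange(12.0).reshape(4, 3)

first_two = X[:2]       # rows 0 and 1; the end index 2 is excluded
same_rows = X[0:2]      # explicit start: identical to X[:2]
last_row = X[-1]        # negative indexes count from the end
first_column = X[:, 0]  # all rows, column 0

print(first_two)
# [[0. 1. 2.]
#  [3. 4. 5.]]
```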

Now that we have taken care of X, let’s look at y, which has the one-dimensional shape (30,).

A useful rule of thumb is to avoid mixing NumPy matrices and one-dimensional arrays, because code that involves both can behave in surprising ways. For this reason, as soon as we have a one-dimensional array, it’s better to reshape it into a matrix with its reshape() method:
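One concrete example of that surprising behavior is NumPy's broadcasting. In this sketch, the names predictions and labels and their values are hypothetical. Subtracting a one-dimensional (3,) array from a (3, 1) matrix doesn't raise an error: NumPy broadcasts both operands to (3, 3), so every prediction gets compared with every label, and any average computed from the result is silently wrong:

```python
import numpy as np

labels = np.array([1.0, 2.0, 3.0])             # 1-D, shape (3,)
predictions = np.array([[1.5], [2.0], [2.5]])  # 2-D, shape (3, 1)

# Mixing shapes: broadcasting yields a (3, 3) matrix of all pairwise differences.
errors = predictions - labels
print(errors.shape)  # (3, 3)

# Reshaping the labels into a (3, 1) matrix gives the element-wise errors we meant.
errors_ok = predictions - labels.reshape(-1, 1)
print(errors_ok.shape)  # (3, 1)
print(np.mean(errors_ok ** 2))  # mean squared error over the 3 examples
```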

Y = y.reshape(-1, 1)

The reshape() method takes the dimensions of the new array. If one dimension is -1, then NumPy sets it to whatever value makes the other dimensions fit. So the preceding line reshapes y into a matrix with 1 column and as many rows as it takes to fit all the elements. The result is a (30, 1) matrix:

Y.shape # => (30, 1)
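To get a feel for the -1 placeholder, here is a sketch on a made-up 6-element array. NumPy infers the missing dimension, and raises a ValueError when the element count can't be split evenly. Note that reshape() returns a new array and leaves the original one untouched:

```python
import numpy as np

y = np.arange(6.0)  # made-up 1-D array, shape (6,)

print(y.reshape(-1, 1).shape)  # (6, 1): one column, rows inferred
print(y.reshape(1, -1).shape)  # (1, 6): one row, columns inferred
print(y.reshape(2, -1).shape)  # (2, 3): two rows, columns inferred
print(y.shape)                 # (6,): the original array keeps its shape

# 6 elements can't be split into 4 equal rows, so NumPy refuses.
try:
    y.reshape(4, -1)
except ValueError as err:
    print("Cannot reshape:", err)
```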

Now our data is neatly arranged into an ...