...

/

Introduction

Introduction

This lesson gives a brief introduction to code vectorization and explains it with the help of an example.

We'll cover the following...

Problem vectorization is much harder than code vectorization because it means that you fundamentally have to rethink your problem in order to make it vectorizable. Most of the time this means you have to use a different algorithm to solve your problem or even worse… to invent a new one. The difficulty is thus to think out-of-the-box.

To illustrate this, let’s consider a simple problem where given two vectors X and Y, we want to compute the sum of X[i]*Y[j] for all pairs of indices i, j.

One simple and obvious solution is given below.

Press + to interact
def compute_python(X, Y):
result = 0
for i in range(len(X)):
for j in range(len(Y)):
result += X[i] * Y[j]
return result
print(compute_python([1,2],[1,2]))

However, this first and naïve implementation requires two loops and we already know it will be slow:

Press + to interact
main.py
tools.py
import numpy as np
from tools import timeit
def compute_python(X, Y):
result = 0
for i in range(len(X)):
for j in range(len(Y)):
result += X[i] * Y[j]
return result
X = np.arange(1000)
timeit("compute_python(X,X)", globals())

How to vectorize the problem then? If you remember your linear algebra course, you may have identified the expression X[i] * Y[j] to be very similar to a matrix product expression. So maybe we could benefit from some NumPy speedup. One wrong solution would be to write:

Press + to interact
def compute_numpy_wrong(X, Y):
return (X*Y).sum()
print(compute_numpy_wrong([1,2],[1,2]))

This is wrong because the X*Y expression will actually compute a new vector Z such that Z[i] = X[i] * Y[i] and this is not what we want. Instead, we can exploit NumPy broadcasting by first reshaping the two vectors and then multiplying them:

Press + to interact
def compute_numpy(X, Y):
Z = X.reshape(len(X),1) * Y.reshape(1,len(Y))
return Z.sum()

Here we have Z[i,j] == X[i,0]*Y[0,j] and if we take the sum over each elements of Z, we get the expected result. Let’s see how much speed we gain in the process:

Press + to interact
main.py
tools.py
import numpy as np
from tools import timeit
def compute_numpy(X, Y):
Z = X.reshape(len(X),1) * Y.reshape(1,len(Y))
return Z.sum()
X = np.arange(1000)
timeit("compute_numpy(X,X)", globals())

This is better, we gained a factor of ~150. But we can do much better.

If you look again and more closely at the pure Python version, you can see that the inner loop is using X[i] that does not depend on the j index, meaning it can be removed ...

Access this course and 1400+ top-rated courses and projects.