Matrix Multiplication

Learn why matrices are used in neural networks.

Why do we use matrices?

Doing the calculations by hand for a two-layer network with just two nodes in each layer is already a fair amount of work. Now imagine doing the same for a network with five layers and hundreds of nodes in each layer. Just writing out all the necessary calculations would be a huge task: for every node in every layer, we would have to multiply each incoming signal by the right weight, combine the results, and apply the sigmoid activation function. Clearly, that is far too much to do manually.

How can matrices help? They help us in two ways. First, they allow us to compress all those calculations into a very short form. Second, many programming languages can work with matrices directly, and because the underlying work is so repetitive, computers can carry it out quickly and efficiently.

In short, matrices allow us to express the work we need to do concisely and easily, and computers can get the calculations done quickly and efficiently.
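To make the bookkeeping concrete, here is a minimal Python sketch of the manual calculation for a single layer with two nodes. The input signals, the weight values, and the helper names `sigmoid` and `layer_output` are illustrative assumptions, not values or functions taken from this lesson.

```python
import math

def sigmoid(x):
    """Sigmoid activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(inputs, weights):
    """Compute each node's output by hand.

    weights[j][i] is the (assumed) weight on the link from input i to node j.
    """
    outputs = []
    for node_weights in weights:
        # Multiply each incoming signal by its weight, then add them up
        combined = sum(w * x for w, x in zip(node_weights, inputs))
        # Apply the activation function to the combined signal
        outputs.append(sigmoid(combined))
    return outputs

# Hypothetical values chosen only for illustration
inputs = [1.0, 0.5]
weights = [[0.9, 0.3],
           [0.2, 0.8]]

print(layer_output(inputs, weights))
```

Even this tiny layer needs a loop over the nodes and a sum over the inputs. Writing the same steps out for hundreds of nodes across several layers is exactly the repetitive work that a compact notation is meant to avoid.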

What is a matrix?

Now that we know why we’re going to look at matrices, let’s demystify them. A matrix is just a table—a rectangular grid of numbers. That’s it. There’s nothing more complex about a matrix than that. If we’ve used spreadsheets, we’re already comfortable with working with numbers arranged in a grid. Some call it a table. We can call it a matrix too. The following illustration shows a table of numbers.

A B C
3 32 5
5 74 2
8 11 8
2 75 3

That’s all a matrix is: a table or a grid of numbers, just like the following example of a $2 \times 3$ matrix:

$$
\begin{bmatrix} 23 & 43 & 22 \\ 43 & 12 & 54 \end{bmatrix}
$$

It is a convention to state the number of rows first and then the number of columns, so this isn’t a $3 \times 2$ matrix; it is a $2 \times 3$ matrix. Additionally, some people use square brackets around matrices, and others use round brackets.
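The rows-first convention carries over naturally to code. As a rough sketch, the matrix above can be stored in plain Python as a list of rows; the helper name `shape` is an assumption made for this example.

```python
# The 2 x 3 matrix from the text, stored as a list of rows
matrix = [
    [23, 43, 22],
    [43, 12, 54],
]

def shape(m):
    """Return (rows, columns), following the rows-first convention."""
    return (len(m), len(m[0]))

rows, cols = shape(matrix)
print(f"{rows} x {cols}")  # prints "2 x 3"

# Indexing also goes row first, then column (both counted from zero)
print(matrix[0][2])  # prints 22
```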

The values in the matrix don’t have to be numbers. They could be quantities that we have given a name to but haven’t yet assigned a numerical value. So, the following example is a matrix where each element is a variable that has a meaning and could have a numerical value. We just haven’t defined what the variables are yet.

$$
\begin{bmatrix} \text{longitude of ship} & \text{longitude of plane} \\ \text{latitude of ship} & \text{latitude of plane} \end{bmatrix}
$$

Multiply two matrices

Matrices become useful to us when we look at how they are multiplied. We may remember how to do this from school, but let’s look at it again. Here’s an example of two simple matrices multiplied together:

$$
\begin{aligned}
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}
&=
\begin{bmatrix}
(1 \times 5) + (2 \times 7) & (1 \times 6) + (2 \times 8) \\
(3 \times 5) + (4 \times 7) & (3 \times 6) + (4 \times 8)
\end{bmatrix} \\
&=
\begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}
\end{aligned}
$$

...
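The worked example above can be checked with a few lines of plain Python. This is a minimal sketch rather than an optimized routine: the function name `matmul` is an assumption, matrices are stored as lists of rows, and the size check (the first matrix must have as many columns as the second has rows) is added here for safety.

```python
def matmul(a, b):
    """Multiply matrices a and b, each given as a list of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    result = []
    for row in a:
        result_row = []
        # zip(*b) walks through the columns of b one at a time
        for col in zip(*b):
            # Pair up row and column elements, multiply, and add
            result_row.append(sum(x * y for x, y in zip(row, col)))
        result.append(result_row)
    return result

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # prints [[19, 22], [43, 50]]
```

Each entry of the result pairs one row of the first matrix with one column of the second, which is the same pattern as the bracketed sums in the worked example.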