The confusion matrix is a method to assess a classification model’s performance. It is a summary of classification problem prediction outcomes.
For example, let’s look at a binary classification problem. It is a 2 x 2 matrix split into two parts: actual values and predicted values.
The following can be deduced from the table above:
The actual and predicted values can be further broken down into four parts:
Let’s break this down. We will use the case of a cancer patient as an example, where 1 represents a cancer patient and 0 represents a cancer-free patient.
True positive is when the predicted value matches the actual value in a positive way. It is when the actual value and the predicted value are positive. For example, the actual value of a patient is 1 and the predicted value is 1.
True negative is when the predicted value matches the actual value in a negative way. It is when the actual value and the predicted value are negative. For example, the actual value of a patient is 0 and the predicted value is 0.
A false positive is known as the Type-1 error. It is when the actual value is negative and the predicted value is positive. For example, the actual value of a patient is 0 and the predicted value is 1.
A false negative is known as the type-2 error. It is when the actual value is positive and the predicted value is negative. For example, the actual value of a patient is 1 and the predicted value is 0.
The benefit of the confusion matrix is that it gives a summary of the outcomes of the model. For example:
This tells us that:
With that, you should now understand what a confusion matrix is…hopefully it didn’t confuse you.