What is a confusion matrix in machine learning?

The confusion matrix is a method to assess a classification model’s performance. It is a summary of classification problem prediction outcomes.

For example, let’s look at a binary classification problem. It is a 2 x 2 matrix split into two parts: actual values and predicted values.

The following can be deduced from the table above:

  • The target variable can take either a positive or a negative value.
  • The values of the target variable are represented in the columns.
  • The rows represent the predicted values of the target variable.

The actual and predicted values can be further broken down into four parts:

  • True positive (TP)
  • True negative (TN)
  • False Positive (FP)
  • False Negative (FN)

Example

Let’s break this down. We will use the case of a cancer patient as an example, where 1 represents a cancer patient and 0 represents a cancer-free patient.

1. True Positive (TP)

True positive is when the predicted value matches the actual value in a positive way. It is when the actual value and the predicted value are positive. For example, the actual value of a patient is 1 and the predicted value is 1.

2. True Negative (TN)

True negative is when the predicted value matches the actual value in a negative way. It is when the actual value and the predicted value are negative. For example, the actual value of a patient is 0 and the predicted value is 0.

3. False Positive (FP)

A false positive is known as the Type-1 error. It is when the actual value is negative and the predicted value is positive. For example, the actual value of a patient is 0 and the predicted value is 1.

4. False Negative (FN)

A false negative is known as the type-2 error. It is when the actual value is positive and the predicted value is negative. For example, the actual value of a patient is 1 and the predicted value is 0.

Conclusion

The benefit of the confusion matrix is that it gives a summary of the outcomes of the model. For example:

This tells us that:

  • 50 people truly have cancer, and it was predicted correctly.
  • 10 people have cancer, but the predicted value was wrong.
  • 5 people do not have cancer, and the predicted value was wrong.
  • 100 people do not have cancer, and it was predicted correctly.

With that, you should now understand what a confusion matrix is…hopefully it didn’t confuse you.