What is a confusion matrix in machine learning?

A confusion matrix is a table used to assess a classification model’s performance. It summarizes the model’s predictions by comparing them with the actual outcomes.

For example, consider a binary classification problem. Its confusion matrix is a 2 x 2 matrix organized along two dimensions: actual values and predicted values.

The following can be deduced from the table above:

The target variable can take either a positive or a negative value.
The columns represent the actual values of the target variable.
The rows represent the predicted values of the target variable.
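
The layout described above can be sketched in a few lines of Python. This is only an illustration: the function name confusion_matrix_2x2, its parameters, and the sample labels are made up for this example and do not come from any particular library.

```python
def confusion_matrix_2x2(actual, predicted, positive=1, negative=0):
    """Count predictions in a 2 x 2 table: rows = predicted, columns = actual."""
    # Put the positive class first so it lands in the top row / left column.
    index = {positive: 0, negative: 1}
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1  # row = predicted, column = actual
    return matrix


# Made-up labels for ten samples (illustration only).
actual    = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

matrix = confusion_matrix_2x2(actual, predicted)

print("             actual 1  actual 0")
print(f"predicted 1  {matrix[0][0]:>8}  {matrix[0][1]:>8}")
print(f"predicted 0  {matrix[1][0]:>8}  {matrix[1][1]:>8}")
```

Some libraries use the transposed layout. For example, scikit-learn's confusion_matrix function puts actual values in the rows and predicted values in the columns, so it is worth checking the convention before reading counts off a table.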

Combining the actual and predicted values gives four possible outcomes:

True Positive (TP)
True Negative (TN)
False Positive (FP)
False Negative (FN)

Example

Let’s break this down. We will use cancer diagnosis as an example, where 1 represents a patient with cancer and 0 represents a cancer-free patient.

1. True Positive (TP)

A true positive occurs when both the actual value and the predicted value are positive. For example, the actual value of a patient is 1 and the predicted value is 1.

2. True Negative (TN)

A true negative occurs when both the actual value and the predicted value are negative. For example, the actual value of a patient is 0 and the predicted value is 0.

3. False Positive (FP)

A false positive, also known as a Type I error, occurs when the actual value is negative but the predicted value is positive. For example, the actual value of a patient is 0 and the predicted value is 1.

4. False Negative (FN)

A false negative, also known as a Type II error, occurs when the actual value is positive but the predicted value is negative. For example, the actual value of a patient is 1 and the predicted value is 0.
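
The four definitions above can be checked with a short Python sketch that labels each patient’s outcome and tallies the results. The helper name outcome and the six patient records are hypothetical, chosen only to illustrate the definitions.

```python
from collections import Counter


def outcome(actual, predicted):
    """Name the outcome for one patient (1 = cancer, 0 = cancer-free)."""
    if predicted == 1:
        # Predicted positive: correct if the patient really has cancer.
        return "TP" if actual == 1 else "FP"
    # Predicted negative: correct if the patient is really cancer-free.
    return "TN" if actual == 0 else "FN"


# Hypothetical labels for six patients (illustration only).
actual    = [1, 0, 0, 1, 1, 0]
predicted = [1, 0, 1, 0, 1, 0]

counts = Counter(outcome(a, p) for a, p in zip(actual, predicted))

for name in ("TP", "TN", "FP", "FN"):
    print(name, counts[name])
```

With these sample labels, the third patient is a false positive (Type I error) and the fourth is a false negative (Type II error). The remaining four patients are classified correctly.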