For classification problems, accuracy is a commonly used performance metric, but it is not always a good measure of model quality. Let's say that our data has 900 samples of the positive class and only 100 samples of the negative class, and suppose that our model predicts the positive class for every sample. The accuracy of such a model would be:

$$\text{Accuracy} = \frac{900}{900 + 100} = 0.9 = 90\%$$
Even though the model performs poorly, we still get a high accuracy, which is misleading.
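As a quick check, here is a minimal sketch, assuming scikit-learn is available and using hypothetical arrays, that reproduces this number:

```python
# Minimal sketch: accuracy of a model that always predicts the positive class
# on a hypothetical imbalanced dataset (900 positive, 100 negative samples).
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1] * 900 + [0] * 100)  # actual labels
y_pred = np.ones_like(y_true)             # "always positive" predictions

print(accuracy_score(y_true, y_pred))     # 0.9 -> 90% accuracy despite a useless model
```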
For binary classification problems, the area under the curve (AUC) turns out to be a better alternative to accuracy. The curve we refer to is the ROC (receiver operating characteristic) curve.
The ROC curve is a plot of the true positive rate (TPR) against the false positive rate (FPR). Before we define these, we must introduce some terms. In binary classification, there are four possible outcomes of a prediction, which are as follows:
- A positive data sample that is correctly classified as positive. This is known as a true positive (TP).
- A negative data sample that is correctly classified as negative. This is known as a true negative (TN).
- A negative data sample that is incorrectly classified as positive. This is known as a false positive (FP).
- A positive data sample that is incorrectly classified as negative. This is known as a false negative (FN).
These outcomes are summarized in a table called the confusion matrix, which is given below:

| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | TP | FN |
| Actual negative | FP | TN |
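As an illustration, here is a small sketch, assuming scikit-learn and using hypothetical labels, that builds a confusion matrix with this layout:

```python
# Sketch: confusion matrix for a hypothetical set of actual and predicted labels.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0]  # actual labels (hypothetical)
y_pred = [1, 0, 1, 0, 1, 0]  # predicted labels (hypothetical)

# With labels=[1, 0] the matrix is laid out as:
# [[TP, FN],
#  [FP, TN]]
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)  # [[2 1]
           #  [1 2]]
```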
We define TPR as:

$$\text{TPR} = \frac{TP}{TP + FN}$$

It is also known as sensitivity or recall. It tells us what fraction of the positive data samples were correctly classified.
We define FPR as:

$$\text{FPR} = \frac{FP}{FP + TN}$$

It tells us what fraction of the negative data samples were incorrectly classified as positive.
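The following sketch, using hypothetical counts of the four outcomes, shows how both rates follow directly from them:

```python
# Sketch: TPR and FPR from hypothetical counts of the four outcomes.
tp, fn = 4, 1  # outcomes for the positive samples
fp, tn = 2, 3  # outcomes for the negative samples

tpr = tp / (tp + fn)  # fraction of positives correctly classified (sensitivity/recall)
fpr = fp / (fp + tn)  # fraction of negatives incorrectly classified as positive

print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")  # TPR = 0.80, FPR = 0.40
```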
For classifiers that output probabilities, we set a threshold and classify a sample as positive when its predicted probability is greater than or equal to that threshold. Usually, this threshold is chosen to be 0.5.
By varying this threshold, we obtain different pairs of TPR and FPR values, which we can then plot to get the ROC curve for the classifier.
As the name suggests, AUC is the area under the ROC curve. It gives us a summary of the ROC curve. It is also helpful in comparing the ROC curves of different classifiers.
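As a sketch of how this is typically done in practice, the snippet below, assuming scikit-learn and matplotlib and using hypothetical scores, sweeps the thresholds, plots the ROC curve, and computes the AUC:

```python
# Sketch: ROC curve and AUC for hypothetical labels and predicted probabilities.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

y_true  = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]                        # actual labels
y_score = [0.9, 0.8, 0.7, 0.65, 0.6, 0.5, 0.4, 0.35, 0.2, 0.1]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # TPR/FPR at every useful threshold
print(auc(fpr, tpr))                               # area under the ROC curve

plt.plot(fpr, tpr, marker="o")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.title("ROC curve")
plt.show()
```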
To understand the concept better, let's take a dataset of 10 data samples with five positive and five negative examples. A classifier predicts the probabilities shown in the table below, and we apply three different thresholds (0.3, 0.5, and 0.7) to convert these probabilities into class labels:
| y (actual) | y (predicted probability) | >= 0.3 | >= 0.5 | >= 0.7 |
|---|---|---|---|---|
| 1 | 0.91 | 1 | 1 | 1 |
| 1 | 0.94 | 1 | 1 | 1 |
| 1 | 0.87 | 1 | 1 | 1 |
| 1 | 0.73 | 1 | 1 | 1 |
| 1 | 0.64 | 1 | 1 | 0 |
| 0 | 0.52 | 1 | 1 | 0 |
| 0 | 0.44 | 1 | 0 | 0 |
| 0 | 0.39 | 1 | 0 | 0 |
| 0 | 0.26 | 0 | 0 | 0 |
| 0 | 0.17 | 0 | 0 | 0 |
Counting the four outcomes at each threshold gives:

| | >= 0.3 | >= 0.5 | >= 0.7 |
|---|---|---|---|
| TP | 5 | 5 | 4 |
| TN | 2 | 4 | 5 |
| FP | 3 | 1 | 0 |
| FN | 0 | 0 | 1 |
| TPR | 1.0 | 1.0 | 0.8 |
| FPR | 0.6 | 0.2 | 0.0 |
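The table can be reproduced with a short sketch in plain Python, using the probabilities listed above:

```python
# Sketch: recompute TP, TN, FP, FN, TPR, and FPR for the worked example
# at the three thresholds used in the table above.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.91, 0.94, 0.87, 0.73, 0.64, 0.52, 0.44, 0.39, 0.26, 0.17]

for threshold in (0.3, 0.5, 0.7):
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    print(f">= {threshold}: TP={tp} TN={tn} FP={fp} FN={fn} TPR={tpr:.1f} FPR={fpr:.1f}")
```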
The ROC curve for this example is as follows:
Connecting these (FPR, TPR) points, together with the endpoints (0, 0) and (1, 1), and applying the trapezoidal rule, the AUC for this example is approximately 0.98.
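A sketch of the same calculation, assuming scikit-learn, whose `auc` helper applies the trapezoidal rule to the points above:

```python
# Sketch: trapezoidal AUC over the ROC points of the worked example,
# with the endpoints (0, 0) and (1, 1) added.
from sklearn.metrics import auc

fpr = [0.0, 0.0, 0.2, 0.6, 1.0]
tpr = [0.0, 0.8, 1.0, 1.0, 1.0]

print(auc(fpr, tpr))  # ~0.98
```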
The value of AUC lies between 0 and 1. An AUC of 0.5 corresponds to a classifier that is no better than random guessing, while an AUC of 1 indicates a perfect classifier.
As mentioned earlier, accuracy is not a good measure in some cases, such as when dealing with an imbalanced dataset. AUC and the ROC curve work well in such scenarios. In the case of a balanced dataset, however, accuracy is often a good enough measure, and some people use it as their only performance metric.