What are AUC and ROC?

Background

For classification problems, accuracy (a measure of the fraction of predictions a classifier gets right) has been widely used. However, it's not optimal in every case. For example, most classifiers give us a probability distribution over the classes, and accuracy doesn't make use of this information. Accuracy is also a poor measure for an imbalanced dataset. To understand this better, consider the example below.

Let's say that our data has 900 samples of the positive class and only 100 samples of the negative class. Suppose that our model predicts the positive class for every sample. The accuracy of such a model would be:

$$\text{Accuracy} = \frac{900}{900 + 100} = 0.9 = 90\%$$

Even though our model is performing poorly, we still get a high accuracy, which is misleading.
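As a quick sanity check (a minimal sketch, assuming scikit-learn and NumPy are installed; the arrays below simply recreate the hypothetical 900/100 split), the same 90% figure can be reproduced in code:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced dataset: 900 positive samples (1), 100 negative samples (0)
y_true = np.array([1] * 900 + [0] * 100)

# A model that blindly predicts the positive class for every sample
y_pred = np.ones_like(y_true)

print(accuracy_score(y_true, y_pred))  # 0.9, i.e. 90% accuracy despite a useless model
```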

For binary classification problems, the area under the curve (AUC) turns out to be a better alternative to accuracy. The curve we refer to is the ROC (receiver operating characteristic) curve.

ROC curve

The ROC curve is a plot of the true positive rate (TPR) against the false positive rate (FPR). Before we define these, we must introduce some terms. In binary classification, a model's prediction for a sample can have four possible outcomes, which are as follows:

  • A positive data sample is correctly classified as positive. This is known as a true positive (TP).

  • A negative data sample is correctly classified as negative. This is known as a true negative (TN).

  • A negative data sample is incorrectly classified as positive. This is known as a false positive (FP).

  • A positive data sample is incorrectly classified as negative. This is known as a false negative (FN).

These four outcomes are summarized in a table called the confusion matrix, which is given below:

A confusion matrix of a binary classification problem
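As an illustrative sketch (assuming scikit-learn is available; the labels and predictions below are made up for demonstration), the four counts can be read off the confusion matrix like this:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth labels and hard predictions (1 = positive, 0 = negative)
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0]

# For binary labels [0, 1], confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=2, TN=2, FP=1, FN=1
```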

True positive rate (TPR)

We define TPR as:

$$\text{TPR} = \frac{TP}{TP + FN}$$

It is also known as sensitivity or recall. It tells us what fraction of the positive data samples were correctly classified.

False positive rate (FPR)

We define FPR as:

$$\text{FPR} = \frac{FP}{FP + TN}$$

It tells us what fraction of the negative data samples were incorrectly classified as positive.
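Using the same made-up labels and predictions as the confusion-matrix sketch above (again assuming scikit-learn), TPR and FPR follow directly from the four counts:

```python
from sklearn.metrics import confusion_matrix

# Same hypothetical labels and predictions as in the confusion-matrix sketch above
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

tpr = tp / (tp + fn)  # fraction of positive samples classified correctly
fpr = fp / (fp + tn)  # fraction of negative samples classified incorrectly
print(f"TPR={tpr:.3f}, FPR={fpr:.3f}")  # TPR=0.667, FPR=0.333
```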

Intuition

For classifiers that generate probability outputs, we set a threshold based on which we classify our data samples. Usually, this threshold is chosen to be 0.5. But in some applications, it is desirable to control this threshold so that the model performs as required. For example, in a cancer classification problem, we would like to keep the number of false positives as low as possible. This can be achieved by changing the threshold.

By varying this threshold and computing TPR and FPR at each value, we obtain a set of (FPR, TPR) points. Plotting these points gives us the ROC curve for the classifier.
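A minimal sketch of this threshold sweep, assuming scikit-learn and Matplotlib are available and using made-up labels and scores, is shown below:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Hypothetical ground-truth labels and predicted probabilities of the positive class
y_true  = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

# roc_curve sweeps the threshold and returns FPR and TPR at each value
fpr, tpr, thresholds = roc_curve(y_true, y_score)

plt.plot(fpr, tpr, marker="o")
plt.plot([0, 1], [0, 1], linestyle="--")  # reference line for a random classifier
plt.xlabel("False positive rate (FPR)")
plt.ylabel("True positive rate (TPR)")
plt.title("ROC curve")
plt.show()
```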

AUC

As the name suggests, AUC is the area under the ROC curve. It summarizes the ROC curve in a single number, which also makes it convenient for comparing different classifiers.
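A minimal sketch of computing AUC with scikit-learn, reusing the made-up labels and scores from the ROC sketch above:

```python
from sklearn.metrics import roc_auc_score

# Same hypothetical labels and predicted probabilities as in the ROC sketch above
y_true  = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

# roc_auc_score integrates the ROC curve into a single summary number
print(roc_auc_score(y_true, y_score))  # 0.875 for this toy data
```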

Example

To understand the concept better, let's take a dataset of 10 data samples with five positive and five negative examples. A classifier predicts the probabilities $\hat{y}$, and its classifications based on the thresholds 0.3, 0.5, and 0.7 are given below:

| y (actual) | ŷ (predicted) | >= 0.3 | >= 0.5 | >= 0.7 |
|------------|----------------|--------|--------|--------|
| 1 | 0.91 | 1 | 1 | 1 |
| 1 | 0.94 | 1 | 1 | 1 |
| 1 | 0.87 | 1 | 1 | 1 |
| 1 | 0.73 | 1 | 1 | 1 |
| 1 | 0.64 | 1 | 1 | 0 |
| 0 | 0.52 | 1 | 1 | 0 |
| 0 | 0.44 | 1 | 0 | 0 |
| 0 | 0.39 | 1 | 0 | 0 |
| 0 | 0.26 | 0 | 0 | 0 |
| 0 | 0.17 | 0 | 0 | 0 |
| TP  | | 5 | 5 | 4 |
| TN  | | 2 | 4 | 5 |
| FP  | | 3 | 1 | 0 |
| FN  | | 0 | 0 | 1 |
| TPR | | 1.0 | 1.0 | 0.8 |
| FPR | | 0.6 | 0.2 | 0.0 |

The ROC curve for this example is as follows:

The ROC curve

Using the trapezoidal rule on the (FPR, TPR) points above, together with the endpoints (0, 0) and (1, 1), the AUC for this example is:

$$\text{AUC} \approx 0.98$$

The value of AUC lies between 0.0 and 1.0. A value of 1.0 means that all positive samples were classified correctly, and 0.0 means that all positive samples were incorrectly classified as negative. So we can say that this classifier is doing a good job.
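As a cross-check on this worked example (a minimal sketch, assuming scikit-learn), we can pass the ten actual labels and predicted probabilities from the table to roc_auc_score:

```python
from sklearn.metrics import roc_auc_score

# Actual labels and predicted probabilities from the example table
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_hat  = [0.91, 0.94, 0.87, 0.73, 0.64, 0.52, 0.44, 0.39, 0.26, 0.17]

# roc_auc_score sweeps every possible threshold; since every positive sample
# scores higher than every negative one here, the full-curve AUC is 1.0
# (our three manual thresholds give a trapezoidal estimate of about 0.98)
print(roc_auc_score(y_true, y_hat))
```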

Conclusion

As mentioned earlier, accuracy is not a good measure in some cases, such as when dealing with an imbalanced dataset. AUC and ROC work well in such scenarios. In the case of a balanced dataset, however, accuracy is often good enough, and some people use it as their only performance metric.

