Metrics

In this lesson, you will learn about some of the most commonly used metrics in Machine Learning tasks.

What are metrics?

Metrics are crucial for Machine Learning tasks. When you define a Machine Learning task, you should also define which metrics you will look at. Evaluating the performance of your Machine Learning model is essential: without metrics, you don’t know how well or how poorly your model is performing. To make matters worse, choosing the wrong metrics can lead you in the wrong direction.

Therefore, it is very important to define good evaluation metrics for a Machine Learning task.

For supervised learning, we usually have some widely recognized metrics. Classification and regression problems use different metrics. Even for the same type of task, such as classification, there are many metrics to choose from, such as F1-score, AUC, accuracy, and so on. Some metrics are generic, and some can only be used in certain scenarios.
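To make these classification metrics concrete, here is a small sketch using hypothetical labels and scores (the label and score values below are made up for illustration). Note that roc_auc_score takes predicted scores or probabilities, while accuracy_score and f1_score take hard class predictions:

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical ground-truth labels and model outputs, for illustration only.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]                     # hard class predictions
y_score = [0.2, 0.9, 0.4, 0.1, 0.8, 0.7, 0.6, 0.95]  # predicted probabilities

print(accuracy_score(y_true, y_pred))   # fraction of correct predictions
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_score))   # AUC is computed from scores, not hard labels
```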

For unsupervised learning, the situation is a little more complicated, and for different tasks, the metrics vary greatly. For example, there are many metrics for clustering, but they are not as universal as those in supervised learning.

sklearn provides a lot of functions that cover many kinds of tasks and scenarios.

Notice: In the following examples, we use logistic regression and linear regression models to demonstrate our models. You can ignore how these models are trained, at the moment. This is explained in the upcoming chapters.

Classification

Confusion matrix

Before talking about the confusion matrix, we should understand some terms defined in a binary classification task.

  • P: The number of real positive cases in the data.
  • N: The number of real negative cases in the data.
  • TP: True Positive: the prediction result is positive, and the real value is positive.
  • TN: True Negative: the prediction result is negative, and the real value is negative.
  • FP: False Positive: the prediction result is positive, and the real value is negative.
  • FN: False Negative: the prediction result is negative, and the real value is positive.

The confusion matrix is a table with two rows and two columns that reports the number of false positives, false negatives, true positives, and true negatives.
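As a quick sketch of how the four counts map onto that table (the toy labels below are made up for illustration), scikit-learn returns the binary confusion matrix in a fixed order that can be unpacked with ravel():

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# For binary labels, sklearn orders the matrix as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```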

sklearn provides a function to output the matrix. It also has a very useful function to help plot the matrix with color.

As you can see from the code below, confusion_matrix is used to compute the confusion matrix given the true labels and the predicted labels, and plot_confusion_matrix is used to plot the confusion matrix as a heatmap. (Note that plot_confusion_matrix was removed in scikit-learn 1.2; in recent versions, use ConfusionMatrixDisplay.from_estimator instead.) Below is one example of a confusion matrix plot.

import sklearn.metrics as metrics

cm = metrics.confusion_matrix(test_y, pred_y)
metrics.plot_confusion_matrix(lr, test_x, test_y)
import sklearn.datasets as datasets
import sklearn.metrics as metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import plot_confusion_matrix
import matplotlib.pyplot as plt

# Generate a binary classification dataset and hold out 20% for testing.
X, y = datasets.make_classification(1000)
train_x, test_x, train_y, test_y = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=42)

# Create and train a logistic regression model.
lr = LogisticRegression()
lr.fit(train_x, train_y)

# Predict on the test set, plot the confusion matrix, and print it.
pred_y = lr.predict(test_x)
plot_confusion_matrix(lr, test_x, test_y)
plt.savefig('output/plt.png', dpi=300)
cm = metrics.confusion_matrix(test_y, pred_y)
print(cm)
  • First, a binary classification dataset is generated with make_classification() and split into two parts with train_test_split(). The training set contains 80% of the dataset.

  • Next, a logistic regression model is created from LogisticRegression() and trained on the training set.

  • The trained model is then used to make predictions on the test dataset with lr.predict().

  • Finally, confusion_matrix() calculates the confusion matrix. You need to pass the ground truth test_y and the prediction pred_y, in that order.
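As a small sanity check (not part of the lesson's code; the labels below are hypothetical), the correct predictions sit on the diagonal of the confusion matrix, so accuracy can be recovered directly from it:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels; in the example above, cm is built from test_y and pred_y.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
# Correct predictions are on the diagonal, so accuracy = trace / total.
acc_from_cm = np.trace(cm) / cm.sum()
print(acc_from_cm)
```

This matches what accuracy_score(y_true, y_pred) reports, and is a handy way to double-check a confusion matrix.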

F1-score

In binary classification, the F1 score (also F-score or F-measure) is a measure of a test’s accuracy. It considers both the precision, p, and the recall, ...