The F1-score combines the precision and recall of a classifier into a single metric by taking their harmonic mean. It is primarily used to compare the performance of two classifiers. Suppose that classifier A has higher recall and classifier B has higher precision. In this case, the F1-scores of the two classifiers can be used to determine which one produces better overall results.
The F1-score of a classification model is calculated as follows:

F1 = 2 · (P · R) / (P + R)

where:
P = the precision
R = the recall of the classification model
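The formula above can be sketched as a small Python function (the function name and the sample values are illustrative, not from the article):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both precision and recall are 0
    return 2 * precision * recall / (precision + recall)

# The harmonic mean is pulled toward the smaller of the two values:
print(f1_score(0.8, 0.6))  # ≈ 0.686, below the arithmetic mean of 0.7
```

Note that, unlike the arithmetic mean, the harmonic mean heavily penalizes a classifier that is strong on one metric but weak on the other.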
Consider the following confusion matrix that corresponds to a binary classifier:
As computed earlier, the precision of the classifier equals , and the recall equals . From these values, we can calculate that the F1-score equals:
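The calculation from a confusion matrix can be sketched as follows; the counts here are hypothetical placeholders, since they are assumptions rather than the values from the article's matrix:

```python
# Hypothetical confusion-matrix counts for a binary classifier
tp, fp, fn = 30, 10, 20  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # 30 / 40 = 0.75
recall = tp / (tp + fn)     # 30 / 50 = 0.6
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.667

print(precision, recall, f1)
```

True negatives do not appear in the formula: precision, recall, and therefore the F1-score all ignore them, which is why the F1-score is popular for imbalanced datasets.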
Assume that we have calculated the following values:
The F1-score for:
class A
class B
class C
From the calculations above, we can see that the classifier works best for class A.
One way to calculate the F1-score for the entire model is to take the arithmetic mean of the F1-scores of all the classes; this is known as the macro-averaged F1-score.
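The macro-averaging step can be sketched as follows; the per-class scores used here are hypothetical, since the article's computed values are not shown:

```python
# Hypothetical per-class F1-scores (placeholders, not the article's values)
f1_scores = {"A": 0.9, "B": 0.7, "C": 0.5}

# Macro-averaged F1: unweighted arithmetic mean over the classes
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
print(macro_f1)  # 0.7
```

Because the macro average weights every class equally regardless of how many samples it contains, it can differ noticeably from a sample-weighted average on imbalanced datasets.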