Effect of Threshold Selection

Let’s see how the threshold influences predictions.

Influence of threshold

By default, binary classification assumes a threshold of 0.5: if the prediction score is greater than the threshold, we consider the prediction positive. However, we can change the threshold value to modify the model’s behavior.

For example, if we are very concerned about false positives, we can increase the threshold to make the model more conservative. The prediction will then be positive only when the score is high, that is, when the model is very confident. Threshold selection is a common technique for dealing with the precision-recall trade-off. By changing the threshold value, we can shift the model toward higher precision or higher recall.
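
To make this concrete, here is a minimal sketch of how the same scores lead to different predictions under different thresholds. The `predict_labels` helper and the score values are hypothetical and made up for illustration; they do not come from any particular model or library.

```python
def predict_labels(scores, threshold=0.5):
    """Return 1 where the score is greater than the threshold, else 0."""
    return [1 if s > threshold else 0 for s in scores]


# Hypothetical prediction scores (assumed values, for illustration only).
scores = [0.10, 0.35, 0.55, 0.62, 0.80, 0.95]

# Default threshold of 0.5.
default_preds = predict_labels(scores)

# A higher threshold makes the model more conservative.
conservative_preds = predict_labels(scores, threshold=0.7)

print(default_preds)       # [0, 0, 1, 1, 1, 1]
print(conservative_preds)  # [0, 0, 0, 0, 1, 1]
```

Raising the threshold from 0.5 to 0.7 turns two borderline positives into negatives, so only the high-confidence scores remain positive.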

Precision/recall curve

Let’s consider the precision/recall example. Imagine we have a trained classifier with a default threshold of 0.5. The model achieves X precision and Y recall. When the threshold is lowered, the model makes more positive predictions, which increases recall (or at least never reduces it) but usually decreases precision. The reverse holds when the threshold is raised.

This relationship is commonly visualized as a precision/recall curve. We can generate it as follows:

  1. Select multiple values for threshold: 0, 0.01, 0.02, 0.03, … 0.97, 0.98, 0.99, 1.00.

  2. For each threshold value, generate predictions and compute precision and recall.

  3. Plot each point, with precision on the x-axis and recall on the y-axis.
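
The steps above can be sketched in plain Python as follows. The labels and scores are hypothetical toy values, and `precision_recall_at` is an illustrative helper rather than a library function. Treating precision as 1.0 when the model makes no positive predictions is an assumed convention; other choices are possible.

```python
def precision_recall_at(y_true, scores, threshold):
    """Compute (precision, recall) for predictions made at one threshold."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    # Assumed convention: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical ground-truth labels and prediction scores (toy data).
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.35, 0.55, 0.62, 0.80, 0.95]

# Step 1: thresholds 0, 0.01, 0.02, ..., 0.99, 1.00.
thresholds = [i / 100 for i in range(101)]

# Step 2: one (precision, recall) point per threshold.
curve = [precision_recall_at(y_true, scores, t) for t in thresholds]

# Step 3: split the points into coordinates, ready for plotting.
precisions = [p for p, _ in curve]
recalls = [r for _, r in curve]

print(curve[50])  # point at threshold 0.5
print(curve[70])  # point at threshold 0.7
```

The `precisions` and `recalls` lists can then be passed to any plotting tool to draw the curve. On this toy data, moving the threshold from 0.5 to 0.7 raises precision and lowers recall, which is the trade-off described earlier.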
