Baselines

Learn how to use baselines to help you better assess your models.

In machine learning (ML), a baseline is a deliberately simple reference model that sets a minimum level of performance. Baseline models make minimal assumptions about the data and are cheap to build, which makes them a natural starting point for model development and evaluation.

Comparing a more complex model against a baseline shows whether the added complexity actually delivers a meaningful improvement over that simple reference point.

Baseline models serve several purposes, including the following:

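  • Performance evaluation: They give a reference score that puts the metrics of more complex models in context. For example, 90% accuracy is far less impressive if always predicting the majority class already reaches 89%.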

  • Model complexity assessment: Comparing a complex model to a baseline helps determine if the additional complexity is justified by the performance gain.

  • Sanity check: If a more complex model cannot beat the baseline, it is probably not learning meaningful patterns in the data, or something in the pipeline is broken.

In real-life scenarios, baselines often help us decide whether using complex ML algorithms is a wise choice at all. If a model cannot do better than always predicting the mean of the target, building a complex ML pipeline might not be worth the effort.
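
As a rough illustration of this check, the sketch below compares a fitted line against a "predict the training mean" baseline. The synthetic data, the hold-out split, and the mse helper are all hypothetical choices made for the example and do not come from the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the target depends linearly on the feature, plus noise.
X = rng.uniform(0, 10, size=200)
y = 3.0 * X + rng.normal(0, 2.0, size=200)

# Simple hold-out split: first 150 samples for training, the rest for testing.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]


def mse(y_true, y_pred):
    """Mean squared error between true and predicted targets."""
    return float(np.mean((y_true - y_pred) ** 2))


# Baseline: ignore the feature and always predict the training mean.
baseline_pred = np.full_like(y_test, y_train.mean())
baseline_mse = mse(y_test, baseline_pred)

# Candidate model: a least-squares line fitted on the training data.
slope, intercept = np.polyfit(X_train, y_train, deg=1)
model_mse = mse(y_test, slope * X_test + intercept)

print(f"Baseline (mean) MSE: {baseline_mse:.2f}")
print(f"Linear model MSE:    {model_mse:.2f}")
```

Here the fitted line should clearly beat the mean baseline because the data contains a strong linear signal. If the two errors were close, that would be a hint that the feature carries little usable information.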

In this lesson, we introduce two useful classes provided by scikit-learn: DummyClassifier and DummyRegressor. These classes allow us to create simple baseline models for classification and regression tasks, respectively.

Dummy classifier

The DummyClassifier class in scikit-learn implements simple baseline strategies for classification tasks. It creates a classifier that ignores the input features and makes predictions using simple rules or random guessing.

The strategy parameter of DummyClassifier controls how predictions are generated. Commonly used options include:

  • stratified: It generates predictions by randomly guessing according to the class distribution in the training data.

  • most_frequent: It always predicts the most frequent class in the training data.

  • uniform: It picks each class with equal probability, regardless of the class distribution in the training data.

  • constant: It always predicts a constant class label specified by the user.
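
To make the differences between these strategies concrete, here is a small sketch that fits each one on the same toy labels. The imbalanced label array and the placeholder feature matrix are assumptions made for the illustration; only the strategy names come from the list above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical imbalanced labels: eight samples of class 0, two of class 1.
y = np.array([0] * 8 + [1] * 2)
# DummyClassifier ignores feature values, so placeholder features are enough.
X = np.zeros((len(y), 1))

strategies = {
    "stratified": DummyClassifier(strategy="stratified", random_state=0),
    "most_frequent": DummyClassifier(strategy="most_frequent"),
    "uniform": DummyClassifier(strategy="uniform", random_state=0),
    # The constant label must be one of the classes seen during fit.
    "constant": DummyClassifier(strategy="constant", constant=1),
}

predictions = {}
for name, clf in strategies.items():
    clf.fit(X, y)
    predictions[name] = clf.predict(X)
    print(f"{name:>13}: {predictions[name]}")
```

With these labels, most_frequent predicts class 0 for every sample and constant predicts class 1 for every sample. The stratified and uniform outputs are random draws, so they depend on random_state.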

Let’s go through a quick example of how to use the DummyClassifier class:
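
What follows is a minimal sketch under stated assumptions: it uses the breast cancer dataset bundled with scikit-learn and a shallow decision tree as the "more complex" model. Both are illustrative choices, as are the split size and random_state values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small binary classification dataset (an assumption for this sketch).
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set, keeping the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: always predict the most frequent class seen during training.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Candidate model to compare against the baseline.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
tree_acc = tree.score(X_test, y_test)

print(f"Baseline accuracy:      {baseline_acc:.3f}")
print(f"Decision tree accuracy: {tree_acc:.3f}")
```

DummyClassifier follows the usual scikit-learn estimator interface (fit, predict, score), so it can be swapped into an existing evaluation script without other changes. The baseline accuracy here simply reflects the share of the majority class in the test set, which is the number any real model needs to beat.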
