Model Evaluation

Baseline accuracy is a crucial reference point when evaluating our trained model’s performance.

The baseline accuracy

Baseline accuracy is the accuracy that a model can achieve by simply guessing the majority class for every observation.

Formula: $\text{baseline accuracy} = \dfrac{\text{majority class } N}{\text{total } N}$
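The formula translates directly into code. This is a minimal sketch, assuming the labels are held in a plain Python list; the function name `baseline_accuracy` is hypothetical and not part of any library.

```python
from collections import Counter

def baseline_accuracy(labels):
    """Hypothetical helper: majority class N / total N."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    # most_common(1) returns [(label, count)] for the most frequent label
    majority_n = counts.most_common(1)[0][1]
    return majority_n / len(labels)

# 70 observations of class 1 and 30 of class 0
labels = [1] * 70 + [0] * 30
print(baseline_accuracy(labels))  # 0.7
```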

One might assume that guessing gives about 50% accuracy, an equal chance for each class in a binary classification problem. However, the majority-class baseline equals 50% only when both classes appear in equal proportions or, in a multiclass classification problem, when the majority class makes up about 50% of the labels.

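Note: In a binary classification problem, the baseline accuracy can never be below 50%, because the majority class always makes up at least half of the observations. In a multiclass problem with k classes, the lower bound is 1/k, so the baseline can fall below 50%.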
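The bound can be checked numerically. This sketch assumes two made-up label distributions chosen for illustration (a 45/55 binary split and a 40/35/25 three-class split); `baseline_accuracy` is a hypothetical helper, not a library function.

```python
from collections import Counter

def baseline_accuracy(labels):
    # Hypothetical helper: share of observations in the majority class
    return Counter(labels).most_common(1)[0][1] / len(labels)

# Binary (made-up split): the majority class holds at least half of the labels
binary = [0] * 45 + [1] * 55
print(baseline_accuracy(binary))  # 0.55

# Three classes (made-up split 40/35/25): baseline is below 50% but above 1/3
multiclass = ["a"] * 40 + ["b"] * 35 + ["c"] * 25
print(baseline_accuracy(multiclass))  # 0.4
```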

In real-life binary classification problems, datasets are often imbalanced, so the baseline accuracy is higher than 50%. For example, out of 100 observations, if 70 belong to class 1 and 30 belong to class 0, the baseline accuracy is 70/100 = 70%. A model that scores below the baseline is not useful, since always predicting the majority class would do better.
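The 70/30 example can be sketched as a comparison between a majority-class predictor and a model. The model predictions below are invented purely for illustration, and `accuracy` is a hypothetical helper written from scratch.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Hypothetical helper: fraction of predictions matching the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 100 observations: 70 of class 1, 30 of class 0
y_true = [1] * 70 + [0] * 30

# "Model" that always predicts the majority class
majority = Counter(y_true).most_common(1)[0][0]
baseline = accuracy(y_true, [majority] * len(y_true))
print(baseline)  # 0.7

# Made-up predictions: right on 60 of the class-1 rows and 5 of the class-0 rows
y_model = [1] * 60 + [0] * 10 + [0] * 5 + [1] * 25
print(accuracy(y_true, y_model))  # 0.65, below the 0.7 baseline
```

Although 65% might sound acceptable in isolation, this hypothetical model does worse than simply predicting class 1 every time.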

If 99% of our observations belong to class 1 (extremely imbalanced data), a model that always predicts class 1 is correct 99% of the time without learning anything useful. Quality data is important: a model with 99% accuracy can still be unusable. Fortunately, we can easily find the baseline accuracy using pandas' value_counts().
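A minimal sketch with pandas, assuming a made-up target Series `y` with 99 rows of class 1 and a single row of class 0. With `normalize=True`, `value_counts()` returns each class's share of the rows, so the largest share is the baseline accuracy.

```python
import pandas as pd

# Made-up, extremely imbalanced target: 99 rows of class 1, 1 row of class 0
y = pd.Series([1] * 99 + [0] * 1, name="label")

# Share of each class, sorted in descending order by default
shares = y.value_counts(normalize=True)
baseline = shares.max()
print(baseline)  # 0.99

# A "model" that always predicts class 1 matches the baseline...
y_pred = pd.Series([1] * len(y))
acc = (y_pred == y).mean()
# ...but never identifies a single class-0 observation
class0_hits = ((y_pred == 0) & (y == 0)).sum()
print(acc, class0_hits)  # 0.99 0
```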
