Evaluating Machine Learning Models

Learn how to evaluate, compare, and optimize machine learning models.

In this lesson, we’ll learn about different evaluation metrics for machine learning classification and regression problems. It’s crucial to choose the right metric for the desired outcome: each metric captures a different objective and must be selected to match the specific use case.

Evaluating a regression model

Regression models predict a continuous output variable. Some commonly used metrics for regression models are described below (their formulas follow the list):

  • Mean squared error (MSE): MSE measures the average squared difference between the predicted and actual values, where a lower value of MSE indicates a better fit.

  • Root-mean-square error (RMSE): RMSE is the square root of MSE, which puts the error back into the same units as the target variable. Like MSE, it measures the average deviation between predicted and actual values, and because the errors are squared, both MSE and RMSE are very sensitive to outliers.

  • Mean absolute error (MAE): MAE measures the average absolute difference between predicted and actual values. It is less sensitive to outliers than MSE.

  • R-squared (R²): R² measures the proportion of the variance in the dependent variable that is predictable from the independent variables. It typically ranges between 0 and 1 (it can be negative when a model fits worse than simply predicting the mean), and a higher value indicates a better-fitting model.

  • Mean absolute percentage error (MAPE): MAPE is an error metric based on percentages. It measures the average absolute percentage difference between predicted and actual values and is used for cases where percentage errors are more important than absolute errors.

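For reference, here are the standard formulas for these metrics, where y_i is the actual value, ŷ_i the predicted value, ȳ the mean of the actual values, and n the number of observations:

```latex
\begin{aligned}
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2, &
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}}, \\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\bigl|y_i - \hat{y}_i\bigr|, &
R^2           &= 1 - \frac{\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2}{\sum_{i=1}^{n}\bigl(y_i - \bar{y}\bigr)^2}, \\
\mathrm{MAPE} &= \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|.
\end{aligned}
```
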
Let’s say we are predicting the price of a house based on features such as location, square footage, and number of bedrooms and bathrooms. We can use the above metrics to evaluate the performance of our model (a code sketch follows this list). For instance:

  • MSE or RMSE would tell us how far the predicted prices are from the actual prices on average; because large errors are squared, a few badly mispredicted houses (outliers) would noticeably inflate these metrics.

  • MAE would show us the average deviation from the actual price.

  • R² would measure how well the model fits the data.

  • MAPE would calculate the average percentage deviation of predicted prices from actual prices.
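Here is a minimal sketch of computing all five metrics with scikit-learn. The dataset, model, and numbers are assumptions for illustration (synthetic features standing in for location, square footage, bedrooms, and bathrooms), not the lesson’s actual house-price data:

```python
# Compute MSE, RMSE, MAE, R², and MAPE for a simple regression model.
# The "house price" data here is synthetic and purely illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    mean_absolute_percentage_error,
    r2_score,
)
from sklearn.model_selection import train_test_split

# Four synthetic features standing in for location, square footage,
# number of bedrooms, and number of bathrooms.
X, y = make_regression(n_samples=500, n_features=4, noise=20.0, random_state=42)
y = y + 300_000  # shift targets so they resemble prices and stay positive for MAPE

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)                                     # same units as the target
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
mape = mean_absolute_percentage_error(y_test, y_pred)   # returned as a fraction

print(f"MSE:  {mse:,.0f}")
print(f"RMSE: {rmse:,.0f}")
print(f"MAE:  {mae:,.0f}")
print(f"R²:   {r2:.3f}")
print(f"MAPE: {mape:.2%}")
```

Note that scikit-learn returns MAPE as a fraction (0.05 means 5%), so it is formatted as a percentage when printed.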

Evaluating a classification model

To evaluate a classification model’s performance, we need to build a confusion matrix. Let’s quickly learn more about that.

Confusion matrix

A confusion matrix tabulates the model’s predictions against the actual classes, showing how often the model correctly classifies each record in the dataset and how often it confuses one class for another.

Imagine we have a model that classifies whether a credit card user will default on their debt. The confusion matrix would show how many times the model correctly identified a defaulter and a nondefaulter, and how many times it made a mistake, labeling a nondefaulter as a defaulter or the reverse. Here’s how such a confusion matrix could be built for 27,500 credit statements.
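The following is a minimal sketch assuming scikit-learn; the labels and error rate are simulated for illustration, not taken from a real credit dataset:

```python
# Build a confusion matrix for a hypothetical default-prediction model.
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(42)

# Hypothetical ground truth for 27,500 credit statements:
# 1 = defaulter, 0 = nondefaulter (roughly 20% defaulters).
y_true = rng.binomial(1, 0.2, size=27_500)

# Hypothetical model predictions: mostly correct, with about 10% of the
# labels flipped to simulate the mistakes a real classifier would make.
flip = rng.random(27_500) < 0.1
y_pred = np.where(flip, 1 - y_true, y_true)

# Rows are actual classes, columns are predicted classes:
# [[nondefaulters predicted as nondefaulters, nondefaulters predicted as defaulters],
#  [defaulters predicted as nondefaulters,    defaulters predicted as defaulters   ]]
print(confusion_matrix(y_true, y_pred))
```

Each cell counts one of the four possible outcomes: the diagonal cells are correct classifications, and the off-diagonal cells are the model’s mistakes.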
