Mean absolute error in sklearn

Scikit-learn is a Python library used mainly for machine learning tasks, including classification, regression, clustering, and model selection.

Note: To get hands-on practice in Scikit-learn, you can explore the course Hands on machine learning with scikit-learn.

Scikit-learn model training

When training a model, it is crucial to measure how accurate its predictions are. Various metrics can be used to quantify how well a model has been trained. In this Answer, we cover one such metric: the mean absolute error.

Model validation

We can confirm the accuracy of a trained model by passing validation data to it and observing the results. Here, we compare our predicted and actual values for the data, and this is precisely where the mean absolute error comes in handy.
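As a minimal sketch of this validation step, the snippet below holds out part of a small dataset, fits a model on the rest, and compares the predictions with the held-out actual values. The synthetic data, the choice of LinearRegression, and the 25% split are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): y is roughly 3x + 2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, size=100)

# Hold out 25% of the rows as validation data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train on the training split only, then predict on the validation split.
model = LinearRegression().fit(X_train, y_train)
val_predictions = model.predict(X_val)

# Compare predicted and actual values on the validation set.
val_mae = mean_absolute_error(y_val, val_predictions)
print("Validation MAE =", val_mae)
```

Because the model never saw the validation rows during training, this MAE is a more honest estimate of how far off its predictions are likely to be on new data.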

Figure: Model validation

Mean absolute error

Mean absolute error, abbreviated as MAE, is a metric that measures the average absolute difference between the predicted and actual values in a regression problem, i.e., a problem that models the relationship between variables. In deep learning, MAE is also used as a loss function that quantifies how well the model's predictions match the actual values during model training.

Simply put, MAE tells us how much our predictions are off from the actual values in the dataset. It helps us understand the accuracy of our model by measuring the absolute errors between predicted and true values.

Interpreting our results

A lower MAE means our model's predictions are closer to the actual values. This indicates better performance.
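To illustrate this, the sketch below compares two hypothetical sets of predictions against the same actual values. The numbers and the helper name mae are made up for illustration only.

```python
actual = [25, 30, 20, 35, 41]
model_a_predictions = [23, 28, 19, 33, 38]  # stays close to the actual values
model_b_predictions = [15, 40, 10, 45, 30]  # strays further from them


def mae(actual_values, predicted_values):
    """Average absolute difference between paired actual and predicted values."""
    total = sum(abs(a - p) for a, p in zip(actual_values, predicted_values))
    return total / len(actual_values)


mae_a = mae(actual, model_a_predictions)
mae_b = mae(actual, model_b_predictions)

print("Model A MAE =", mae_a)
print("Model B MAE =", mae_b)
print("Better model:", "A" if mae_a < mae_b else "B")
```

Model A has the lower MAE, so by this metric its predictions sit closer to the actual values.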

Figure: MAE depiction

Mathematical representation

To calculate MAE, we take the absolute difference between each predicted value and its corresponding actual value. Then, we add up all these absolute differences and divide the sum by the total number of data points.

Formula

\mathrm{MeanAbsoluteError} = \frac{\sum_{i=1}^{n} |y_i - x_i|}{n}
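As a worked instance of the formula, take five actual values 25, 30, 20, 35, 41 and the predictions 23, 28, 19, 33, 38 (the same sample numbers used in the manual calculation in this Answer). Here each y_i is read as a predicted value and each x_i as the matching actual value; since the difference is taken in absolute value, swapping the two roles gives the same result.

```latex
\mathrm{MeanAbsoluteError}
  = \frac{|23-25| + |28-30| + |19-20| + |33-35| + |38-41|}{5}
  = \frac{2 + 2 + 1 + 2 + 3}{5}
  = \frac{10}{5}
  = 2
```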

Mechanism

The manual calculation of the mean absolute error is represented by the code below.

actual_values = [25, 30, 20, 35, 41]
predicted_values = [23, 28, 19, 33, 38]
n = len(actual_values)
absolute_differences = [abs(actual_values[i] - predicted_values[i]) for i in range(n)]
sum_of_absolute_differences = sum(absolute_differences)
mae = sum_of_absolute_differences / n
print("Actual values =", actual_values)
print("Predicted values =", predicted_values)
print("Absolute differences =", absolute_differences)
print("Sum of absolute differences =", sum_of_absolute_differences)
print("The MAE =", mae)

  1. First, we define sample data in actual_values and predicted_values.

  2. Next, we calculate the number of data points n in our dataset.

  3. We then calculate the absolute differences between both values for each data point using a list comprehension. abs() is used to get the absolute value of these differences.

  4. Then, we sum these differences using sum().

  5. Finally, we calculate the MAE by dividing the sum by the number of data points n.

  6. We print our variables to understand the code better.
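The same steps can also be written in vectorized form. The sketch below assumes NumPy is available (the original example uses plain Python lists) and reuses the same sample data.

```python
import numpy as np

# Same sample data as the manual example, stored as arrays.
actual_values = np.array([25, 30, 20, 35, 41])
predicted_values = np.array([23, 28, 19, 33, 38])

# Element-wise subtraction and np.abs() replace the list comprehension.
absolute_differences = np.abs(actual_values - predicted_values)

# mean() combines the sum and the division by n in one call.
mae = absolute_differences.mean()

print("Absolute differences =", absolute_differences)
print("The MAE =", mae)
```

For small lists the two versions behave the same; the array form mainly helps readability and speed on larger datasets.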

Code sample

from sklearn.metrics import mean_absolute_error
y_true = [3.5, 2.1, 5.2, 7.8, 4.6]
y_pred = [3.0, 2.5, 4.8, 8.0, 5.2]
mae = mean_absolute_error(y_true, y_pred)
print("MAE =", mae)

Code explanation

  • Line 1: We import the mean_absolute_error function from the sklearn.metrics module.

  • Lines 2–3: We define the true values y_true and the predicted values y_pred.

  • Line 4: We call the pre-built mean_absolute_error function to calculate the MAE, which keeps the code compact.

  • Line 5: Finally, we print the calculated error.
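As a quick sanity check, the library result can be compared with the manual formula on the same data. This is only a sketch; the variable names library_mae and manual_mae are chosen here for illustration.

```python
import math

from sklearn.metrics import mean_absolute_error

y_true = [3.5, 2.1, 5.2, 7.8, 4.6]
y_pred = [3.0, 2.5, 4.8, 8.0, 5.2]

# MAE computed by scikit-learn.
library_mae = mean_absolute_error(y_true, y_pred)

# MAE computed by hand: sum of absolute differences divided by n.
manual_mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print("sklearn MAE =", library_mae)
print("manual MAE =", manual_mae)

# Compare with a tolerance, since floating-point sums can differ slightly.
print("Match:", math.isclose(library_mae, manual_mae))
```

Both approaches should agree up to floating-point rounding, which is why the comparison uses math.isclose rather than ==.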


Copyright ©2024 Educative, Inc. All rights reserved