
Evaluating Regression Models

This lesson focuses on ways to evaluate the performance of regression models.

We'll cover the following...

Losses

Interpreting losses

Plotting absolute error percentages

Interpreting absolute error percentages

R2 score

Interpreting R2
Contribution of each variable

Takeaway

Losses

import pandas as pd 
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error,mean_absolute_error,median_absolute_error
import numpy as np
df = pd.read_csv('USA_Housing.csv')
# Split dataframe into Xs and Y
X = df.drop(columns = ['Address','Price'])
Y = df[['Price']]
# Linear Regression model fitting
lr = LinearRegression()
lr.fit(X,Y)
# Loss and predictions
predictions = lr.predict(X)
df['Predictions'] = predictions
print(df[['Price','Predictions']].head())
mse_loss = mean_squared_error(y_true = Y,y_pred = predictions)
mae_loss = mean_absolute_error(y_true = Y,y_pred = predictions)
print('MSE loss = ',mse_loss)
print('MAE loss = ',mae_loss)
median_abs_loss = median_absolute_error(y_true = Y,y_pred = predictions)
print('Median abs loss = ',median_abs_loss)

We first import the LinearRegression class and the loss functions mean_squared_error, mean_absolute_error, and median_absolute_error. We then read the data into a dataframe with pd.read_csv. Price is the target we want to predict and Address is non-numeric, so we drop both columns and form a new dataframe X. It has all the variables that we can use in prediction. In the next line, we separate the actual values of Price into a dataframe called Y.

Next, we initialize the LinearRegression class and call the resulting object lr. We then use the fit function to fit our model in the next line. The fit function finds the coefficients that minimize the squared error on the given data (ordinary least squares) and stores these model parameters internally.
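
To see what fit stores, here is a minimal sketch on made-up data. The toy arrays and the names X_toy, y_toy, and toy_model are assumptions for illustration and are not part of USA_Housing.csv. After fitting, scikit-learn exposes the learned parameters through the coef_ and intercept_ attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (hypothetical): y = 3*x1 + 2*x2 + 5, with no noise
X_toy = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y_toy = 3 * X_toy[:, 0] + 2 * X_toy[:, 1] + 5

toy_model = LinearRegression()
toy_model.fit(X_toy, y_toy)

# The fitted parameters are stored on the model object
print('Coefficients:', toy_model.coef_)
print('Intercept:', toy_model.intercept_)
```

Because the toy data has no noise, the fitted coefficients should come out very close to 3 and 2, and the intercept very close to 5.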

Now we get predictions from our fitted model using the predict function. Then we add them to the dataframe as a column named Predictions. The next line shows us the actual values and predicted values of the first 5 rows side by side.

Next, we compute the mean squared error using the mean_squared_error function and save it as mse_loss. In the next line, we compute the mean absolute error using the mean_absolute_error function and save it as mae_loss. Both functions expect the same arguments: actual values (y_true) and predicted values (y_pred). We print these losses in the next two lines.
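
As a quick check of what these two functions compute, the sketch below works out both losses by hand on a few made-up numbers and compares them with the scikit-learn results. The values and the names mse_manual and mae_manual are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Made-up actual and predicted values (hypothetical, for illustration only)
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 330.0, 380.0])

residuals = y_true - y_pred  # [-10, 10, -30, 20]

# MSE: mean of squared residuals; MAE: mean of absolute residuals
mse_manual = np.mean(residuals ** 2)
mae_manual = np.mean(np.abs(residuals))

print('MSE:', mse_manual, mean_squared_error(y_true=y_true, y_pred=y_pred))
print('MAE:', mae_manual, mean_absolute_error(y_true=y_true, y_pred=y_pred))
```

Here the squared residuals are 100, 100, 900, and 400, so the MSE is 375, while the MAE is 17.5. Squaring makes the single largest residual dominate the MSE.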

Interpreting losses

Losses are a good indication of the performance of the model. Looking at the losses here, we can see that the model does not perform very well. The Mean Absolute Error is the more intuitive of the two because it is in the same units as Price: on average, the predictions are off by almost $81000$, which is not a great result.

However, loss metrics that average over all rows, such as MSE and MAE, are greatly affected by outliers. For instance, a few rows with very large errors can push the mean loss value up. Therefore, we also calculate the median absolute error using the median_absolute_error function at the end of the code. We can see that the median absolute error is almost $69000$, which is a noticeable drop from the mean absolute error. This shows that a single loss value does not tell the whole story, so we need some other ways to look at the model's performance.
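
The effect of an outlier on the mean and the median can be sketched with a few made-up numbers. The values below are hypothetical: four predictions are close to the actual value and one is far off.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, median_absolute_error

# Hypothetical data: four small errors and one outlier
y_true = np.array([100.0, 100.0, 100.0, 100.0, 100.0])
y_pred = np.array([90.0, 95.0, 105.0, 110.0, 600.0])  # last prediction is far off

# Absolute errors are 10, 5, 5, 10, and 500
mae = mean_absolute_error(y_true, y_pred)
median_ae = median_absolute_error(y_true, y_pred)

print('Mean absolute error   =', mae)
print('Median absolute error =', median_ae)
```

The single outlier pulls the mean absolute error up to 106, while the median absolute error stays at 10, which is much closer to the typical error.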

Plotting absolute error percentages

import pandas as pd 
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np
df = pd.read_csv('USA_Housing.csv')
# Split dataframe into Xs and Y
X = df.drop(columns = ['Address','Price'])
Y = df[['Price']]
# Linear Regression model fitting
lr = LinearRegression()
lr.fit(X,Y)
# Loss and predictions
predictions = lr.predict(X)
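# Plot the absolute error of each row as a percentage of its actual price
errors = np.abs((Y - predictions) / Y) * 100
plt.plot(range(errors.shape[0]), errors)
plt.xlabel('Row index')
plt.ylabel('Absolute error (%)')
plt.show()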
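
The absolute error percentage used in the code above can be checked by hand on a few made-up numbers. This is a minimal sketch; the values and the name abs_pct_error are assumptions for illustration.

```python
import numpy as np

# Made-up actual and predicted prices (hypothetical)
y_true = np.array([200.0, 400.0, 500.0])
y_pred = np.array([180.0, 440.0, 500.0])

# Absolute error of each row as a percentage of its actual value
abs_pct_error = np.abs((y_true - y_pred) / y_true) * 100
print(abs_pct_error)
```

The first two predictions are each off by about 10% of the actual value and the third is exact. Note that this measure is undefined when an actual value is 0.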
