Evaluating Regression Models

This lesson will focus on ways to evaluate the performance of regression models.

In the previous lessons, we learned how to make and fit linear regression models in Python. But we did not discuss ways to judge the performance of the models. In this lesson, we will focus on techniques used to evaluate the performance of linear regression models.

We will be using the same model that we used in the last lesson where we tried to predict house prices using the USA Housing Dataset.

Losses

We can evaluate the model performance by looking at different losses. We have already looked at mean squared loss.
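
For reference, each of the losses computed below compares an actual value y_i with its prediction \hat{y}_i over all n rows:

MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2
MAE = \frac{1}{n}\sum_{i=1}^{n}\left| y_i - \hat{y}_i \right|
Median\ absolute\ error = median\left(\left| y_i - \hat{y}_i \right|\right)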

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, median_absolute_error
import numpy as np

# Read the data and split the dataframe into Xs and Y
df = pd.read_csv('USA_Housing.csv')
X = df.drop(columns=['Address', 'Price'])
Y = df[['Price']]

# Linear regression model fitting
lr = LinearRegression()
lr.fit(X, Y)

# Predictions
predictions = lr.predict(X)
df['Predictions'] = predictions
print(df[['Price', 'Predictions']].head())

# Losses
mse_loss = mean_squared_error(y_true=Y, y_pred=predictions)
mae_loss = mean_absolute_error(y_true=Y, y_pred=predictions)
print('MSE loss =', mse_loss)
print('MAE loss =', mae_loss)

median_abs_loss = median_absolute_error(y_true=Y, y_pred=predictions)
print('Median abs loss =', median_abs_loss)

At the top of the code, we import the LinearRegression class along with the mean_squared_error, mean_absolute_error, and median_absolute_error functions, and then read the data into a dataframe. Since we will not be using any non-numeric variables for prediction, we drop Price and Address to form a new dataframe X, which holds all the variables we can use for prediction. The actual values of Price are kept separately in a dataframe called Y.

Next, we initialize the LinearRegression class and call the class object lr. We then use the fit function to fit our model. The fit function finds the best model for us and stores the model parameters internally.
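
As a side note, the parameters stored by fit can be inspected through the coef_ and intercept_ attributes of the fitted estimator. The following is a minimal sketch (not part of the lesson code), assuming the same USA_Housing.csv file is available:

import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv('USA_Housing.csv')
X = df.drop(columns=['Address', 'Price'])
Y = df[['Price']]

lr = LinearRegression()
lr.fit(X, Y)

# One learned weight per input column, plus the intercept (bias) term.
# Because Y has a single column, coef_ has shape (1, n_features).
for name, weight in zip(X.columns, lr.coef_[0]):
    print(name, '->', weight)
print('Intercept =', lr.intercept_[0])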

We then get predictions from the fitted model using the predict function and add them to the dataframe as a new column called Predictions. Printing the head of the dataframe shows the actual and predicted values of the top 5 rows side by side.

Finally, we compute the mean squared error with the mean_squared_error function and save it as mse_loss, and the mean absolute error with the mean_absolute_error function, saved as mae_loss. Both functions expect the same arguments: the actual values (y_true) and the predicted values (y_pred). We print both losses right after computing them.
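
If it helps to see exactly what these helpers compute, here is a small sketch (not from the lesson) that reproduces both losses directly with NumPy on made-up numbers:

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Made-up actual and predicted values, just for illustration
y_true = np.array([200000., 310000., 150000.])
y_pred = np.array([195000., 330000., 140000.])

# Mean squared error: average of the squared residuals
print(np.mean((y_true - y_pred) ** 2), mean_squared_error(y_true, y_pred))
# Mean absolute error: average of the absolute residuals
print(np.mean(np.abs(y_true - y_pred)), mean_absolute_error(y_true, y_pred))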

Interpreting losses

Losses are a good indication of the performance of the model. Looking at the losses, we can see that the model does not perform great. The more intuitive mean absolute error of almost 81,000 means that, on average, a prediction is off by roughly that amount, which is not very good performance by the model.
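
One rough way to put this number in context (not something the lesson does) is to express the mean absolute error as a percentage of the average house price, assuming the same USA_Housing.csv file:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

df = pd.read_csv('USA_Housing.csv')
X = df.drop(columns=['Address', 'Price'])
Y = df[['Price']]

lr = LinearRegression()
lr.fit(X, Y)
mae_loss = mean_absolute_error(Y, lr.predict(X))

# Average absolute error relative to the average price
print('MAE as a % of the average price:', round(mae_loss / Y['Price'].mean() * 100, 2), '%')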

However, some loss metrics, such as MSE and MAE, are greatly affected by outliers. For instance, there might be some outliers in the data that push the mean loss value up. Therefore, we also calculate the median absolute error using the median_absolute_error function. We can see that the median absolute error is almost 69,000, which is a noticeable drop from the mean absolute error. This shows that we cannot always rely on a single loss value to evaluate performance, so we might need some other ways to look at the model's performance.
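
A tiny made-up example illustrates the effect: one extreme error pulls the mean of the absolute errors up sharply, while the median barely moves.

import numpy as np

# Five absolute errors, one of which is an extreme outlier
abs_errors = np.array([10., 12., 11., 9., 500.])

print('Mean absolute error   =', abs_errors.mean())      # dominated by the outlier
print('Median absolute error =', np.median(abs_errors))  # barely affected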

Plotting absolute error percentages

To check how well our model performed, let's plot the absolute percentage error of each prediction.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

# Read the data and split the dataframe into Xs and Y
df = pd.read_csv('USA_Housing.csv')
X = df.drop(columns=['Address', 'Price'])
Y = df[['Price']]

# Linear regression model fitting
lr = LinearRegression()
lr.fit(X, Y)

# Predictions
predictions = lr.predict(X)

# Plot the absolute percentage error of each prediction
errors = np.abs((Y - predictions) / Y) * 100
plt.plot(range(errors.shape[0]), errors)
plt.xlabel('Row index')
plt.ylabel('Absolute % error')
plt.show()

After performing the regression, we compute the absolute percentage error of each prediction and plot it. We use the following formula:

\% \, error = \frac{\left| actual - predicted \right|}{actual} \times 100
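
A single-number summary of this plot is the mean of these percentages, often called the mean absolute percentage error (MAPE). Here is a minimal sketch on made-up numbers; newer scikit-learn versions (0.24 and later) also provide sklearn.metrics.mean_absolute_percentage_error, which returns the same quantity as a fraction rather than a percentage.

import numpy as np

# Made-up actual and predicted values, just for illustration
actual = np.array([200000., 310000., 150000., 420000.])
predicted = np.array([190000., 325000., 160000., 400000.])

# Absolute percentage error of each prediction, then its mean (MAPE)
pct_errors = np.abs((actual - predicted) / actual) * 100
print('Per-row % errors:', np.round(pct_errors, 2))
print('MAPE =', round(pct_errors.mean(), 2), '%')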
