Model Evaluation

Evaluate the quality of the model using a handful of evaluation metrics.

A residual measures how much a regression line vertically misses a data point. The regression line is the line of best fit for a set of data; we can think of it as an average, so a few data points sit on the line while the rest miss it by some amount. A residual plot has the residual values on the vertical axis, and the horizontal axis typically displays the independent variable (or the model's predictions).
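As a quick illustration (with made-up numbers, not this lesson's dataset), the hedged sketch below fits a line of best fit with NumPy and computes each point's residual as the vertical distance between the observed value and the line.

import numpy as np

# Toy data: y roughly follows 2x + 1 with a little noise (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.3])

# Least-squares line of best fit (slope, intercept).
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# Residual = observed value minus the value the line predicts.
residuals = y - y_hat
print("Residuals:", np.round(residuals, 3))
print("Sum of residuals (close to 0 for a least-squares fit):", round(residuals.sum(), 6))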

Residual scatterplot

The scatterplot below (residual values against the y_hat predictions) is helpful for checking whether the residuals are randomly scattered around zero.

# Residual scatter plot: residuals (y_test - predictions) vs. model predictions
plt.figure(figsize=(14, 6))
print("Red dotted line shows the mean value '{}' of residuals (y_test - predictions). \
THIS IS NOT THE MODEL LINE!".format(round(np.mean(y_test - predictions), 4)))
plt.axhline(y=np.mean(y_test - predictions), color='r', ls='--', linewidth=3)
plt.scatter(predictions, y_test - predictions)
plt.xlabel('Model Predictions (y_hat)')
plt.ylabel('Residual Values')
plt.title('\nResidual scatter plot vs model predictions.');

We are interested in the following.

  • Scedasticity: We want a consistent variance between our low and high predictions (homoscedasticity). If we spot the opposite (heteroscedasticity), the target is probably not normally distributed. The remedy is to run the target vector through a power transformation, such as Box-Cox or Yeo-Johnson, to make it more Gaussian-like.

  • Outliers: If the loss function involves squaring the residuals (for example, MSE, RMSE, and the squared-error terms behind R²), then outliers will have a lot of leverage over the model. We recommend removing the worst outliers or offenders from the training data. (A hedged sketch of both remedies follows the note below.)

In statistics, scedasticity refers to the distribution of the error terms: they can either be distributed randomly with constant variance (homoscedasticity) or follow some pattern (heteroscedasticity).
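Both remedies can be sketched in a few lines. The snippet below is a minimal, hedged illustration, assuming X_train and y_train are the pandas objects used elsewhere in this lesson: it runs the target through scikit-learn's PowerTransformer (Yeo-Johnson, which also handles zero and negative values; 'box-cox' requires strictly positive targets) and drops rows whose target falls outside simple IQR fences.

from sklearn.preprocessing import PowerTransformer

# 1) Power-transform the target to make it more Gaussian-like.
pt = PowerTransformer(method='yeo-johnson', standardize=True)
y_train_transformed = pt.fit_transform(y_train.to_numpy().reshape(-1, 1)).ravel()

# 2) Drop the worst outliers from the TRAINING data only, using IQR fences
#    on the target (1.5 * IQR is a common, adjustable cutoff).
q1, q3 = y_train.quantile(0.25), y_train.quantile(0.75)
iqr = q3 - q1
keep = y_train.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
X_train_clean, y_train_clean = X_train[keep], y_train[keep]
print(f"Dropped {len(y_train) - keep.sum()} outlier rows from the training set.")

Remember to map predictions made on the transformed target back with pt.inverse_transform, and never drop outliers from the test set.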

Residual histogram

The residual histogram is also helpful as it can tell us how much the predicted value differs from the actual value in y_test. We can do the subtraction y_test - predictions for this plot. Before we move on, let's try to understand the skewness in the distribution plots. The code below generates data and creates the plots for learning.

fig, ax = plt.subplots(nrows=3, figsize=(18, 14))

# Generating data
np.random.seed(41)
size, scale = 100000, 10
PositiveSkew = pd.Series(np.random.gamma(scale, size=size)**2.5)
NormalDistribution = pd.Series(np.random.randn(size))
NegativeSkew = PositiveSkew[::-1]  # same data, drawn on a reversed x-axis below

# Positively skewed plot
sns.histplot(data=PositiveSkew, ax=ax[0], kde=True, bins=200, color='r', line_kws={'lw': 3})
ax[0].text(1000, 4000, 'Positive Skew / Right Skewed', fontsize=18, color='r')
ax[0].axvline(np.mean(PositiveSkew), lw=5, c='DarkRed')
ax[0].text(390, 1500, 'Mean', fontsize=18, color='DarkRed', weight='bold')
ax[0].axvline(np.median(PositiveSkew), lw=5, c='gold', ls='--')
ax[0].text(160, 1500, 'Median', fontsize=18, color='gold', weight='bold')
ax[0].set_xlim(0, 1500); ax[0].set(xticklabels=[], yticklabels=[], ylabel=None)

# Normal distribution - no bias
sns.histplot(data=NormalDistribution, ax=ax[1], kde=True, bins=40, color='g', line_kws={'lw': 3})
ax[1].text(1.6, 6050, 'Normal Distribution, No Bias - OK', fontsize=18, color='g')
ax[1].axvline(np.mean(NormalDistribution), lw=10, c='DarkRed')
ax[1].text(0.1, 4000, 'Mean', fontsize=18, color='DarkRed', weight='bold')
ax[1].axvline(np.median(NormalDistribution), lw=3, c='gold', ls='--')
ax[1].text(0.1, 3000, 'Median', fontsize=18, color='gold', weight='bold')
ax[1].set(xticklabels=[], yticklabels=[], ylabel=None)

# Negatively skewed plot (x-axis reversed to mirror the positive-skew data)
sns.histplot(data=NegativeSkew, ax=ax[2], kde=True, bins=200, color='r', line_kws={'lw': 3})
ax[2].text(1000, 4000, 'Negative Skew / Left Skewed', fontsize=18, color='r')
ax[2].axvline(np.mean(NegativeSkew), lw=5, c='DarkRed')
ax[2].text(470, 1500, 'Mean', fontsize=18, color='DarkRed', weight='bold')
ax[2].axvline(np.median(NegativeSkew), lw=5, c='gold', ls='--')
ax[2].text(280, 1500, 'Median', fontsize=18, color='gold', weight='bold')
ax[2].set_xlim(1500, 0); ax[2].set(xticklabels=[], yticklabels=[], ylabel=None)

print("NOTE: The x-limits are enforced in these histograms; to see the full plots, remove the limits!")

Note: The mean and median are the same for normal distributions. However, they become different numbers if the distributions are skewed. The mean will be on the right side of the median in a right-skewed (positive) distribution. In a left-skewed (negative) distribution, the mean will be on the left of the median.
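To put a number on the skew, pandas provides a .skew() method (SciPy's scipy.stats.skew is an alternative). The short sketch below assumes the series generated in the block above are still in scope; note that the NegativeSkew panel is simply the positive-skew data drawn on a reversed x-axis, so a genuinely left-skewed series is obtained by negating the values.

# Quantifying skewness: positive -> right-skewed, ~0 -> symmetric, negative -> left-skewed.
print("PositiveSkew series      :", round(PositiveSkew.skew(), 3))
print("NormalDistribution series:", round(NormalDistribution.skew(), 3))
print("Negated (true left skew) :", round((-PositiveSkew).skew(), 3))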

For a least-squares model with an intercept, the mean and the sum of the residuals are always zero on the training data (X_train, y_train). Let's see how the residual histogram looks for the testing data (X_test, y_test). We don't expect the mean and the sum of the residuals to be zero in this plot.
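Why is that? Assuming lm is an ordinary least-squares fit with an intercept (scikit-learn's LinearRegression includes one by default), the normal equations force the residuals to sum, and therefore average, to zero on the data the model was fitted on:

$$X^{\top}\left(y - X\hat{\beta}\right) = 0 \;\Longrightarrow\; \mathbf{1}^{\top}\left(y - \hat{y}\right) = \sum_{i=1}^{n} e_i = 0,$$

because one row of $X^{\top}$ is the intercept's row of ones.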

pred_train = lm.predict(X_train)
print("TRAINING DATA (X_train, y_train)")
print("Mean of the residual:", abs(round(np.mean(y_train-pred_train), 4)))
print("Sum of the residual:", abs(round(np.sum(y_train-pred_train), 4)))
# Residual Histogram
# Training Data
fig1, ax1 = plt.subplots(nrows=2,figsize=(16, 10))
sns.histplot(data=y_train-pred_train,ax=ax1[0], kde=True, bins=100);
#sns.displot(y_test-predictions);
ax1[0].axvline(x=np.mean(y_train-pred_train), color='r', ls='--', linewidth=2)
ax1[0].set_title("Residual histogram for TRAINING DATA (X_train, y_train)", fontsize=16)
ax1[0].set_xlabel('Residuals (y_train-pred_train)', fontsize=14)
ax1[0].set_ylabel('Relative Frequency', fontsize=14)
ax1[0].set_xlim(-20, 30)
# putting some text on the plot!
ax1[0].text(x=5, y=18, bbox=dict(facecolor='green', alpha=0.3),fontsize=12,
s="For best fitted model, the mean and the sum of residual always\
\nequals to zero for the TRAINING DATA.");
ax1[0].text(x=5, y=13, bbox=dict(facecolor='red', alpha=0.3),fontsize=12,
s="Compare this plot with the above example histograms.\n\nWhat do you think?");
# Testing Data
print("\nTEST DATA (X_test, y_test)")
print("Mean of the residual:", round(np.mean(y_test-predictions),4))
print("Sum of the residual:", round(sum(y_test-predictions),4))
sns.histplot(data=y_test-predictions, ax=ax1[1], kde=True, bins=100);
#sns.displot(y_test-predictions);
ax1[1].axvline(x=np.mean(y_test-predictions), color='r', ls='--', linewidth=2)
ax1[1].set_xlabel('Residuals (y_test - predictions)', fontsize=14)
ax1[1].set_ylabel('Relative Frequency', fontsize=14)
ax1[1].set_title('Residual histogram from TEST DATA (X_test, y_test)', fontsize=16)
ax1[1].set_xlim(-20, 30)
# putting some text on the plot!
ax1[1].text(x=5, y=6, bbox=dict(facecolor='red', alpha=0.3),fontsize=12,
s="We don't expect mean and sum of residual to be zero in this plot.\
\nThis is unseen/test data for the model!");
fig1.tight_layout(pad=2.0)

We have trained our model lm, and the residual plot looks good.

R-squared and accuracy of the fit

Let's see how well our model fits the data. We can call the score method on our trained model for this purpose; for a regression model, it returns the R² (coefficient of determination) score.

# Calling score on trained model "lm"
print("R^2 (train) - The accuracy score of our model in train part is: ", lm.score(X = X_train, y = y_train))
print("R^2 (test) - The accuracy score of our model on test part is: ", lm.score(X = X_test, y = y_test))

Adjusted R² is always preferred for multivariate (multiple) linear regression, where we have more than one predictor variable, because it penalizes the score for every additional predictor.
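The code below implements the standard formula,

$$R^2_{\text{adj}} = 1 - \frac{(1 - R^2)\,(n - 1)}{n - p - 1},$$

where $n$ is the number of data points and $p$ is the number of predictors.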

# Adjusted R^2 (see explanation "R^2 vs R^2_adj" at the end)
r_2_train = lm.score(X = X_train, y = y_train)
r_2_test = lm.score(X = X_test, y = y_test)
p = len(X_train.columns) # n_features same for train & test
n_train = len(X_train) # n_data_points train
n_test = len(X_test) # n_data_points test
print("R^2 (adjusted) for (X_train, y_train) = ", (1 - (((1-r_2_train)*(n_train-1))/(n_train-p-1))))
print("R^2 (adjusted) for (X_train, y_train) = ", (1 - (((1-r_2_test)*(n_test-1))/(n_test-p-1))))

Alternatively, we can use the r2_score function from sklearn.metrics.

# Using r2_score here -- same result as above, but from the metrics module
# We need to import the metrics module to access r2_score
from sklearn import metrics
print("The R^2 score of our model on the test data is: ")
print(metrics.r2_score(y_true = y_test, y_pred = predictions))
# The y_true, y_pred order matters if we are not passing them by keyword (y_true=y_test, y_pred=predictions)
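The metrics module also provides the other error measures mentioned earlier in this lesson. As a hedged sketch, assuming y_test and predictions are the arrays used above:

import numpy as np
from sklearn import metrics

mae = metrics.mean_absolute_error(y_test, predictions)   # average absolute residual
mse = metrics.mean_squared_error(y_test, predictions)    # average squared residual
rmse = np.sqrt(mse)                                       # back in the target's units

print("MAE :", round(mae, 4))
print("MSE :", round(mse, 4))
print("RMSE:", round(rmse, 4))

Unlike R², these are error metrics, so lower values are better.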

R-squared, the coefficient of determination, is a regression score function. It is a statistical measure representing the proportion of the variance in the dependent variable that is explained by the independent variable(s) in a regression model. The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse).
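In formula form, with $\bar{y}$ denoting the mean of the observed targets,

$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \frac{\sum_{i}\left(y_i - \hat{y}_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2},$$

so the score drops below zero whenever the model's squared error exceeds that of simply predicting the mean.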

Note: A constant model that always predicts the expected value of y, disregarding the input features, would get an R-squared score of 0.0. The lesson "R-square and Goodness of the Fit" could be a great read to get more details on R² vs. adjusted R².
