Model Evaluation
Evaluate the quality of the model using a handful of evaluation metrics.
A residual value measures how much a regression line vertically misses a data point. A regression line is the line of best fit for a set of data; we can think of it as an average, where a few data points fall on the line and the rest miss it by some amount. A residual plot has the residual values on the vertical axis, while the horizontal axis typically displays the independent variable (or, as in the plot below, the model predictions).
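As a minimal illustration (using made-up numbers rather than the project data), each residual is simply the observed value minus the value predicted by the line:

# Minimal sketch with made-up numbers: residual = observed - predicted
import numpy as np

y_observed = np.array([3.0, 5.0, 7.5])    # actual values
y_predicted = np.array([2.5, 5.5, 7.0])   # values on the fitted regression line
residuals = y_observed - y_predicted
print(residuals)          # positive = line under-predicts, negative = line over-predicts
print(residuals.mean())   # average vertical miss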
Residual scatterplot
The scatterplot below (residual values against the y_hat predictions) is helpful for checking whether the residuals are scattered randomly around zero.
# Residual scatterplot
plt.figure(figsize=(14, 6))
print("Red dotted line shows the mean value '{}' of residuals (y_test - predictions). "
      "\nTHIS IS NOT THE MODEL LINE!".format(round(np.mean(y_test - predictions), 4)))
plt.axhline(y=np.mean(y_test - predictions), color='r', ls='--', linewidth=3)
plt.scatter(predictions, y_test - predictions)
plt.xlabel('Model Predictions (y_hat)')
plt.ylabel('Residual Values')
plt.title('\nResidual scatter plot vs model predictions.');
We are interested in the following.
Scedasticity: We want a consistent variance between our low and high predictions (homoscedasticity). If we spot the opposite (heteroscedasticity), the target is likely not normally distributed. The remedy is to run the target vector through a power transformation, such as Box-Cox or Yeo-Johnson, to make it more Gaussian-like (a code sketch follows below the note on scedasticity).
Outliers: If the loss function involves squaring the residuals (for example, MSE or RMSE; R-squared is also based on squared residuals), then outliers will have a lot of leverage over the model. We recommend removing the worst outliers or offenders from the training data, as sketched below.
In statistics, scedasticity is the distribution of error terms, and they can either be distributed randomly and with constant variance (homoscedasticity) or with some pattern (heteroscedasticity).
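Below is a rough sketch of both remedies mentioned above: a Yeo-Johnson power transform of the target and a simple 1.5*IQR rule for dropping the worst target outliers. The variable names (X_train, y_train) follow the ones used in this notebook, and the 1.5*IQR cutoff is just one common convention, not the only option.

# Sketch only: make the target more Gaussian-like and trim extreme target outliers
import numpy as np
from sklearn.preprocessing import PowerTransformer

# 1) Power-transform the target (Yeo-Johnson also handles zero/negative values)
pt = PowerTransformer(method='yeo-johnson')
y_train_transformed = pt.fit_transform(np.asarray(y_train).reshape(-1, 1)).ravel()
# After predicting on the transformed scale, map back with:
# y_pred_original = pt.inverse_transform(y_pred.reshape(-1, 1)).ravel()

# 2) Drop the worst target outliers using a 1.5*IQR rule
q1, q3 = np.percentile(y_train, [25, 75])
iqr = q3 - q1
keep = (y_train >= q1 - 1.5 * iqr) & (y_train <= q3 + 1.5 * iqr)
X_train_trimmed, y_train_trimmed = X_train[keep], y_train[keep]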
Residual histogram
The residual histogram is also helpful, as it can tell us how much the predicted value differs from the actual value in y_test. We can do the subtraction y_test - predictions for this plot. Before we move on, let's try to understand the skewness in the distribution plots. The code below generates data and creates the plots for learning.
fig, ax = plt.subplots(nrows=3, figsize=(18, 14))

# Generating data
np.random.seed(41)
size, scale = 100000, 10
PositiveSkew = pd.Series(np.random.gamma(scale, size=size)**2.5)
NormalDistribution = pd.Series(np.random.randn(size))
NegativeSkew = PositiveSkew[::-1]

# Positive skew / right skewed
sns.histplot(data=PositiveSkew, ax=ax[0], kde=True, bins=200, color='r', line_kws={'lw': 3})
ax[0].text(1000, 4000, 'Positive Skew / Right Skewed', fontsize=18, color='r')
ax[0].axvline(np.mean(PositiveSkew), lw=5, c='DarkRed')
ax[0].text(390, 1500, 'Mean', fontsize=18, color='DarkRed', weight='bold')
ax[0].axvline(np.median(PositiveSkew), lw=5, c='gold', ls='--')
ax[0].text(160, 1500, 'Median', fontsize=18, color='gold', weight='bold')
ax[0].set_xlim(0, 1500)
ax[0].set(xticklabels=[], yticklabels=[], ylabel=None)

# Normal distribution - no bias
sns.histplot(data=NormalDistribution, ax=ax[1], kde=True, bins=40, color='g', line_kws={'lw': 3})
ax[1].text(1.6, 6050, 'Normal Distribution, No Bias - OK', fontsize=18, color='g')
ax[1].axvline(np.mean(NormalDistribution), lw=10, c='DarkRed')
ax[1].text(0.1, 4000, 'Mean', fontsize=18, color='DarkRed', weight='bold')
ax[1].axvline(np.median(NormalDistribution), lw=3, c='gold', ls='--')
ax[1].text(0.1, 3000, 'Median', fontsize=18, color='gold', weight='bold')
ax[1].set(xticklabels=[], yticklabels=[], ylabel=None)

# Negative skew / left skewed
sns.histplot(data=NegativeSkew, ax=ax[2], kde=True, bins=200, color='r', line_kws={'lw': 3})
ax[2].text(1000, 4000, 'Negative Skew / Left Skewed', fontsize=18, color='r')
ax[2].axvline(np.mean(NegativeSkew), lw=5, c='DarkRed')
ax[2].text(470, 1500, 'Mean', fontsize=18, color='DarkRed', weight='bold')
ax[2].axvline(np.median(NegativeSkew), lw=5, c='gold', ls='--')
ax[2].text(280, 1500, 'Median', fontsize=18, color='gold', weight='bold')
ax[2].set_xlim(1500, 0)
ax[2].set(xticklabels=[], yticklabels=[], ylabel=None)

print("NOTE: The x-limits are enforced in these histograms; to see the full plots, remove the limits!")
Note: The mean and median are the same for normal distributions. However, they become different numbers if the distribution is skewed. The mean will be on the right side of the median in a right-skewed (positive) distribution. In a left-skewed (negative) distribution, the mean will be on the left of the median.
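A quick way to check this relationship on the generated data above is to compare the mean, the median, and pandas' skew() (reusing the PositiveSkew and NormalDistribution series from the previous cell):

# Compare mean, median and skewness for the generated series
for name, series in [('PositiveSkew', PositiveSkew), ('NormalDistribution', NormalDistribution)]:
    print(f"{name}: mean={series.mean():.3f}, median={series.median():.3f}, skew={series.skew():.3f}")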
For the best-fitted model, the mean and the sum of the residuals are always zero for the training data (X_train, y_train); this is a property of an ordinary least-squares fit that includes an intercept. Let's see how the residual histogram looks for the testing data (X_test, y_test). We don't expect the mean and the sum of the residuals to be zero in this plot.
pred_train = lm.predict(X_train)
print("TRAINING DATA (X_train, y_train)")
print("Mean of the residual:", abs(round(np.mean(y_train - pred_train), 4)))
print("Sum of the residual:", abs(round(np.sum(y_train - pred_train), 4)))

# Residual histograms
fig1, ax1 = plt.subplots(nrows=2, figsize=(16, 10))

# Training data
sns.histplot(data=y_train - pred_train, ax=ax1[0], kde=True, bins=100)
ax1[0].axvline(x=np.mean(y_train - pred_train), color='r', ls='--', linewidth=2)
ax1[0].set_title("Residual histogram for TRAINING DATA (X_train, y_train)", fontsize=16)
ax1[0].set_xlabel('Residuals (y_train - pred_train)', fontsize=14)
ax1[0].set_ylabel('Relative Frequency', fontsize=14)
ax1[0].set_xlim(-20, 30)
# putting some text on the plot!
ax1[0].text(x=5, y=18, bbox=dict(facecolor='green', alpha=0.3), fontsize=12,
            s="For the best fitted model, the mean and the sum of residuals always\nequal zero for the TRAINING DATA.")
ax1[0].text(x=5, y=13, bbox=dict(facecolor='red', alpha=0.3), fontsize=12,
            s="Compare this plot with the above example histograms.\n\nWhat do you think?")

# Test data
print("\nTEST DATA (X_test, y_test)")
print("Mean of the residual:", round(np.mean(y_test - predictions), 4))
print("Sum of the residual:", round(np.sum(y_test - predictions), 4))
sns.histplot(data=y_test - predictions, ax=ax1[1], kde=True, bins=100)
ax1[1].axvline(x=np.mean(y_test - predictions), color='r', ls='--', linewidth=2)
ax1[1].set_xlabel('Residuals (y_test - predictions)', fontsize=14)
ax1[1].set_ylabel('Relative Frequency', fontsize=14)
ax1[1].set_title('Residual histogram for TEST DATA (X_test, y_test)', fontsize=16)
ax1[1].set_xlim(-20, 30)
# putting some text on the plot!
ax1[1].text(x=5, y=6, bbox=dict(facecolor='red', alpha=0.3), fontsize=12,
            s="We don't expect the mean and sum of residuals to be zero in this plot.\nThis is unseen/test data for the model!")

fig1.tight_layout(pad=2.0)
We have trained our model lm, and the residual plot looks good.
R-squared and accuracy of the fit
Let's see what the accuracy of our model is. We can call the score function on our trained model for this purpose.
# Calling score on the trained model "lm"
print("R^2 (train) - The accuracy score of our model on the train part is: ", lm.score(X=X_train, y=y_train))
print("R^2 (test) - The accuracy score of our model on the test part is: ", lm.score(X=X_test, y=y_test))
Adjusted R-squared is generally preferred for multivariate (multiple) linear regression, where we have more than one predictor variable. It penalizes R-squared for the number of predictors: adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), where n is the number of observations and p is the number of predictors.
# Adjusted R^2 (see explanation "R^2 vs R^2_adj" at the end)
r_2_train = lm.score(X=X_train, y=y_train)
r_2_test = lm.score(X=X_test, y=y_test)
p = len(X_train.columns)  # n_features, same for train & test
n_train = len(X_train)    # n_data_points, train
n_test = len(X_test)      # n_data_points, test
print("R^2 (adjusted) for (X_train, y_train) = ", 1 - ((1 - r_2_train) * (n_train - 1)) / (n_train - p - 1))
print("R^2 (adjusted) for (X_test, y_test) = ", 1 - ((1 - r_2_test) * (n_test - 1)) / (n_test - p - 1))
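The same formula can be wrapped in a small helper function; adjusted_r2 below is just a convenience sketch of ours, not a scikit-learn function:

# Helper sketch: adjusted R^2 from R^2, n samples and p predictors
def adjusted_r2(r_squared, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print("R^2 (adjusted, train):", adjusted_r2(r_2_train, n_train, p))
print("R^2 (adjusted, test): ", adjusted_r2(r_2_test, n_test, p))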
Alternatively, we can use the r2_score function from sklearn.metrics.
# Using r2_score -- same result as above, but from sklearn.metrics
# We need to import metrics to access r2_score
from sklearn import metrics

print("The accuracy score of our model on the test data is: ")
print(metrics.r2_score(y_true=y_test, y_pred=predictions))
# The (y_true, y_pred) order is a must if we are not passing them by name
R-squared, the coefficient of determination, is a regression score function. It is a statistical measure representing the proportion of the variance in the dependent variable that is explained by the independent variable or variables in a regression model. The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse).
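In formula terms, R-squared = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of the observed values. As a quick sanity check (reusing y_test and predictions from above), the manual computation should match r2_score:

# Manual R^2: should match metrics.r2_score(y_test, predictions)
ss_res = np.sum((y_test - predictions) ** 2)        # residual sum of squares
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)    # total sum of squares
print("R^2 (manual):", 1 - ss_res / ss_tot)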
Note: A constant model that always predicts the expected value of y, disregarding the input features, would get an R-squared score of 0.0. The lesson "R-square and Goodness of the Fit" could be a great read to get more details on R^2 vs. R^2_adj.
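To see why a constant model scores around zero, we can score a baseline that always predicts the mean of the training target. This is only a sketch: predicting the mean of y_test itself would give exactly 0.0, while using the training mean typically gives a value close to zero (and possibly slightly negative).

# Baseline sketch: a "model" that always predicts the training-set mean
baseline_pred = np.full(shape=len(y_test), fill_value=np.mean(y_train))
print("R^2 of constant-mean baseline:", metrics.r2_score(y_true=y_test, y_pred=baseline_pred))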