Model Building and Evaluation

Learn how to build and evaluate a linear regression model to predict customer revenue for a retail shop.

Finally, the dataset has been prepared, and we have observed the underlying relationships between the features. Now, we can build our regression model and start predicting customer spending for the year 2020.

In this lesson, we’ll build our first version of the model, evaluate its performance, and test different ways to improve it.
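As a preview of the build-and-evaluate loop, here is a minimal sketch on synthetic data. The numbers are made up and do not come from the retail dataset; the only real column names used are `revenue_2019` and `revenue_2020`. It fits scikit-learn's `LinearRegression` on a training split and scores it on the held-out split with root mean squared error (RMSE), computed as the square root of `mean_squared_error`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): revenue_2020 depends linearly
# on revenue_2019 plus Gaussian noise with a standard deviation of 200.
rng = np.random.default_rng(42)
revenue_2019 = rng.uniform(100, 5000, size=500)
revenue_2020 = 1.1 * revenue_2019 + rng.normal(0, 200, size=500)

X = revenue_2019.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = revenue_2020

# Keep 20% of the rows aside for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Build: fit ordinary least squares on the training split only
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate: compare predictions against the held-out labels
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # RMSE is in the same units as revenue
print(f"coef={model.coef_[0]:.3f}, RMSE={rmse:.1f}")
```

Because RMSE is expressed in the same units as the label, it is easy to interpret. Here it should land close to the noise level of 200 that was baked into the synthetic data.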

As always, let's import all the necessary libraries and the wrangled dataset.

# Data handling and numerics
import pandas as pd
import numpy as np
import datetime as dt
# Plotting
import matplotlib.pyplot as plt
import seaborn as sns
# Modeling and evaluation
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the wrangled dataset, using customer_id as the row index
df_retail = pd.read_csv('wrangled_transactions.csv', header=0, index_col='customer_id')
print(df_retail.head())

Model building

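We’ll separate the features from the label first. Then we’ll split the dataset into training and test datasets, keeping 20% of the dataset for testing. Holding out a test set is standard practice in most model-building scenarios because it lets us measure how well the model performs on customers it did not see during training.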
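The separation and the 80/20 split can be sketched with pandas and `train_test_split`. This uses a hypothetical toy frame shaped like `df_retail`, with `customer_id` as the index. Only `revenue_2019` is included as a feature because the full feature list is not reproduced here. The values are random placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame shaped like df_retail: customer_id as the index,
# revenue_2019 as a feature, revenue_2020 as the label (random values).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "revenue_2019": rng.uniform(100, 5000, size=50),
        "revenue_2020": rng.uniform(100, 5000, size=50),
    },
    index=pd.Index(range(1000, 1050), name="customer_id"),
)

# Separate the label from the features
y = df["revenue_2020"]
X = df.drop(columns=["revenue_2020"])

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (40, 1) (10, 1)
```

`train_test_split` shuffles rows before splitting and keeps the index aligned between `X_*` and `y_*`, so each customer's features stay paired with that customer's label.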

We'll use revenue_2020 as our target prediction label and the following as our feature columns:

  • revenue_2019

  • ...