Model Building and Evaluation
Learn how to build and evaluate a linear regression model that predicts customer revenue for a retail shop.
Finally, the dataset has been prepared, and we have observed the underlying relationships between the features. Now, we can build our regression model and start predicting customer spending for the year 2020.
In this lesson, we’ll build our first version of the model, evaluate its performance, and test different ways to improve the model’s performance.
As always, let's import all the necessary libraries and the wrangled dataset.
# Data handling and numerics
import pandas as pd
import numpy as np
import datetime as dt

# Plotting
import matplotlib.pyplot as plt
import seaborn as sns

# Modeling and evaluation
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the wrangled dataset, using customer_id as the row index
df_retail = pd.read_csv('wrangled_transactions.csv', header=0, index_col='customer_id')
print(df_retail.head())
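Since wrangled_transactions.csv isn't reproduced here, the sketch below uses hypothetical synthetic data to show how the three scikit-learn imports fit together: train_test_split holds out test rows, LinearRegression fits the model, and mean_squared_error scores it. The exact linear relationship between revenue_2019 and revenue_2020 is invented so the result is easy to check, and reporting the square root (RMSE) is an assumption, not necessarily the metric this lesson uses.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic stand-in for the real dataset:
# 2020 revenue is an exact linear function of 2019 revenue (no noise).
rng = np.random.default_rng(0)
revenue_2019 = rng.uniform(100, 5000, size=200)
df = pd.DataFrame({
    "revenue_2019": revenue_2019,
    "revenue_2020": 1.1 * revenue_2019 + 50,
})

X = df[["revenue_2019"]]   # feature matrix (2-D)
y = df["revenue_2020"]     # label (1-D)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit ordinary least squares on the training split only.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out split; RMSE is in the same units as revenue.
predictions = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"coef={model.coef_[0]:.3f}, intercept={model.intercept_:.1f}, RMSE={rmse:.4f}")
```

Because the synthetic data has no noise, the fitted coefficient and intercept should recover 1.1 and 50 and the RMSE should be essentially zero; on real customer data the RMSE will be much larger and is the number to compare across model versions.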
Model building
We’ll separate the features from the label first. Then we’ll split the dataset into training and test datasets, keeping 20% of the dataset for testing. This is a standard process in most model-building scenarios.
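The separation and 80/20 split described above can be sketched as follows. The ten-row DataFrame is hypothetical, and the days_since_last_purchase column is an illustrative placeholder rather than one of the lesson's actual feature columns; only revenue_2019, revenue_2020, and the customer_id index come from the lesson.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df_retail: 10 customers indexed by customer_id.
# days_since_last_purchase is an illustrative placeholder feature.
df_retail = pd.DataFrame(
    {
        "revenue_2019": [120.0, 340.5, 89.9, 560.0, 230.0, 410.0, 75.5, 990.0, 300.0, 150.0],
        "days_since_last_purchase": [12, 40, 200, 5, 60, 33, 310, 2, 45, 90],
        "revenue_2020": [130.0, 300.0, 20.0, 640.0, 210.0, 450.0, 0.0, 1100.0, 320.0, 140.0],
    },
    index=pd.Index(range(1001, 1011), name="customer_id"),
)

# Separate the label from the features.
y = df_retail["revenue_2020"]
X = df_retail.drop(columns=["revenue_2020"])

# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Dropping the label with drop(columns=...) rather than listing features by hand keeps every remaining column as a feature, and train_test_split shuffles X and y together, so each training row stays aligned with its label via the customer_id index.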
We'll use revenue_2020 as our target prediction label and the following as our feature columns:
revenue_2019
...