Solution: Revenue Prediction

Compare your own work against the solution to the regression challenge.

Solution code

Let's take a look at the solution and review the code.

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# challenge 1 - load the dataset
df_retail = pd.read_csv('wrangled_transactions.csv', header=0, index_col='customer_id')
# challenge 2 - handle outliers (keep customers with 2019 revenue below 5000)
df_retail = df_retail.query('revenue_2019 < 5000')
# challenge 3 - split the dataset into training and test sets
X = df_retail[['revenue_2019']]
y = df_retail['revenue_2020']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)
# challenge 4 - build and train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# challenge 5 - calculate the RMSE score
rmse = mean_squared_error(y_test, model.predict(X_test))**0.5
print("Root mean squared error: %.2f" % rmse)
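
The solution depends on wrangled_transactions.csv, which may not be at hand outside the course environment. As a minimal sketch, the same steps can be run on synthetic data to see what the RMSE line computes. Everything about the data here is an assumption made for illustration: the `df_demo` name, the random values, and the 0.8 slope are hypothetical and are not taken from the course dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the course dataset (hypothetical values, for illustration only).
rng = np.random.default_rng(11)
revenue_2019 = rng.uniform(0, 6000, size=500)
revenue_2020 = 0.8 * revenue_2019 + rng.normal(0, 200, size=500)
df_demo = pd.DataFrame({"revenue_2019": revenue_2019, "revenue_2020": revenue_2020})

# Same steps as the solution: filter outliers, split, fit.
df_demo = df_demo.query("revenue_2019 < 5000")
X = df_demo[["revenue_2019"]]
y = df_demo["revenue_2020"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)
model = LinearRegression().fit(X_train, y_train)

# RMSE via scikit-learn, as in the solution.
predictions = model.predict(X_test)
rmse_sklearn = mean_squared_error(y_test, predictions) ** 0.5

# RMSE computed directly from the residuals: square, average, take the root.
residuals = y_test.to_numpy() - predictions
rmse_manual = float(np.sqrt(np.mean(residuals ** 2)))

print("RMSE (scikit-learn): %.2f" % rmse_sklearn)
print("RMSE (manual):       %.2f" % rmse_manual)
print("Slope: %.3f, intercept: %.2f" % (model.coef_[0], model.intercept_))
```

Both RMSE values should agree, since the scikit-learn call followed by `** 0.5` is the square root of the mean squared residual. The result is in the same units as revenue, so it reads as a typical prediction error per customer.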

Explanation

...