Solution: Revenue Prediction
Learn by doing your work alongside the regression challenge solution.
We'll cover the following...
Solution code
Let's take a look at the solution and review the code.
Press + to interact
import pandas as pdimport numpy as npfrom sklearn.linear_model import LinearRegressionfrom sklearn.model_selection import train_test_splitfrom sklearn.metrics import mean_squared_error# challenge 1 - load the datasetdf_retail = pd.read_csv('wrangled_transactions.csv', header=0, index_col='customer_id')# challenge 2 - handle outliersdf_retail = df_retail.query('revenue_2019 < 5000')# challenge 3 - split the dataset into test & training datasetX = df_retail[['revenue_2019']]y = df_retail['revenue_2020']X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)# challenge 4 - build and train a linear regression modelmodel = LinearRegression()model.fit(X_train, y_train)# challenge 5 - calculate the RMSE scorermse = mean_squared_error(y_test, model.predict(X_test))**0.5print("Root mean squared error: %.2f" % rmse)
Explanation