Feature Engineering
Learn how to discover and extract features through domain knowledge and data wrangling techniques.
Not every available feature should be used to train a machine learning model. Some features improve performance, while others add noise or bias. In this lesson, we’ll extract important features for our model using the information gathered in the data exploration step.
Load the dataset
Before we start, let’s import the pertinent libraries and load the dataset.
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns

df_retail = pd.read_csv('retail_transactions.csv')
print(df_retail.head())
Data wrangling
In the previous lesson, we learned about exploratory data analysis techniques. Using that knowledge, we’ll shape and prepare our data for our model.
# remove unnecessary columns
df_retail = df_retail.drop(columns=['StockCode', 'Description'])

# keep UK records only
df_retail = df_retail[df_retail['Country'] == 'United Kingdom']

# fix the data type and parse datetime
df_retail['CustomerID'] = df_retail['CustomerID'].astype(str)
df_retail['InvoiceDate'] = pd.to_datetime(df_retail['InvoiceDate']).dt.normalize()

# calculate revenue and transaction year
df_retail['Revenue'] = df_retail['UnitPrice'] * df_retail['Quantity']
df_retail['Year'] = df_retail['InvoiceDate'].dt.year

# take a look at the current dataset
df_retail.info()
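To check the wrangling steps above without the CSV file, here is a minimal sketch that applies the same operations to a small hand-made frame. The toy rows, the InvoiceNo column, and the wrangle helper are hypothetical and only for illustration; the other column names are assumed to match the ones used in the lesson.

```python
import pandas as pd

# Hypothetical toy data: values are made up, column names are assumed
# to match those used in the lesson's dataset.
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "536367"],
    "StockCode": ["85123A", "71053", "22633"],
    "Description": ["HEART HOLDER", "METAL LANTERN", "HAND WARMER"],
    "Quantity": [6, 2, 3],
    "InvoiceDate": ["2010-12-01 08:26:00",
                    "2011-01-04 10:00:00",
                    "2011-01-05 09:30:00"],
    "UnitPrice": [2.5, 3.0, 1.0],
    "CustomerID": [17850.0, 13047.0, 12583.0],
    "Country": ["United Kingdom", "United Kingdom", "France"],
})


def wrangle(df):
    """Apply the lesson's wrangling steps to a copy of df (illustrative helper)."""
    # remove unnecessary columns
    df = df.drop(columns=["StockCode", "Description"])
    # keep UK records only; .copy() gives an independent frame to modify
    df = df[df["Country"] == "United Kingdom"].copy()
    # fix the data type and strip the time of day from the timestamp
    df["CustomerID"] = df["CustomerID"].astype(str)
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"]).dt.normalize()
    # calculate revenue and transaction year
    df["Revenue"] = df["UnitPrice"] * df["Quantity"]
    df["Year"] = df["InvoiceDate"].dt.year
    return df


toy_clean = wrangle(toy)
print(toy_clean[["InvoiceDate", "Revenue", "Year"]])
```

The French row is filtered out, so two rows remain, each with a date-only InvoiceDate, a Revenue equal to UnitPrice times Quantity, and the Year taken from the invoice date.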
Explanation ...