Business Example: RFM Analysis in Python
This lesson shows how to perform RFM analysis in Python with pandas, as an example of how useful pandas is for real-world analysis.
In this chapter, we have seen how we can extract useful information from raw data very easily using pandas in Python. But we have only scratched the surface. A lot more can be done to obtain useful insights from the data. Professionals can use their domain expertise to perform different kinds of analysis on the data. In this lesson, we will explore a dataset from the perspective of a business or a marketing professional. We will be doing an RFM analysis of the Sample Sales Data that we have been using.
RFM Analysis
RFM (Recency, Frequency, Monetary) analysis is a marketing technique used to determine quantitatively which customers are the best ones by examining how recently a customer has purchased (recency), how often they purchase (frequency), and how much the customer spends (monetary). Using RFM analysis, each customer is assigned a ranking number for each RFM parameter (for example, 1 to 5, with 5 being highest). The three scores together are referred to as an RFM "cell". The data is then sorted to determine which customers were the best customers in the past, with a cell that has the top score in all three parameters (for example, 555 on a 1 to 5 scale) being ideal.
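A minimal sketch of the scoring step, assuming a 1 to 5 scale with quintile-based bins (one common convention; the lesson may score differently). The customer values and the column names `recency_days`, `frequency`, `monetary`, and `RFM_cell` are hypothetical, not taken from the Sample Sales Data.

```python
import pandas as pd

# Hypothetical per-customer values (assumption: already computed from raw orders).
rfm = pd.DataFrame(
    {
        "recency_days": [5, 40, 200, 15, 90],
        "frequency": [12, 3, 1, 7, 2],
        "monetary": [25000.0, 4000.0, 800.0, 12000.0, 2500.0],
    },
    index=["A", "B", "C", "D", "E"],
)

# Recency: fewer days since the last purchase is better, so labels run from 5 down to 1.
rfm["R"] = pd.qcut(rfm["recency_days"], 5, labels=[5, 4, 3, 2, 1]).astype(int)

# Frequency: larger is better. rank(method="first") breaks ties so that
# qcut does not fail on duplicate bin edges when many customers share a count.
rfm["F"] = pd.qcut(
    rfm["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]
).astype(int)

# Monetary: larger total spend is better.
rfm["M"] = pd.qcut(rfm["monetary"], 5, labels=[1, 2, 3, 4, 5]).astype(int)

# The RFM "cell" concatenates the three scores, e.g. "555" for the best customers.
rfm["RFM_cell"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)

# Sorting the single-digit cells as strings puts the best customers first.
print(rfm.sort_values("RFM_cell", ascending=False))
```

Recency is the only parameter whose labels are reversed, because a small value (a recent purchase) is the desirable one.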
So, let’s start coding.
```python
import pandas as pd

# Read the data
df = pd.read_csv('sales_data.csv')

# Filter the data: keep only the columns needed for RFM analysis
cols_needed = ['CUSTOMERNAME', 'ORDERNUMBER', 'ORDERDATE', 'SALES']
df = df[cols_needed]
```
We read the data with `pd.read_csv`. Since we are doing RFM analysis, we only need four columns: `CUSTOMERNAME` to group customers, `ORDERDATE` to calculate recency, `ORDERNUMBER` to calculate frequency, and `SALES` to calculate the monetary value. Therefore, we write these column names in a list ...
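The mapping from the four columns to the three metrics can be sketched with a single `groupby`. This is a minimal sketch on a few hypothetical rows, not necessarily the lesson's own next step: the reference ("snapshot") date of one day after the latest order, and the output names `recency_days`, `frequency`, and `monetary`, are assumptions. Frequency counts unique `ORDERNUMBER` values because an order can span several rows (in the toy data, order 10100 has two line items).

```python
import pandas as pd

# Hypothetical rows with the same four columns the lesson keeps.
df = pd.DataFrame(
    {
        "CUSTOMERNAME": ["Acme Co", "Acme Co", "Acme Co", "Bolt Ltd"],
        "ORDERNUMBER": [10100, 10100, 10102, 10101],
        "ORDERDATE": ["2003-02-24", "2003-02-24", "2003-07-01", "2003-05-07"],
        "SALES": [2871.0, 1200.5, 2765.9, 3884.34],
    }
)
df["ORDERDATE"] = pd.to_datetime(df["ORDERDATE"])

# Assumed reference date: one day after the most recent order in the data.
snapshot = df["ORDERDATE"].max() + pd.Timedelta(days=1)

# One row per customer:
#   recency   = days between the snapshot and the customer's latest order
#   frequency = number of distinct orders (not order lines)
#   monetary  = total sales across all of the customer's rows
rfm = df.groupby("CUSTOMERNAME").agg(
    recency_days=("ORDERDATE", lambda d: (snapshot - d.max()).days),
    frequency=("ORDERNUMBER", "nunique"),
    monetary=("SALES", "sum"),
)
print(rfm)
```

Using `nunique` rather than `count` matters here: `count` would report three purchases for Acme Co, although it placed only two orders.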