Association rule mining (ARM) finds frequently occurring if-then patterns in the data. The output is in the form of rules that describe the most important combinations of features that co-occur frequently.
Association rule mining falls under the category of unsupervised learning as we don’t have access to the correct answers, i.e., what the correct association rules are. Hence, the evaluation of the results is subjective.
Take a quick look at the table below to learn the commonly used nomenclature. The dataset is shown in a denormalized 2D format where features one to four (features/predictors/attributes; all synonyms) are used to describe the characteristics or properties of an entity (i.e., the metadata of an entity). A row is a sample (observation/instance; all synonyms) containing the actual data values recorded for a single entity. These data values can be of various types, such as nominal, ordinal, numeric, etc. Usually, a prepared and preprocessed dataset consists of n such features (four in the following table) and m samples (two in the following table).
Feature1 | Feature2 | Feature3 | Feature4 |
pvalue1 | pvalue2 | pvalue3 | pvalue4 |
gvalue1 | gvalue2 | gvalue3 | gvalue4 |
Let's see the differences and similarities between association rules and classification. One prominent difference is that classification is a form of supervised learning, whereas association rule mining is a form of unsupervised learning. Assume we have the dataset shown in the nomenclature table above; the two kinds of rules could then look like this:
Example of a classification rule: if feature1 = gvalue1 and feature4 = gvalue4, then class = some class label
Example of an association rule: if feature1 = gvalue1 and feature4 = gvalue4, then feature2 = gvalue2 and feature3 = gvalue3
Here, "feature1 = gvalue1 and feature4 = gvalue4" is termed as the antecedent/premise, and "feature2= gvalue2 and feature3 = gvalue3" is termed as the consequent. So we can interpret an association rule as the implication
Because classification finds relations between the (up to n−1) predictor features and one designated class feature, the consequent of a classification rule is always the class. Association rule mining, by contrast, can relate any combination of features, so both the premise and the consequent may contain arbitrary features.
Association rules can provide useful insights in many scenarios. A few of them are:
Marketing strategies and sales promotions (which products to retain, which to discontinue, how discontinuation of one product can affect the sales of other products)
Supermarket shelf management or market basket analysis (i.e., which products are commonly bought together—bread/milk/eggs)
Inventory management (i.e., which parts/products to retain at different geographical locations of a retail outlet for quick delivery on demand)
Disease diagnosis assistance (i.e., what symptoms are associated with which disease)
Intelligent transportation system (ITS) (i.e., which routes are traversed together commonly, which seasons/regions result in a high number of vehicle or pedestrian accidents)
Web log data (i.e., preferences of users; recommendation systems, suggestions about items frequently bought together)
Fraud detection on the web (i.e., which transactions might be fraudulent—several purchases from the same buyer during a short period of time)
The following are some commonly used algorithms to find association rules:
Apriori: It uses the notion of itemsets and mines the data for frequent itemsets in a bottom-up manner, iteratively generating candidate itemsets of increasing size (1-itemsets up to n-itemsets) and keeping those that meet a given minimum support threshold. The output includes the frequent itemsets that meet this threshold and a list of rules generated from them. The AIS and SETM algorithms also use the notion of itemsets, but both suffer from the issue of possibly generating and counting many small candidate itemsets.
FP-Growth: It mines frequent patterns without explicit candidate generation by building a compact tree structure, called the FP-tree, to represent the dataset and then recursively mining the tree for frequent patterns. This tree-based approach is typically faster than Apriori.
Equivalence class transformation (ECLAT): It uses a vertical representation of the dataset in which each item is associated with the set of transaction IDs (its tidset) in which it occurs. The support of a larger itemset is obtained by intersecting the tidsets of its items, and the search proceeds depth-first over equivalence classes of itemsets that share a common prefix (see the small sketch after this list).
Direct hashing and pruning (DHP): It works in two phases, as the name suggests. While counting candidate k-itemsets in one pass, it also hashes the (k+1)-itemsets appearing in each transaction into buckets; candidate (k+1)-itemsets that fall into buckets with counts below the minimum support are pruned early in the next pass. It also progressively trims the transactions themselves, which shrinks the data scanned in later passes.
Tree projection: It arranges itemsets in a lexicographic tree and counts support by projecting each transaction onto the relevant nodes of the tree, so support counting at a node only examines the transactions that contain that node's itemset. The nodes represent itemsets, and the edges represent the extension of an itemset by one item.
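To make the tidset idea behind ECLAT concrete, here's a minimal illustrative sketch (the toy transactions and variable names below are made up for illustration and aren't taken from any library):

# Minimal ECLAT-style sketch: vertical representation where each item
# maps to the set of transaction IDs (its tidset) that contain it.
from itertools import combinations

# hypothetical transactions: transaction ID -> items bought
transactions = {
    1: {"bread", "milk"},
    2: {"bread", "eggs"},
    3: {"milk", "eggs", "bread"},
    4: {"milk"},
}

# build the vertical layout: item -> tidset
tidsets = {}
for tid, items in transactions.items():
    for item in items:
        tidsets.setdefault(item, set()).add(tid)

n = len(transactions)
# the support of a 2-itemset is the size of the intersection of the two tidsets
for a, b in combinations(sorted(tidsets), 2):
    common = tidsets[a] & tidsets[b]
    print(f"{{{a}, {b}}}: support = {len(common) / n:.2f}")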
The following measures are commonly used to evaluate the quality of the discovered association rules. Among these, the first two are the most commonly used; however, combining all of them can result in a better selection of important association rules.
Support (Supp)
Confidence (Conf)
Lift (Lift)
Leverage (Lev)
Conviction (Conv)
Support indicates how frequently the items of a rule occur together in the dataset. For a rule X ⇒ Y:

Supp(X ⇒ Y) = (number of samples containing both X and Y) / m

where m is the total number of samples. Support values range from 0 to 1; the raw count of samples containing the itemset (the numerator above) is often called the support count.
Confidence indicates how frequently the if-then rule is found to be true in the dataset. Confidence is the proportion of the samples covered by the premise that are also covered by the consequence. It's basically a conditional probability, with a value that can range from 0 to 1:

Conf(X ⇒ Y) = Supp(X ⇒ Y) / Supp(X)
Why multiple measures? A rule can cover many samples simply because its items are frequent, yet rarely hold when its premise actually occurs; this is the case of high support and low confidence. Conversely, a rule may hold almost every time its premise occurs, yet cover only a handful of samples; this is the case of high confidence and low support. So, guided by multiple measures rather than a single one, we can find important and relevant association rules.
Lift is the ratio of the rule's confidence to the proportion of all samples that are covered by the consequence. It evaluates the importance of the association independently of support:

Lift(X ⇒ Y) = Conf(X ⇒ Y) / Supp(Y)

A lift of 1 means the premise and the consequence are independent; a lift greater than 1 means the consequence occurs more often with the premise than without it, and a lift less than 1 means it occurs less often.
Adding lift to the two aforementioned measures (i.e., support and confidence) ensures the rules aren't biased toward either rarely occurring feature combinations or rarely correct feature combinations.
Leverage is the proportion of additional samples covered by both the premise and the consequence beyond those expected if the premise and the consequence were independent of one another:

Lev(X ⇒ Y) = Supp(X ⇒ Y) − Supp(X) × Supp(Y)

In other words, leverage is the probability of the premise and the consequence occurring together minus the probability expected if they were statistically independent. A leverage of 0 indicates independence.
Conviction is another measure of departure from independence. Conviction is given by:

Conv(X ⇒ Y) = (1 − Supp(Y)) / (1 − Conf(X ⇒ Y))

A conviction of 1 means the premise and the consequence are independent; higher values indicate a stronger association, and the value becomes infinite for rules that always hold (confidence = 1).
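As a quick reference, all five measures can be computed directly from their definitions. The following is a minimal Python sketch (the function names and the choice of representing transactions as Python sets are our own, not a standard API):

# Rule-measure sketch: each transaction is a set of items;
# a rule is given as (premise, consequence), both sets of items.

def support(itemset, transactions):
    # fraction of transactions that contain every item in `itemset`
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(premise, consequence, transactions):
    # Supp(X U Y) / Supp(X); assumes the premise occurs at least once
    return support(premise | consequence, transactions) / support(premise, transactions)

def lift(premise, consequence, transactions):
    # Conf(X => Y) / Supp(Y); > 1 means Y occurs more often with X than without
    return confidence(premise, consequence, transactions) / support(consequence, transactions)

def leverage(premise, consequence, transactions):
    # Supp(X U Y) - Supp(X) * Supp(Y); 0 means X and Y are independent
    return (support(premise | consequence, transactions)
            - support(premise, transactions) * support(consequence, transactions))

def conviction(premise, consequence, transactions):
    # (1 - Supp(Y)) / (1 - Conf(X => Y)); infinite for rules that always hold
    conf = confidence(premise, consequence, transactions)
    supp_y = support(consequence, transactions)
    return float("inf") if conf == 1 else (1 - supp_y) / (1 - conf)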
The Apriori algorithm begins by converting each feature value into an item, so every sample becomes a set of items, and then proceeds as follows.
Inputs: Dataset consisting of n features and m samples
Outputs: Set of frequent itemsets from the final iteration that meet the minimum support threshold and rules that meet the minimum confidence threshold
Hyperparameters:
Minimum support threshold: The minimum support needed for an itemset to qualify as frequent (SuppTh_min)
Minimum performance metric threshold: The minimum value of a rule quality metric, such as confidence (ConfTh_min), used to exclude less important rules
Steps:
Generate an initial list of candidate 1-itemsets from the individual feature values (items).
Compute support for each candidate 1-itemset.
Exclude the candidates with support below SuppTh_min to obtain the filtered frequent 1-itemsets.
Merge the frequent itemsets from the current iteration to generate the candidate itemsets of the next size (e.g., frequent 1-itemsets are merged into candidate 2-itemsets).
Compute support for the candidate itemsets.
Filter the candidate itemsets, excluding those with support below SuppTh_min.
If the filtered list of frequent itemsets is not empty, repeat from the merge step with these larger itemsets; otherwise, stop.
Generate rules from the final frequent itemsets depending on the minimum performance metric threshold, such as confidence.
Exclude all rules with confidence < ConfTh_min.
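Putting the steps together, here is a compact, illustrative Python sketch of the algorithm (an unoptimized reading of the steps above; the names apriori_sketch, min_supp, and min_conf are our own):

from itertools import combinations

def apriori_sketch(transactions, min_supp, min_conf):
    # toy Apriori: `transactions` is a list of sets of items
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    # candidate 1-itemsets, filtered by the minimum support threshold
    items = sorted({item for t in transactions for item in t})
    frequent = {frozenset([i]) for i in items if supp(frozenset([i])) >= min_supp}
    all_frequent = set(frequent)

    # merge frequent k-itemsets into candidate (k+1)-itemsets, count, and filter
    k = 1
    while frequent:
        k += 1
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        frequent = {c for c in candidates if supp(c) >= min_supp}
        all_frequent |= frequent

    # generate rules premise -> consequence and keep those above min_conf
    rules = []
    for itemset in all_frequent:
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for premise in map(frozenset, combinations(itemset, r)):
                consequence = itemset - premise
                conf = supp(itemset) / supp(premise)
                if conf >= min_conf:
                    rules.append((set(premise), set(consequence), supp(itemset), conf))
    return all_frequent, rules

This sketch omits the classic candidate-pruning refinement (discarding candidates that have an infrequent subset), but the generate-count-filter loop and the rule filtering mirror the steps listed above.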
In its infancy, association rule mining was focused on finding items frequently bought together in supermarkets, so it became almost synonymous with market basket analysis. Later on, it was employed in many other application domains, as we discussed earlier. However, market basket analysis still remains a classic example of association rule mining. Let’s wrap things up by discussing a concrete example to see what we’ve learned so far in action.
Consider a fictional supermarket that sells a total of five products: toothbrush, toothpaste, soap, tissues, and handwash. So, we end up with a transactions dataset that looks somewhat like the table below. A one indicates that the product was part of the transaction, and a zero indicates that it wasn't.
Transaction ID | Toothpaste | Toothbrush | Tissues | Soap | Handwash |
1 | 1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 0 | 0 | 0 |
3 | 1 | 1 | 0 | 1 | 0 |
4 | 1 | 1 | 0 | 0 | 1 |
5 | 0 | 0 | 0 | 1 | 1 |
Different measures we have seen so far can then be calculated as follows (feel free to revisit the formulae for the measures):
Note that if we use SuppTh_min to filter the candidate 1-itemsets first, only the items that meet the threshold move on to the next iteration. Some of the example 2-itemsets, together with their support, confidence, and lift values, are listed below:
2-itemsets | Support | Confidence | Lift |
Toothpaste → Toothbrush | 0.8 | 1 | 1.25 |
Toothpaste → Tissues | 0.2 | 0.25 | 1.25 |
Toothpaste → Soap | 0.4 | 0.5 | 0.83 |
Toothpaste → Handwash | 0.4 | 0.5 | 0.83 |
Toothbrush → Tissues | 0.2 | 0.25 | 1.25 |
Toothbrush → Soap | 0.4 | 0.5 | 0.83 |
Toothbrush → Handwash | 0.4 | 0.5 | 0.83 |
Tissues → Soap | 0.2 | 1 | 1.67 |
Tissues → Handwash | 0.2 | 1 | 1.67 |
Soap → Handwash | 0.4 | 0.67 | 1.11 |
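As a sanity check, the values in the table above can be reproduced with a few lines of Python (a self-contained sketch; the lowercase item names are simply our encoding of the column headers):

# the five toy transactions from the table above, encoded as item sets
transactions = [
    {"toothpaste", "toothbrush", "tissues", "soap", "handwash"},
    {"toothpaste", "toothbrush"},
    {"toothpaste", "toothbrush", "soap"},
    {"toothpaste", "toothbrush", "handwash"},
    {"soap", "handwash"},
]
n = len(transactions)

def supp(itemset):
    return sum(itemset <= t for t in transactions) / n

def measures(premise, consequence):
    support = supp(premise | consequence)
    confidence = support / supp(premise)
    lift = confidence / supp(consequence)
    return support, confidence, lift

print(measures({"toothpaste"}, {"toothbrush"}))  # (0.8, 1.0, 1.25), as in the first row
print(measures({"tissues"}, {"soap"}))           # (0.2, 1.0, ~1.67)
print(measures({"soap"}, {"handwash"}))          # (0.4, ~0.67, ~1.11)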
It must be noted that after filtering the itemsets, only the remaining frequent itemsets will be used for the next iteration to form candidate 3-itemsets.
Recall that the Apriori algorithm takes the minimum support threshold (SuppTh_min) and the minimum confidence threshold (ConfTh_min) as hyperparameters from the user; only the itemsets and rules that satisfy these thresholds make it into the final output.
To conclude, take a look at how changing the hyperparameters of the Apriori algorithm changes the end results. The dataset used as an example here is a larger, real-world supermarket transactions dataset.
We get the following results with the first combination of SuppTh_min and ConfTh_min values (scenario I).
The output with these parameters shows how many frequent itemsets of each size (1-itemsets, 2-itemsets, and so on) meet the minimum support threshold.
The extracted 10 most important rules are listed after the itemset counts. Let's discuss the first rule to clarify how to interpret the results. The first rule states that if biscuits=t, frozen foods=t, fruit=t, total=high, then bread and cake=t.
The support counts of the itemsets are shown next to them (e.g., the itemset including bread and cake is covered by 723 samples).
Next to each rule, the values of its corresponding measures are listed (e.g., its confidence and lift).
With a second combination of SuppTh_min and ConfTh_min values (scenario II), the number of frequent itemsets and extracted rules changes accordingly.
With a third combination of SuppTh_min and ConfTh_min values (scenario III), the Apriori algorithm ends up with no rules that fulfill the criteria at all. So, among the three configurations we tried, the best option would be scenario I.
We can try different parameter configurations, generate rules, and evaluate them in light of the performance measures to arrive at a meaningful set of association rules.
Let's see the things we've discussed and learned so far in action. Consider a dataset of retail store transactions, where each transaction consists of the names of products purchased by a customer (e.g., "bread", "toothpaste", "dishsoap", "door lock", "shampoo"). The following Python code uses the apyori library to find frequent itemsets and association rules based on the Apriori algorithm. It'll run smoothly in Google Colab; you just need to upload the dataset and provide the correct path to it. The description of the different statements is provided in comments. For ease of discussion, the code is broken into smaller chunks; you can add these blocks to separate cells of a Colab notebook or place them all in a single cell in the same order.
# this project was built and executed on Google Colab,
# hence the description is w.r.t. its environment

# install the required package for the Apriori algorithm
!pip install apyori

# importing libraries for data processing and data visualization
import numpy as np
import pandas as pd
# import the dataset; right now, we have the dataset uploaded to 'sample_data'
# in Colab as a CSV file
# you can right-click the file uploaded in the 'sample_data' section
# to copy the path conveniently,
# or provide the correct path of the dataset in the read_csv method wherever
# the dataset exists
Data = pd.read_csv('/content/sample_data/Market_Basket_Optimisation.csv', header = None)
print(Data.head())   # displays the first 5 rows of the dataset
print(Data.info())   # displays information about the dataset (e.g., features, samples, data types)
rows, cols = Data.shape  # record the number of samples and features for processing later
The two print statements in the above code block display the output of print(Data.head()) (the first five rows of the dataset) and print(Data.info()) (a summary of the columns, their data types, and non-null counts).
Now, we need to process the dataset to convert it into the form the apriori method expects: a list of transactions, where each transaction is a list of strings (see the apyori documentation for details).
# We need to train the 'apyori' model for mining frequent itemsets and rules,
# but it takes input as a list of transactions whose elements are strings.
# For that, we need to convert the pandas dataframe into a list.

# initializing the transactions list
transactionsList = []

# preparing the list of transactions; the number of features is in cols,
# and the number of samples is in rows
for i in range(0, rows):
    transactionsList.append([str(Data.values[i, j]) for j in range(0, cols)])
After the dataset is in the right format, we need to apply the Apriori algorithm.
# import apriori from apyori
from apyori import apriori

# the hyperparameters of the Apriori algorithm are specified as the method arguments
# the complete details can be found in the documentation of the apyori library
# the combination of hyperparameters below can be altered and the resulting output examined:
# minimum support = 0.003
# minimum confidence = 0.2
# minimum lift = 3
# minimum length = 2, i.e., at least 2-itemsets
# maximum length = 2, i.e., at most 2-itemsets
rules = apriori(transactions = transactionsList, min_support = 0.003, min_confidence = 0.2,
                min_lift = 3, min_length = 2, max_length = 2)
The results are returned in the rules object by the apriori method. The details of the parameters are mentioned in the comments.
After that, we need to convert the results to a list and then into a dataframe for easy viewing. The following piece of code does that for you.
# convert the resulting rules into a list
rules_list = list(rules)

# convert the list into a pandas dataframe
def listToTabularForm(rules_list):
    leftHandSide  = [tuple(resultTuple[2][0][0])[0] for resultTuple in rules_list]
    rightHandSide = [tuple(resultTuple[2][0][1])[0] for resultTuple in rules_list]
    support       = [resultTuple[1] for resultTuple in rules_list]
    confidence    = [resultTuple[2][0][2] for resultTuple in rules_list]
    lift          = [resultTuple[2][0][3] for resultTuple in rules_list]
    return list(zip(leftHandSide, rightHandSide, support, confidence, lift))

output_DataFrame = pd.DataFrame(listToTabularForm(rules_list),
                                columns = ['Left_Hand_Side', 'Right_Hand_Side', 'Support', 'Confidence', 'Lift'])
output_DataFrame
Voila! We have created our association rules.
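If you want to surface the strongest associations first, you can sort the resulting dataframe, for example by lift, using pandas' built-in nlargest (shown here as an optional extra step, not part of the original walkthrough):

# display the 10 rules with the highest lift
output_DataFrame.nlargest(n = 10, columns = 'Lift')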
If you found this blog interesting, more relevant material is available in the following course and skill path.
An Introductory Guide to Data Science and Machine Learning
There is a lot of dispersed, and somewhat conflicting information on the internet when it comes to data science, making it tough to know where to start. Don't worry. This course will get you familiar with the state of data science and the related fields such as machine learning and big data. You will be going through the fundamental concepts and libraries which are essential to solve any problem in this field. You will work on real-time projects from Kaggle while also honing your mathematical skills which will be used extensively in most problems you face. You will also be taken through a systematic approach to learning about data acquisition to data wrangling and everything in between. This is your all-in-one guide to becoming a confident data scientist.
Machine Learning with Python Libraries
Machine learning is used for software applications that help them generate more accurate predictions. It is a type of artificial intelligence operating worldwide and offers high-paying careers. This path will provide a hands-on guide on multiple Python libraries that play an important role in machine learning. This path also teaches you about neural networks, PyTorch Tensor, PyCaret, and GAN. By the end of this module, you’ll have hands-on experience in using Python libraries to automate your applications.