
Understanding association rule mining

Beenish Akram
16 min read


Introduction to association rule mining#

Association rule mining (ARM) finds frequently occurring if-then patterns in the data. The output is in the form of rules that describe the most important combinations of features that co-occur frequently.

Association rule mining falls under the category of unsupervised learning as we don’t have access to the correct answers, i.e., what the correct association rules are. Hence, the evaluation of the results is subjective.

Association rule mining is a form of unsupervised learning

Take a quick look at the table below to learn the commonly used nomenclature. The dataset is shown in a denormalized 2D format where features one to four (features/predictors/attributes; all synonyms) are used to describe the characteristics or properties of an entity (i.e., the metadata of an entity). A row is a sample (observation/instance; all synonyms) containing the actual data values recorded for a single entity. These data values can be of various types, such as nominal, ordinal, numeric, etc. Usually, a prepared and preprocessed dataset consists of n such features (four in the following table) and m samples (two in the following table).

Dataset in Flat 2D Format

| Feature1 | Feature2 | Feature3 | Feature4 |
|----------|----------|----------|----------|
| pvalue1  | pvalue2  | pvalue3  | pvalue4  |
| gvalue1  | gvalue2  | gvalue3  | gvalue4  |
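To make this layout concrete, here is a minimal sketch (assuming pandas is available) that builds the same flat 2D dataset; the feature names and pvalue/gvalue entries are placeholders taken from the table above, not real data.

import pandas as pd

# Two samples (rows) described by four features (columns), mirroring the table above.
# The pvalue/gvalue entries are placeholders, not real data.
dataset = pd.DataFrame(
    [
        ["pvalue1", "pvalue2", "pvalue3", "pvalue4"],
        ["gvalue1", "gvalue2", "gvalue3", "gvalue4"],
    ],
    columns=["Feature1", "Feature2", "Feature3", "Feature4"],
)
print(dataset)        # m = 2 samples, n = 4 features
print(dataset.shape)  # (2, 4)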





Classification vs. Association rules#

Let's look at the differences and similarities between association rules and classification. One prominent difference is that classification is a form of supervised learning, whereas association rule mining is a form of unsupervised learning. Assume we have n features and m samples. In classification, one feature is designated as the class label, which holds the class value for each sample (say, class label = Feature4 in the table above), and the remaining n-1 features are used to train a model that predicts the class label. In association rule mining, on the other hand, there is no notion of a class label; all features, and any possible combination of features, are used to draw useful correlations. In short, classification tries to find relations between the n-1 features and the class label, while association rule mining tries to find important relations between any combination of features and any other combination of features. The following examples give a glimpse of what classification and association rules might look like.

Example of a classification rule:

if feature1 = gvalue1 and feature2 = gvalue2 and feature3 = gvalue3, then feature4 = gvalue4

Example of an association rule:

if feature1 = gvalue1 and feature4 = gvalue4, then feature2 = gvalue2 and feature3 = gvalue3

Here, "feature1 = gvalue1 and feature4 = gvalue4" is termed the antecedent (or premise), and "feature2 = gvalue2 and feature3 = gvalue3" is termed the consequent. So we can interpret an association rule as the implication X → Y, where X and Y are both combinations of feature-value pairs that do not intersect, X being the premise and Y the consequent.

While classification finds relations between (up to) n-1 features and one feature designated as the class label, ARM finds important relations between (up to) n-1 features and other (up to) n-1 features. The number of extracted rules can be enormous in the latter case. We'll see shortly how to reduce the number of extracted rules by excluding infrequent patterns.

Possible applications of ARM#

Association rules can provide useful insights in many scenarios. A few of them are:

  • Marketing strategies and sales promotions (which products to retain, which to discontinue, and how the discontinuation of one product can affect another product's sales)

  • Supermarket shelf management or market basket analysis (i.e., which products are commonly bought together—bread/milk/eggs)

  • Inventory management (i.e., which parts/products to retain at different geographical locations of a retail outlet for quick delivery on demand)

  • Disease diagnosis assistance (i.e., what symptoms are associated with which disease)

  • Intelligent transportation system (ITS) (i.e., which routes are traversed together commonly, which seasons/regions result in a high number of vehicle or pedestrian accidents)

  • Web log data (i.e., preferences of users; recommendation systems, suggestions about items frequently bought together)

  • Fraud detection on the web (i.e., which transactions might be fraudulent—several purchases from the same buyer during a short period of time)

Algorithms commonly used for ARM#

The following are some commonly used algorithms to find association rules:

  • Apriori: It uses the notion of itemsets and mines the data for frequent itemsets in a bottom-up manner. It iteratively generates frequent 1-itemsets to n-itemsets based on a given minimum support threshold. The output includes frequent itemsets that fulfill the aforementioned conditions and a list of generated rules. AIS and SETM algorithms also use the notion of itemsets, but both suffer from the issue of possibly generating and counting many small candidate itemsets.

  • FP-Growth: It mines frequent patterns without candidate generation by building a compact tree, called the FP-tree, to represent the dataset. It then finds frequent patterns with a recursive, divide-and-conquer technique. Tree-based pattern mining is typically faster than Apriori. (A short usage sketch of Apriori and FP-Growth follows this list.)

  • Equivalence class transformation (ECLAT): It uses a vertical data layout in which each itemset is associated with the set of transaction IDs (its tidset) in which it occurs. Support is computed by intersecting tidsets, and the search proceeds depth-first over equivalence classes of itemsets that share a common prefix.

  • Direct hashing and pruning (DHP): It works in two phases. As the name suggests, the first phase is hashing, and the second is pruning. During the hashing phase, candidate itemsets are hashed into buckets, and the bucket counts are used to rule out candidates that cannot be frequent. During the pruning phase, transactions (and items within them) that cannot contribute to any future frequent itemset are trimmed, progressively shrinking the data that later passes must scan.

  • Tree projection: It organizes the itemset search space as a lexicographic tree and counts support by projecting transactions onto the nodes of this tree. Itemsets map to nodes, and edges represent the extension of an itemset by one more item, which keeps the counting work localized to the relevant portions of the data.
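As a quick illustration of how two of these algorithms are typically invoked, the sketch below uses the mlxtend library (an assumption; it is not part of this blog's later code) to run Apriori and FP-Growth on a tiny, made-up set of transactions. Both calls should return the same frequent itemsets; FP-Growth simply gets there without candidate generation.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth

# Made-up transactions: each inner list is one market basket.
transactions = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "eggs"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Both calls return the same frequent itemsets; FP-Growth avoids candidate
# generation and is usually faster on larger datasets.
print(apriori(onehot, min_support=0.5, use_colnames=True))
print(fpgrowth(onehot, min_support=0.5, use_colnames=True))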

How to evaluate association rules#

The following measures are commonly used to evaluate an association rule algorithm’s performance. Among these, the first two are the most commonly used. However, a combination of all can result in improved selection of important association rules.

  • Support (Supp)

  • Confidence (Conf)

  • Lift (Lift)

  • Leverage (Lev)

  • Conviction (Conv)

Support indicates how frequently the items occur together in the dataset:

Supp(X → Y) = freq(X, Y) / m

where freq(X, Y) is the number of observations/samples containing both X and Y, and m is the total number of observations (the raw count freq(X, Y) is also called the support count). The value of support ranges from 0 to 1. The closer it is to 1, the more relevant and important the feature combination might be, making it a good candidate for further investigation.

Confidence indicates how frequently the if-then rule holds in the dataset. Confidence is the proportion of the samples covered by the premise that are also covered by the consequent:

Conf(X → Y) = Supp(X → Y) / Supp(X) = freq(X, Y) / freq(X)

It's essentially a conditional probability, with a value that ranges from 0 to 1 (the higher, the better). A high confidence indicates that a sample that includes X will likely include Y as well.

Why multiple measures? A rule may cover many samples in the dataset yet hold only rarely when its premise occurs; this is the case of high support and low confidence. Conversely, a rule may cover few samples yet hold almost every time its premise occurs; this is the case of high confidence and low support. Guided by multiple measures rather than a single one, we can find association rules that are both important and relevant.

Lift is the ratio of confidence to the proportion of all samples covered by the consequent:

Lift(X → Y) = Conf(X → Y) / Supp(Y)

It evaluates the importance of the association independently of support: how much does the presence of X in a sample affect the presence of Y? The value of lift ranges from 0 to infinity. Lift > 1 indicates a positive correlation between X and Y (they occur together more often than expected), lift < 1 indicates a negative correlation (they occur together less often than expected), and lift = 1 means X and Y are uncorrelated, i.e., independent of one another.

Adding lift to the two aforementioned measures (i.e., support and confidence) ensures the rules aren't biased toward either rarely occurring feature combinations or rarely correct feature combinations.

Leverage is the proportion of additional observations covered by both the premise and the consequent beyond what would be expected if the two were independent of one another:

Lev(X → Y) = Supp(X → Y) - Supp(X) * Supp(Y)

In other words, leverage is the difference between the probability of X and Y occurring together and the frequency expected if X and Y were independent. The value of leverage ranges from -1 to 1. Leverage = 0 indicates independence, leverage > 0 indicates a positive correlation, and leverage < 0 indicates a negative correlation.

Conviction is another measure of departure from independence. It is given by

Conv(X → Y) = (1 - Supp(Y)) / (1 - Conf(X → Y))

which equals Probability(X) * Probability(!Y) / Probability(X, !Y), where the ! symbol represents logical NOT. It helps evaluate whether a rule was found merely by coincidence. A high value of conviction indicates that the consequent Y is strongly dependent on the premise X. It can be interpreted similarly to lift, with values ranging from 0 to infinity; conviction is 1 if the items are independent.
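The definitions above translate directly into code. The following is a minimal sketch (the function names are mine, not from any library) that computes all five measures for a rule X → Y from raw counts, where m is the total number of samples:

def supp(freq_xy, m):
    # Support: fraction of the m samples containing both X and Y
    return freq_xy / m

def conf(freq_xy, freq_x):
    # Confidence: fraction of samples containing X that also contain Y
    return freq_xy / freq_x

def lift(freq_xy, freq_x, freq_y, m):
    # Lift: confidence divided by the support of the consequent Y
    return conf(freq_xy, freq_x) / (freq_y / m)

def leverage(freq_xy, freq_x, freq_y, m):
    # Leverage: observed joint support minus the support expected under independence
    return freq_xy / m - (freq_x / m) * (freq_y / m)

def conviction(freq_xy, freq_x, freq_y, m):
    # Conviction: (1 - Supp(Y)) / (1 - Conf(X -> Y)); infinite when confidence is 1
    c = conf(freq_xy, freq_x)
    return float("inf") if c == 1 else (1 - freq_y / m) / (1 - c)

# Example: Toothpaste -> Soap from the market basket table later in this blog,
# where freq(X, Y) = 2, freq(X) = 4, freq(Y) = 3, and m = 5
print(supp(2, 5), conf(2, 4), lift(2, 4, 3, 5), leverage(2, 4, 3, 5), conviction(2, 4, 3, 5))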

Apriori algorithm#

The Apriori algorithm begins by converting each feature into a candidate 1-itemset and then iteratively filters and merges the frequent itemsets from the previous pass. Rules are then derived from the frequent itemsets. The take-home point is that, in this merging process, the Apriori algorithm performs pre-pruning of the association rules because infrequent itemsets are dropped before moving to the next iteration. The following is the simplified Apriori algorithm:

Inputs: Dataset consisting of n features and m samples

Outputs: Set of frequent itemsets from the final iteration that meet the minimum support threshold and rules that meet the minimum confidence threshold

Hyperparameters:

  • Minimum support threshold: The minimum support needed for an itemset to qualify as frequent (SuppThmin)

  • Minimum performance metric threshold: The minimum performance metric threshold, such as confidence (ConfThmin), for excluding less important rules

Steps:

  1. Generate an initial list of candidate k-itemsets of size k = 1 using the unique features from the dataset.

  2. Compute support for each candidate k-itemset.

  3. Exclude the candidate k-itemsets that have support < SuppThmin to obtain the filtered frequent k-itemsets.

  4. Merge the frequent k-itemsets to create candidate (k+1)-itemsets.

  5. Compute support for the candidate (k+1)-itemsets.

  6. Filter the (k+1)-itemsets according to the criterion specified in step 3 to remove infrequent itemsets.

  7. If any filtered (k+1)-itemsets remain, move to step 4.

  8. Generate rules from the final frequent itemsets based on the minimum performance metric threshold, such as confidence.

  9. Exclude all rules with confidence < ConfThmin.
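A compact Python sketch of steps 1 to 7 (the frequent-itemset part; rule generation is omitted for brevity) might look like the following. The helper names are hypothetical, itemsets are represented as frozensets, and transactions as sets:

from itertools import combinations

def apriori_frequent_itemsets(transactions, supp_th_min):
    """Return all frequent itemsets and their supports, following steps 1-7 above."""
    m = len(transactions)

    def support(itemset):
        # Steps 2/5: fraction of transactions containing every item of the itemset
        return sum(itemset <= t for t in transactions) / m

    # Step 1: candidate 1-itemsets from the unique items in the dataset
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}

    while current:
        # Steps 2-3 / 5-6: keep only candidates meeting the minimum support threshold
        supports = {c: support(c) for c in current}
        survivors = {c: s for c, s in supports.items() if s >= supp_th_min}
        frequent.update(survivors)
        # Step 4: merge surviving k-itemsets into candidate (k+1)-itemsets
        keys = list(survivors)
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == len(a) + 1})

    return frequent

# Toy run on the supermarket example used in the next section
transactions = [
    {"toothpaste", "toothbrush", "tissues", "soap", "handwash"},
    {"toothpaste", "toothbrush"},
    {"toothpaste", "toothbrush", "soap"},
    {"toothpaste", "toothbrush", "handwash"},
    {"soap", "handwash"},
]
for itemset, s in sorted(apriori_frequent_itemsets(transactions, 0.4).items(), key=lambda kv: -kv[1]):
    print(set(itemset), round(s, 2))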

Market basket analysis use case#

In its infancy, association rule mining was focused on finding items frequently bought together in supermarkets, so it became almost synonymous with market basket analysis. Later on, it was employed in many other application domains, as we discussed earlier. However, market basket analysis still remains a classic example of association rule mining. Let’s wrap things up by discussing a concrete example to see what we’ve learned so far in action.

Consider a fictional supermarket that sells five products in total: toothbrush, toothpaste, soap, tissues, and handwash. We end up with a transactions dataset that looks like the table below, where one (1) indicates the product was purchased and zero (0) means it wasn't purchased.

Purchases at the Supermarket

| Transaction ID | Toothpaste | Toothbrush | Tissues | Soap | Handwash |
|----------------|------------|------------|---------|------|----------|
| 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 0 | 0 | 0 |
| 3 | 1 | 1 | 0 | 1 | 0 |
| 4 | 1 | 1 | 0 | 0 | 1 |
| 5 | 0 | 0 | 0 | 1 | 1 |

Different measures we have seen so far can then be calculated as follows (feel free to revisit the formulae for the measures):

Support of 1-itemsets#

Supp(Toothpaste) = 4/5 = 0.8

Supp(Toothbrush) = 4/5 = 0.8

Supp(Tissues) = 1/5 = 0.2

Supp(Soap) = 3/5 = 0.6

Supp(Handwash) = 3/5 = 0.6

Note that if we use SuppThmin = 0.4, then the third 1-itemset (Supp(Tissues) = 1/5 = 0.2) wouldn't be included in the formation of 2-itemsets.
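These supports can be reproduced with a few lines of pandas (a sketch; the columns mirror the purchases table above):

import pandas as pd

# The purchases table: 1 = purchased, 0 = not purchased
purchases = pd.DataFrame(
    {
        "Toothpaste": [1, 1, 1, 1, 0],
        "Toothbrush": [1, 1, 1, 1, 0],
        "Tissues":    [1, 0, 0, 0, 0],
        "Soap":       [1, 0, 1, 0, 1],
        "Handwash":   [1, 0, 0, 1, 1],
    }
)

# Support of each 1-itemset = column mean (fraction of transactions containing the product)
print(purchases.mean())

# Applying SuppThmin = 0.4 drops Tissues before 2-itemsets are formed
print(purchases.mean()[purchases.mean() >= 0.4])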

Performance measures for 2-⁠itemsets#

Some example 2-itemsets are described below with their respective performance measures. Note that this isn't an exhaustive list of all possible 2-itemsets, but a subset that shows how the calculations are performed behind the scenes, i.e., how 1-itemsets and 2-itemsets are generated and how their measures are computed. Based on these measures, the itemsets are then filtered to get rid of infrequent ones.

Conf(Toothpaste → Soap) = 2/4 = 0.5

Conf(Soap → Tissues) = 1/3 = 0.33

Lev(Toothpaste → Soap) = 0.4 - (0.8 * 0.6) = -0.08

Lev(Soap → Tissues) = 0.2 - (0.6 * 0.2) = 0.08

Conv(Toothpaste → Soap) = (1 - 0.6) / (1 - 0.5) = 0.8

Conv(Soap → Tissues) = (1 - 0.2) / (1 - 0.33) = 1.19

2-itemsets Example

| 2-itemsets | Support | Confidence | Lift |
|------------|---------|------------|------|
| Toothpaste → Toothbrush | 0.8 | 1 | 1.25 |
| Toothpaste → Tissues | 0.2 | 0.25 | 1.25 |
| Toothpaste → Soap | 0.4 | 0.5 | 0.83 |
| Toothpaste → Handwash | 0.4 | 0.5 | 0.83 |
| Toothbrush → Tissues | 0.2 | 0.25 | 1.25 |
| Toothbrush → Soap | 0.4 | 0.5 | 0.83 |
| Toothbrush → Handwash | 0.4 | 0.5 | 0.83 |
| Tissues → Soap | 0.2 | 1 | 1.67 |
| Tissues → Handwash | 0.2 | 1 | 1.67 |
| Soap → Handwash | 0.4 | 0.67 | 1.11 |

It must be noted that after filtering the itemsets, only the remaining frequent itemsets will be used in the next iteration to form 3-itemsets.
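For completeness, here is a small sketch that recomputes the 2-itemset table above and applies the SuppThmin = 0.4 filter. Note that itertools.permutations enumerates both directions of every pair, whereas the table lists only one direction per pair:

from itertools import permutations
import pandas as pd

purchases = pd.DataFrame(
    {
        "Toothpaste": [1, 1, 1, 1, 0],
        "Toothbrush": [1, 1, 1, 1, 0],
        "Tissues":    [1, 0, 0, 0, 0],
        "Soap":       [1, 0, 1, 0, 1],
        "Handwash":   [1, 0, 0, 1, 1],
    }
)
m = len(purchases)

rows = []
for x, y in permutations(purchases.columns, 2):
    # Support of the pair, confidence and lift of the rule x -> y
    supp_xy = ((purchases[x] == 1) & (purchases[y] == 1)).sum() / m
    conf_xy = supp_xy / (purchases[x].sum() / m)
    lift_xy = conf_xy / (purchases[y].sum() / m)
    rows.append({"rule": f"{x} -> {y}", "support": supp_xy,
                 "confidence": round(conf_xy, 2), "lift": round(lift_xy, 2)})

table = pd.DataFrame(rows)
# Keep only rules whose itemset meets SuppThmin = 0.4 (mirrors the filtering described above)
print(table[table["support"] >= 0.4])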

Apriori algorithm in the context of market basket analysis#

Recall that the Apriori algorithm takes the minimum support threshold and the minimum confidence threshold as hyperparameters from the user. Assuming SuppThmin = 0.4 and ConfThmin = 0.5, during the merging phase all the itemsets with support < 0.4 (the rows involving Tissues in the table above) will be dropped in that iteration. Then, for final rule generation, any rule with confidence < 0.5 will be excluded from the final list of rules (assuming, for the sake of the example, that the 2-itemsets are the final itemsets).

As a conclusion, take a look at how changing the hyperparameters of the Apriori algorithm changes the end results. The dataset used as an example is the supermarket dataset from the wekamooc GitHub repository (“dataminingwithweka/data/supermarket.arff,” GitHub, n.d., accessed December 7, 2023, https://github.com/fracpete/wekamooc/blob/master/dataminingwithweka/data/supermarket.arff). It contains 4627 samples and 217 features (different products and departments). The presence of a product in a sale transaction is indicated by 't' (i.e., true). The results were generated using Weka, a free machine learning and data analysis tool licensed under the GNU General Public License.

Scenario I#

We get the following results with SuppThmin = 0.15 and ConfThmin = 0.9 provided as hyperparameters.

Apriori algorithm number of frequent itemsets and rules for SuppThmin = 0.15 and ConfThmin = 0.9

The above image shows that, with the aforementioned parameters, the numbers of frequent 1-itemsets, 2-itemsets, 3-itemsets, 4-itemsets, 5-itemsets, and 6-itemsets were 44, 380, 910, 633, 105, and 1, respectively.

The 10 most important extracted rules are listed in the lower panel. Let's discuss the first rule to clarify how to interpret the results. The first rule states that if biscuits=t, frozen foods=t, fruit=t, total=high, then bread and cake=t.

  • The support counts of the itemsets are shown alongside them (e.g., bread and cake occurs in 723 samples).

  • Next to each rule, its corresponding measure values are listed (e.g., confidence = 0.92, lift = 1.27, leverage = 0.03, and conviction = 3.35).

Scenario II#

Next, SuppThmin = 0.4 and ConfThmin = 0.7 are provided as hyperparameters. We can see that with increased support comes a decline in confidence, lift, and conviction. Lift and conviction are now comparatively closer to 1, indicating independence between the premise and the consequent.

Apriori algorithm number of frequent itemsets and rules for SuppThmin = 0.4 and ConfThmin = 0.7

Scenario III#

Finally, SuppThmin = 0.4 and ConfThmin = 0.9 are provided as hyperparameters.

Apriori algorithm number of frequent itemsets and rules for SuppThmin = 0.4 and ConfThmin = 0.9

Note that with this combination of hyperparameters, the Apriori algorithm ends up with no rules that fulfill the criteria. So, among the three configurations we tried, the best option would be scenario I.

We can try different parameter configurations, generate rules, and ponder over them in light of the performance measures to reach a set of meaningful association rules.

Market basket analysis using the Apriori algorithm in Python#

Let's see what we've discussed and learned so far in action. Consider a market basket analysis dataset (“Market_Basket_Optimisation.csv,” Kaggle, n.d., accessed December 7, 2023, https://www.kaggle.com/datasets/devchauhan1/market-basket-optimisationcsv).

The dataset contains records of retail store transactions. Each transaction consists of the names of the products purchased by a customer (e.g., "bread", "toothpaste", "dishsoap", "door lock", "shampoo"). The following Python code uses the apyori library to find frequent itemsets and association rules based on the Apriori algorithm. It runs smoothly in Google Colab; you just need to upload the dataset and provide the correct path to it. Each statement is described in the comments. For ease of discussion, the code is broken into smaller chunks; you can add these blocks to separate cells of a Colab notebook, or put them all together in a single cell in the same order.

# this project was built and executed on Google Colab,
# hence the description is w.r.t. its environment
# install the required package for the Apriori algorithm
!pip install apyori
# import libraries for data processing
import numpy as np
import pandas as pd
Import libraries
# import the dataset; here it has been uploaded to 'sample_data'
# in Colab as a csv file
# you can right-click the file in the 'sample_data' section to copy its path conveniently,
# or provide the correct path of the dataset to read_csv wherever the dataset exists
Data = pd.read_csv('/content/sample_data/Market_Basket_Optimisation.csv', header=None)
print(Data.head())  # displays the first 5 rows of the dataset
print(Data.info())  # displays information about the dataset (e.g., features, samples, data types)
rows, cols = Data.shape  # record the number of samples and features for later processing
Read the dataset from the .csv file

The two print statements in the above code block produce the output shown below.

Result of print(Data.head())
Result of print(Data.info())

Now, we need to process the dataset to convert it into an appropriate form for the Apriori method. Its documentation describes what the correct input format is:

# We need to train the 'apyori' model to mine frequent itemsets and rules,
# but it takes its input as a list of transactions whose elements are strings.
# For that, we need to convert the pandas dataframe into a list of lists.
# Initialize the transactions list
transactionsList = []
# Prepare the list of transactions. The number of samples is in 'rows'
# and the number of features is in 'cols' (recorded earlier).
for i in range(0, rows):
    transactionsList.append([str(Data.values[i, j]) for j in range(0, cols)])
Prepare input data to be fed to the model

After the dataset is in the right format, we need to apply the Apriori algorithm.

#import apriori from apyori
from apyori import apriori
# the hyperparameters of the Apriori algorithm are specified as the method arguments
# the complete details can be found in the documentation of the apyori library
# the combination of hyperparameters below can be altered and the resulting output can be examined:
# minimum support = 0.003
# minimum confidence = 0.2
# minimum lift = 3
# minimum length = 2 i.e. minimum 2-itemsets frequent sets
# maximum length = 2 i.e. maximum 2-itemsets frequent sets
rules = apriori(transactions = transactionsList, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)
Invoke the Apriori method from the Apyori library

The results are returned in the rules object by the apriori method. The details of the parameters are mentioned in the comments.

After that, we need to convert the results to a list and then into a dataframe for easy viewing. The following piece of code does that for you.

# convert the resulting rules generator into a list
rules_list = list(rules)

# convert the list into a pandas dataframe
def listToTabularForm(rules_list):
    # each result is a RelationRecord; index [2][0] holds its first ordered statistic,
    # whose fields are (items_base, items_add, confidence, lift)
    leftHandSide = [tuple(resultTuple[2][0][0])[0] for resultTuple in rules_list]
    rightHandSide = [tuple(resultTuple[2][0][1])[0] for resultTuple in rules_list]
    support = [resultTuple[1] for resultTuple in rules_list]
    confidence = [resultTuple[2][0][2] for resultTuple in rules_list]
    lift = [resultTuple[2][0][3] for resultTuple in rules_list]
    return list(zip(leftHandSide, rightHandSide, support, confidence, lift))

output_DataFrame = pd.DataFrame(listToTabularForm(rules_list), columns = ['Left_Hand_Side', 'Right_Hand_Side', 'Support', 'Confidence', 'Lift'])
output_DataFrame
Convert the generated output into a pandas dataframe

Voila! We have created our association rules.

Association rules with Apriori algorithm

If you found this blog interesting, more relevant material is available in the following course and skill path.

An Introductory Guide to Data Science and Machine Learning

Machine Learning with Python Libraries


  
