
Understanding association rule mining

Beenish Akram
16 min read


Introduction to association rule mining#

Association rule mining (ARM) finds frequently occurring if-then patterns in the data. The output is in the form of rules that describe the most important combinations of features that co-occur frequently.

Association rule mining falls under the category of unsupervised learning as we don’t have access to the correct answers, i.e., what the correct association rules are. Hence, the evaluation of the results is subjective.

Association rule mining is a form of unsupervised learning

Take a quick look at the table below to learn the commonly used nomenclature. The dataset is shown in a denormalized 2D format where features one to four (features/predictors/attributes; all synonyms) are used to describe the characteristics or properties of an entity (i.e., the metadata of an entity). A row is a sample (observation/instance; all synonyms) containing the actual data values recorded for a single entity. These data values can be of various types, such as nominal, ordinal, numeric, etc. Usually, a prepared and preprocessed dataset consists of n such features (four in the following table) and m samples (two in the following table).

Dataset in Flat 2D Format

| Feature1 | Feature2 | Feature3 | Feature4 |
|----------|----------|----------|----------|
| pvalue1  | pvalue2  | pvalue3  | pvalue4  |
| gvalue1  | gvalue2  | gvalue3  | gvalue4  |
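To make this layout concrete, here is a minimal sketch (assuming pandas is available) that builds the same flat 2D dataset; the feature names and pvalue/gvalue entries are placeholders taken from the table above, not real data.

import pandas as pd

# Two samples (rows) described by four features (columns), mirroring the table above.
# The pvalue/gvalue entries are placeholders, not real data.
dataset = pd.DataFrame(
    [
        ["pvalue1", "pvalue2", "pvalue3", "pvalue4"],
        ["gvalue1", "gvalue2", "gvalue3", "gvalue4"],
    ],
    columns=["Feature1", "Feature2", "Feature3", "Feature4"],
)
print(dataset)        # m = 2 samples, n = 4 features
print(dataset.shape)  # (2, 4)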





Classification vs. Association rules#

Let's look at the differences and similarities between association rules and classification. One prominent difference is that classification is a form of supervised learning, whereas association rule mining is a form of unsupervised learning. Assume we have n features and m samples. In classification, one feature is designated as the class label, which holds the class value for each sample (say, class label = Feature4 in the table above), and the remaining n-1 features are used to train a model that predicts the class label. In association rule mining, on the other hand, there is no notion of a class label; all features, and any possible combination of features, are used to draw useful correlations. In short, classification tries to find relations between the n-1 features and the class label, while association rule mining tries to find important relations between any combination of features and any other combination of features. The following examples give a glimpse of what classification and association rules might look like.

Example of a classification rule:

if feature1 = gvalue1 and feature2 = gvalue2 and feature3 = gvalue3, then feature4 = gvalue4

Example of an association rule:

if feature1 = gvalue1 and feature4 = gvalue4, then feature2 = gvalue2 and feature3 = gvalue3

Here, "feature1 = gvalue1 and feature4 = gvalue4" is termed the antecedent (or premise), and "feature2 = gvalue2 and feature3 = gvalue3" is termed the consequent. So we can interpret an association rule as the implication X → Y, where X and Y are both combinations of feature-value pairs that do not intersect, X being the premise and Y the consequent.

While classification finds relations between (up to) n-1 features and one feature designated as the class label, ARM finds important relations between (up to) n-1 features and other (up to) n-1 features. The number of extracted rules can be enormous in the latter case. We'll see shortly how to reduce the number of extracted rules by excluding infrequent patterns.

Possible applications of ARM#

Association rules can provide useful insights in many scenarios. A few of them are:

  • Marketing strategies and sales promotions (which products to retain, which to discontinue, and how the discontinuation of one product can affect another product's sales)

  • Supermarket shelf management or market basket analysis (i.e., which products are commonly bought together—bread/milk/eggs)

  • Inventory management (i.e., which parts/products to retain at different geographical locations of a retail outlet for quick delivery on demand)

  • Disease diagnosis assistance (i.e., what symptoms are associated with which disease)

  • Intelligent transportation system (ITS) (i.e., which routes are traversed together commonly, which seasons/regions result in a high number of vehicle or pedestrian accidents)

  • Web log data (i.e., preferences of users; recommendation systems, suggestions about items frequently bought together)

  • Fraud detection on the web (i.e., which transactions might be fraudulent—several purchases from the same buyer during a short period of time)

Algorithms commonly used for ARM#

The following are some commonly used algorithms to find association rules:

  • Apriori: It uses the notion of itemsets and mines the data for frequent itemsets in a bottom-up manner. It iteratively generates frequent 1-itemsets to n-itemsets based on a given minimum support threshold. The output includes frequent itemsets that fulfill the aforementioned conditions and a list of generated rules. AIS and SETM algorithms also use the notion of itemsets, but both suffer from the issue of possibly generating and counting many small candidate itemsets.

  • FP-Growth: It mines frequent patterns without candidate generation by building a compact tree, called the FP-tree, to represent the dataset. It then finds frequent patterns with a recursive, divide-and-conquer technique. Tree-based pattern mining is typically faster than Apriori. (A short usage sketch of Apriori and FP-Growth follows this list.)

  • Equivalence class transformation (ECLAT): It uses a vertical data layout in which each itemset is associated with the set of transaction IDs (its tidset) in which it occurs. Support is computed by intersecting tidsets, and the search proceeds depth-first over equivalence classes of itemsets that share a common prefix.

  • Direct hashing and pruning (DHP): It works in two phases. As the name suggests, the first phase is hashing, and the second is pruning. During the hashing phase, candidate itemsets are hashed into buckets, and the bucket counts are used to rule out candidates that cannot be frequent. During the pruning phase, transactions (and items within them) that cannot contribute to any future frequent itemset are trimmed, progressively shrinking the data that later passes must scan.

  • Tree projection: It organizes the itemset search space as a lexicographic tree and counts support by projecting transactions onto the nodes of this tree. Itemsets map to nodes, and edges represent the extension of an itemset by one more item, which keeps the counting work localized to the relevant portions of the data.
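As a quick illustration of how two of these algorithms are typically invoked, the sketch below uses the mlxtend library (an assumption; it is not part of this blog's later code) to run Apriori and FP-Growth on a tiny, made-up set of transactions. Both calls should return the same frequent itemsets; FP-Growth simply gets there without candidate generation.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth

# Made-up transactions: each inner list is one market basket.
transactions = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "eggs"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Both calls return the same frequent itemsets; FP-Growth avoids candidate
# generation and is usually faster on larger datasets.
print(apriori(onehot, min_support=0.5, use_colnames=True))
print(fpgrowth(onehot, min_support=0.5, use_colnames=True))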

How to evaluate association rules#

The following measures are commonly used to evaluate an association rule algorithm’s performance. Among these, the first two are the most commonly used. However, a combination of all can result in improved selection of important association rules.

  • Support (Supp)

  • Confidence (Conf)

  • Lift (Lift)

  • Leverage (Lev)

  • Conviction (Conv)

Support indicates how frequently the items occur together in the dataset:

Supp(X → Y) = freq(X, Y) / m

where freq(X, Y) is the number of observations/samples containing both X and Y, and m is the total number of observations (the raw count freq(X, Y) is also called the support count). The value of support ranges from 0 to 1. The closer it is to 1, the more relevant and important the feature combination might be, making it a good candidate for further investigation.

Confidence indicates how frequently the if-then rule holds in the dataset. Confidence is the proportion of the samples covered by the premise that are also covered by the consequent:

Conf(X → Y) = Supp(X → Y) / Supp(X) = freq(X, Y) / freq(X)

It's essentially a conditional probability, with a value that ranges from 0 to 1 (the higher, the better). A high confidence indicates that a sample that includes X will likely include Y as well.

Why multiple measures? A rule may cover many samples in the dataset yet hold only rarely when its premise occurs; this is the case of high support and low confidence. Conversely, a rule may cover few samples yet hold almost every time its premise occurs; this is the case of high confidence and low support. Guided by multiple measures rather than a single one, we can find association rules that are both important and relevant.

Lift is the ratio of confidence to the proportion of all samples covered by the consequent:

Lift(X → Y) = Conf(X → Y) / Supp(Y)

It evaluates the importance of the association independently of support: how much does the presence of X in a sample affect the presence of Y? The value of lift ranges from 0 to infinity. Lift > 1 indicates a positive correlation between X and Y (they occur together more often than expected), lift < 1 indicates a negative correlation (they occur together less often than expected), and lift = 1 means X and Y are uncorrelated, i.e., independent of one another.

Adding lift to the two aforementioned measures (i.e., support and confidence) ensures the rules aren't biased toward either rarely occurring feature combinations or rarely correct feature combinations.

Leverage is the proportion of additional observations covered by both the premise and the consequent beyond what would be expected if the two were independent of one another:

Lev(X → Y) = Supp(X → Y) - Supp(X) * Supp(Y)

In other words, leverage is the difference between the probability of X and Y occurring together and the frequency expected if X and Y were independent. The value of leverage ranges from -1 to 1. Leverage = 0 indicates independence, leverage > 0 indicates a positive correlation, and leverage < 0 indicates a negative correlation.

Conviction is another measure of departure from independence. It is given by

Conv(X → Y) = (1 - Supp(Y)) / (1 - Conf(X → Y))

which equals Probability(X) * Probability(!Y) / Probability(X, !Y), where the ! symbol represents logical NOT. It helps evaluate whether a rule was found merely by coincidence. A high value of conviction indicates that the consequent Y is strongly dependent on the premise X. It can be interpreted similarly to lift, with values ranging from 0 to infinity; conviction is 1 if the items are independent.
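The definitions above translate directly into code. The following is a minimal sketch (the function names are mine, not from any library) that computes all five measures for a rule X → Y from raw counts, where m is the total number of samples:

def supp(freq_xy, m):
    # Support: fraction of the m samples containing both X and Y
    return freq_xy / m

def conf(freq_xy, freq_x):
    # Confidence: fraction of samples containing X that also contain Y
    return freq_xy / freq_x

def lift(freq_xy, freq_x, freq_y, m):
    # Lift: confidence divided by the support of the consequent Y
    return conf(freq_xy, freq_x) / (freq_y / m)

def leverage(freq_xy, freq_x, freq_y, m):
    # Leverage: observed joint support minus the support expected under independence
    return freq_xy / m - (freq_x / m) * (freq_y / m)

def conviction(freq_xy, freq_x, freq_y, m):
    # Conviction: (1 - Supp(Y)) / (1 - Conf(X -> Y)); infinite when confidence is 1
    c = conf(freq_xy, freq_x)
    return float("inf") if c == 1 else (1 - freq_y / m) / (1 - c)

# Example: Toothpaste -> Soap from the market basket table later in this blog,
# where freq(X, Y) = 2, freq(X) = 4, freq(Y) = 3, and m = 5
print(supp(2, 5), conf(2, 4), lift(2, 4, 3, 5), leverage(2, 4, 3, 5), conviction(2, 4, 3, 5))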

Apriori algorithm#

The Apriori algorithm begins by converting each feature into a candidate 1-itemset and then iteratively filters and merges the frequent itemsets from the previous pass. Rules are then derived from the frequent itemsets. The take-home point is that, in this merging process, the Apriori algorithm performs pre-pruning of the association rules because infrequent itemsets are dropped before moving to the next iteration. The following is the simplified Apriori algorithm:

Inputs: Dataset consisting of n features and m samples

Outputs: Set of frequent itemsets from the final iteration that meet the minimum support threshold and rules that meet the minimum confidence threshold

Hyperparameters:

  • Minimum support threshold: The minimum support needed for an itemset to qualify as frequent (SuppThmin)

  • Minimum performance metric threshold: The minimum performance metric threshold, such as confidence (ConfThmin), for excluding less important rules

Steps:

  1. Generate an initial list of candidate k-itemsets of size k = 1 using the unique features from the dataset.

  2. Compute support for each candidate k-itemset.

  3. Exclude the candidate k-itemsets that have support < SuppThmin to obtain the filtered frequent k-itemsets.

  4. Merge the frequent k-itemsets to create candidate (k+1)-itemsets.

  5. Compute support for the candidate (k+1)-itemsets.

  6. Filter the (k+1)-itemsets according to the criterion specified in step 3 to remove infrequent itemsets.

  7. If any filtered (k+1)-itemsets remain, move to step 4.

  8. Generate rules from the final frequent itemsets based on the minimum performance metric threshold, such as confidence.

  9. Exclude all rules with confidence < ConfThmin.
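A compact Python sketch of steps 1 to 7 (the frequent-itemset part; rule generation is omitted for brevity) might look like the following. The helper names are hypothetical, itemsets are represented as frozensets, and transactions as sets:

from itertools import combinations

def apriori_frequent_itemsets(transactions, supp_th_min):
    """Return all frequent itemsets and their supports, following steps 1-7 above."""
    m = len(transactions)

    def support(itemset):
        # Steps 2/5: fraction of transactions containing every item of the itemset
        return sum(itemset <= t for t in transactions) / m

    # Step 1: candidate 1-itemsets from the unique items in the dataset
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}

    while current:
        # Steps 2-3 / 5-6: keep only candidates meeting the minimum support threshold
        supports = {c: support(c) for c in current}
        survivors = {c: s for c, s in supports.items() if s >= supp_th_min}
        frequent.update(survivors)
        # Step 4: merge surviving k-itemsets into candidate (k+1)-itemsets
        keys = list(survivors)
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == len(a) + 1})

    return frequent

# Toy run on the supermarket example used in the next section
transactions = [
    {"toothpaste", "toothbrush", "tissues", "soap", "handwash"},
    {"toothpaste", "toothbrush"},
    {"toothpaste", "toothbrush", "soap"},
    {"toothpaste", "toothbrush", "handwash"},
    {"soap", "handwash"},
]
for itemset, s in sorted(apriori_frequent_itemsets(transactions, 0.4).items(), key=lambda kv: -kv[1]):
    print(set(itemset), round(s, 2))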

Market basket analysis use case#

In its infancy, association rule mining was focused on finding items frequently bought together in supermarkets, so it became almost synonymous with market basket analysis. Later on, it was employed in many other application domains, as we discussed earlier. However, market basket analysis still remains a classic example of association rule mining. Let’s wrap things up by discussing a concrete example to see what we’ve learned so far in action.

Consider a fictional supermarket that sells five products in total: toothbrush, toothpaste, soap, tissues, and handwash. We end up with a transactions dataset that looks like the table below, where one (1) indicates the product was purchased and zero (0) means it wasn't purchased.

Purchases at the Supermarket

| Transaction ID | Toothpaste | Toothbrush | Tissues | Soap | Handwash |
|----------------|------------|------------|---------|------|----------|
| 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 0 | 0 | 0 |
| 3 | 1 | 1 | 0 | 1 | 0 |
| 4 | 1 | 1 | 0 | 0 | 1 |
| 5 | 0 | 0 | 0 | 1 | 1 |

Different measures we have seen so far can then be calculated as follows (feel free to revisit the formulae for the measures):

Support of 1-itemsets#

Supp(Toothpaste) = 4/5 = 0.8

Supp(Toothbrush) = 4/5 = 0.8

Supp(Tissues) = 1/5 = 0.2

Supp(Soap) = 3/5 = 0.6

Supp(Handwash) = 3/5 = 0.6

Note that if we use SuppThmin = 0.4, then the third 1-itemset (Supp(Tissues) = 1/5 = 0.2) wouldn't be included in the formation of 2-itemsets.
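These supports can be reproduced with a few lines of pandas (a sketch; the columns mirror the purchases table above):

import pandas as pd

# The purchases table: 1 = purchased, 0 = not purchased
purchases = pd.DataFrame(
    {
        "Toothpaste": [1, 1, 1, 1, 0],
        "Toothbrush": [1, 1, 1, 1, 0],
        "Tissues":    [1, 0, 0, 0, 0],
        "Soap":       [1, 0, 1, 0, 1],
        "Handwash":   [1, 0, 0, 1, 1],
    }
)

# Support of each 1-itemset = column mean (fraction of transactions containing the product)
print(purchases.mean())

# Applying SuppThmin = 0.4 drops Tissues before 2-itemsets are formed
print(purchases.mean()[purchases.mean() >= 0.4])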

Performance measures for 2-⁠itemsets#

Some example 2-itemsets are described below with their respective performance measures. Note that this isn't an exhaustive list of all possible 2-itemsets, but a subset that shows how the calculations are performed behind the scenes, i.e., how 1-itemsets and 2-itemsets are generated and how their measures are computed. Based on these measures, the itemsets are then filtered to get rid of infrequent ones.

Conf(Toothpaste → Soap) = 2/4 = 0.5

Conf(Soap → Tissues) = 1/3 = 0.33

Lev(Toothpaste → Soap) = 0.4 - (0.8 * 0.6) = -0.08

Lev(Soap → Tissues) = 0.2 - (0.6 * 0.2) = 0.08

Conv(Toothpaste → Soap) = (1 - 0.6) / (1 - 0.5) = 0.8

Conv(Soap → Tissues) = (1 - 0.2) / (1 - 0.33) = 1.19

2-itemsets Example

| 2-itemsets | Support | Confidence | Lift |
|------------|---------|------------|------|
| Toothpaste → Toothbrush | 0.8 | 1 | 1.25 |
| Toothpaste → Tissues | 0.2 | 0.25 | 1.25 |
| Toothpaste → Soap | 0.4 | 0.5 | 0.83 |
| Toothpaste → Handwash | 0.4 | 0.5 | 0.83 |
| Toothbrush → Tissues | 0.2 | 0.25 | 1.25 |
| Toothbrush → Soap | 0.4 | 0.5 | 0.83 |
| Toothbrush → Handwash | 0.4 | 0.5 | 0.83 |
| Tissues → Soap | 0.2 | 1 | 1.67 |
| Tissues → Handwash | 0.2 | 1 | 1.67 |
| Soap → Handwash | 0.4 | 0.67 | 1.11 |

It must be noted that after filtering the itemsets, only the remaining frequent itemsets will be used in the next iteration to form 3-itemsets.
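For completeness, here is a small sketch that recomputes the 2-itemset table above and applies the SuppThmin = 0.4 filter. Note that itertools.permutations enumerates both directions of every pair, whereas the table lists only one direction per pair:

from itertools import permutations
import pandas as pd

purchases = pd.DataFrame(
    {
        "Toothpaste": [1, 1, 1, 1, 0],
        "Toothbrush": [1, 1, 1, 1, 0],
        "Tissues":    [1, 0, 0, 0, 0],
        "Soap":       [1, 0, 1, 0, 1],
        "Handwash":   [1, 0, 0, 1, 1],
    }
)
m = len(purchases)

rows = []
for x, y in permutations(purchases.columns, 2):
    # Support of the pair, confidence and lift of the rule x -> y
    supp_xy = ((purchases[x] == 1) & (purchases[y] == 1)).sum() / m
    conf_xy = supp_xy / (purchases[x].sum() / m)
    lift_xy = conf_xy / (purchases[y].sum() / m)
    rows.append({"rule": f"{x} -> {y}", "support": supp_xy,
                 "confidence": round(conf_xy, 2), "lift": round(lift_xy, 2)})

table = pd.DataFrame(rows)
# Keep only rules whose itemset meets SuppThmin = 0.4 (mirrors the filtering described above)
print(table[table["support"] >= 0.4])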

Apriori algorithm in the context of market basket analysis#

Recall that the Apriori algorithm takes the minimum support threshold and the minimum confidence threshold as hyperparameters from the user. Assuming SuppThmin = 0.4 and ConfThmin = 0.5, during the merging phase all the itemsets with support < 0.4 (the rows involving Tissues in the table above) will be dropped in that iteration. Then, for final rule generation, any rule with confidence < 0.5 will be excluded from the final list of rules (assuming, for the sake of the example, that the 2-itemsets are the final itemsets).

As a conclusion, take a look at how changing the hyperparameters of the Apriori algorithm changes the end results. The dataset used as an example is the supermarket dataset from the wekamooc GitHub repository (“dataminingwithweka/data/supermarket.arff,” GitHub, n.d., accessed December 7, 2023, https://github.com/fracpete/wekamooc/blob/master/dataminingwithweka/data/supermarket.arff). It contains 4627 samples and 217 features (different products and departments). The presence of a product in a sale transaction is indicated by 't' (i.e., true). The results were generated using Weka, a free machine learning and data analysis tool licensed under the GNU General Public License.

Scenario I#

We get the following results with SuppThmin = 0.15 and ConfThmin = 0.9 provided as hyperparameters.

Apriori algorithm number of frequent itemsets and rules for SuppThmin = 0.15 and ConfThmin = 0.9

The above image shows that, with the aforementioned parameters, the numbers of frequent 1-itemsets, 2-itemsets, 3-itemsets, 4-itemsets, 5-itemsets, and 6-itemsets were 44, 380, 910, 633, 105, and 1, respectively.

The 10 most important extracted rules are listed in the lower panel. Let's discuss the first rule to clarify how to interpret the results. The first rule states that if biscuits=t, frozen foods=t, fruit=t, total=high, then bread and cake=t.

  • The support counts of the itemsets are shown alongside them (e.g., bread and cake occurs in 723 samples).

  • Next to each rule, its corresponding measure values are listed (e.g., confidence = 0.92, lift = 1.27, leverage = 0.03, and conviction = 3.35).

Scenario II#

Next, SuppThmin = 0.4 and ConfThmin = 0.7 are provided as hyperparameters. We can see that with increased support comes a decline in confidence, lift, and conviction. Lift and conviction are now comparatively closer to 1, indicating independence between the premise and the consequent.

Apriori algorithm number of frequent itemsets and rules for SuppThmin = 0.4 and ConfThmin = 0.7

Scenario III#

Finally, SuppThmin = 0.4 and ConfThmin = 0.9 are provided as hyperparameters.

Apriori algorithm number of frequent itemsets and rules for SuppThmin = 0.4 and ConfThmin = 0.9

Note that with this combination of hyperparameters, the Apriori algorithm ends up with no rules that fulfill the criteria. So, among the three configurations we tried, the best option would be scenario I.

We can try different parameter configurations, generate rules, and ponder over them in light of the performance measures to reach a set of meaningful association rules.

Market basket analysis using the Apriori algorithm in Python#

Let's see what we've discussed and learned so far in action. Consider a market basket analysis dataset (“Market_Basket_Optimisation.csv,” Kaggle, n.d., accessed December 7, 2023, https://www.kaggle.com/datasets/devchauhan1/market-basket-optimisationcsv).

The dataset contains records of retail store transactions. Each transaction consists of the names of the products purchased by a customer (e.g., "bread", "toothpaste", "dishsoap", "door lock", "shampoo"). The following Python code uses the apyori library to find frequent itemsets and association rules based on the Apriori algorithm. It runs smoothly in Google Colab; you just need to upload the dataset and provide the correct path to it. Each statement is described in the comments. For ease of discussion, the code is broken into smaller chunks; you can add these blocks to separate cells of a Colab notebook, or put them all together in a single cell in the same order.

# this project was built and executed on Google Colab,
# hence the description is w.r.t. its environment
# install the required package for the Apriori algorithm
!pip install apyori
# import libraries for data processing
import numpy as np
import pandas as pd
Import libraries
# import the dataset; here it has been uploaded to 'sample_data'
# in Colab as a csv file
# you can right-click the file in the 'sample_data' section to copy its path conveniently,
# or provide the correct path of the dataset to read_csv wherever the dataset exists
Data = pd.read_csv('/content/sample_data/Market_Basket_Optimisation.csv', header=None)
print(Data.head())  # displays the first 5 rows of the dataset
print(Data.info())  # displays information about the dataset (e.g., features, samples, data types)
rows, cols = Data.shape  # record the number of samples and features for later processing
Read the dataset from the .csv file

The two print statements in the above code block produce the output shown below.

Result of print(Data.head())
Result of print(Data.info())

Now, we need to process the dataset to convert it into an appropriate form for the Apriori method. Its documentation describes what the correct input format is:

# We need to train the 'apyori' model to mine frequent itemsets and rules,
# but it takes its input as a list of transactions whose elements are strings.
# For that, we need to convert the pandas dataframe into a list of lists.
# Initialize the transactions list
transactionsList = []
# Prepare the list of transactions. The number of samples is in 'rows'
# and the number of features is in 'cols' (recorded earlier).
for i in range(0, rows):
    transactionsList.append([str(Data.values[i, j]) for j in range(0, cols)])
Prepare input data to be fed to the model

After the dataset is in the right format, we need to apply the Apriori algorithm.

#import apriori from apyori
from apyori import apriori
# the hyperparameters of the Apriori algorithm are specified as the method arguments
# the complete details can be found in the documentation of the apyori library
# the combination of hyperparameters below can be altered and the resulting output can be examined:
# minimum support = 0.003
# minimum confidence = 0.2
# minimum lift = 3
# minimum length = 2 i.e. minimum 2-itemsets frequent sets
# maximum length = 2 i.e. maximum 2-itemsets frequent sets
rules = apriori(transactions = transactionsList, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)
Invoke the Apriori method from the Apyori library

The results are returned in the rules object by the apriori method. The details of the parameters are mentioned in the comments.

After that, we need to convert the results to a list and then into a dataframe for easy viewing. The following piece of code does that for you.

# convert the resulting rules generator into a list
rules_list = list(rules)

# convert the list into a pandas dataframe
def listToTabularForm(rules_list):
    # each result is a RelationRecord; index [2][0] holds its first ordered statistic,
    # whose fields are (items_base, items_add, confidence, lift)
    leftHandSide = [tuple(resultTuple[2][0][0])[0] for resultTuple in rules_list]
    rightHandSide = [tuple(resultTuple[2][0][1])[0] for resultTuple in rules_list]
    support = [resultTuple[1] for resultTuple in rules_list]
    confidence = [resultTuple[2][0][2] for resultTuple in rules_list]
    lift = [resultTuple[2][0][3] for resultTuple in rules_list]
    return list(zip(leftHandSide, rightHandSide, support, confidence, lift))

output_DataFrame = pd.DataFrame(listToTabularForm(rules_list), columns = ['Left_Hand_Side', 'Right_Hand_Side', 'Support', 'Confidence', 'Lift'])
output_DataFrame
Convert the generated output into a pandas dataframe

Voila! We have created our association rules.

Association rules with Apriori algorithm

If you found this blog interesting, more relevant material is available in the following course and skill path.

An Introductory Guide to Data Science and Machine Learning

Machine Learning with Python Libraries


  
