Apriori Algorithm and Association Rules

Association Rule Mining is an important technique for discovering rules and relationships between variables in large databases. The Apriori algorithm finds the frequent itemsets from which association rules are derived. You’ll learn about both concepts here.

Association Rule Mining

Association Rule Mining helps us find rules and relationships in a dataset. It works for both relational and transactional databases, and it can also be used to find features that are correlated with each other. An association rule has two parts: an antecedent and a consequent. The antecedent is an itemset found in the data, and the consequent is an itemset implied by the antecedent. One example of an association rule is:


{Antecedent} -> {Consequent}
{Diaper} -> {Beer}
X -> Y


X and Y are called the antecedent and consequent, respectively. The rule can be read as: people who buy diapers are also likely to buy beer. “Diaper” and “Beer” are the items. Such a rule is deduced from a dataset, and companies can use it to make smarter decisions and increase revenue.

Metrics for Evaluating Association Rules

There are various metrics for evaluating the interestingness of the association rules derived from a dataset. Let us consider the following transactional table.

Transaction ID    Items
1                 Bread, Milk
2                 Bread, Diaper, Beer, Eggs
3                 Milk, Diaper, Beer, Coke
4                 Bread, Milk, Diaper, Beer
5                 Bread, Milk, Diaper, Coke

Support

Support tells us how frequent or popular an itemset is, measured as the proportion of transactions in which the itemset appears. It is a value between 0 and 1; values closer to 1 indicate that the itemset occurs more frequently in the dataset. We call an itemset a frequent itemset if its support is at least a specified minimum-support threshold. In the above table, we have:

Support\{Beer\} = \frac{3}{5}

There are a total of five transactions, and beer appears in three of them.

Support\{Milk, Coke\} = \frac{2}{5}

Out of a total of five transactions, \{Milk, Coke\} appear together in two transactions.
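The support calculations above can be reproduced in a few lines of code. Below is a minimal Python sketch; the `transactions` list mirrors the table above, and the `support` helper name is illustrative, not part of any library:

```python
# Transactions reproduced from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)  # <= is subset test
    return hits / len(transactions)

print(support({"Beer"}, transactions))          # 0.6 (3 of 5)
print(support({"Milk", "Coke"}, transactions))  # 0.4 (2 of 5)
```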

Confidence

Confidence tells us how likely item Y is to be purchased when item X is purchased, and is expressed as confidence\{X -> Y\}. X and Y are called the antecedent and consequent, respectively. The higher the confidence, the more likely the items are to occur together. It is also a value between 0 and 1.

confidence\{X -> Y\} = \frac{support\{X -> Y\}}{support\{X\}}

The metric is not symmetric: the confidence for \{X -> Y\} is not the same as for \{Y -> X\}. When the antecedent and consequent always occur together, the confidence reaches its maximum value of 1.

Confidence\{Milk -> Coke\} = \frac{Support\{Milk -> Coke\}}{Support\{Milk\}} = \frac{2/5}{4/5} = \frac{1}{2}

In the denominator, you can see that confidence takes into account the popularity of \{Milk\}, but not of \{Coke\}. Because of this, confidence can misrepresent the importance of an association.
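The asymmetry of confidence can be checked directly in code. Here is a minimal Python sketch; the `transactions` list mirrors the table above, and the `support` and `confidence` helper names are illustrative:

```python
# Transactions reproduced from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence{X -> Y} = support{X and Y together} / support{X}."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(confidence({"Milk"}, {"Coke"}, transactions))  # 0.5
print(confidence({"Coke"}, {"Milk"}, transactions))  # 1.0 -- not symmetric!
```

Note that every transaction containing Coke also contains Milk, so the reversed rule has confidence 1 even though the forward rule scores only 0.5.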

Lift

To overcome this drawback of the confidence measure, we introduce another measure called lift. It also tells us how likely item Y is to be purchased when item X is purchased, and is expressed as lift\{X -> Y\}, where X and Y are the antecedent and consequent. This time, the popularity of Y is also taken into account, as you can see in the formula below. The higher the lift, the more likely the items are to occur together. It is a value in the range [0, \infty).

lift\{X -> Y\} = \frac{support\{X -> Y\}}{support\{X\} \cdot support\{Y\}} = \frac{confidence\{X -> Y\}}{support\{Y\}}

lift\{Milk -> Coke\} = \frac{Confidence\{Milk -> Coke\}}{Support\{Coke\}} = \frac{1/2}{2/5} = \frac{5}{4}
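Unlike confidence, lift is symmetric, because the denominator contains the supports of both itemsets. A minimal Python sketch, reusing the transaction table from above (the helper names are illustrative):

```python
# Transactions reproduced from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(antecedent, consequent, transactions):
    """lift{X -> Y} = support{X and Y together} / (support{X} * support{Y})."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / (
        support(antecedent, transactions) * support(consequent, transactions)
    )

print(lift({"Milk"}, {"Coke"}, transactions))  # 1.25
print(lift({"Coke"}, {"Milk"}, transactions))  # 1.25 -- lift is symmetric
```

A lift above 1 (here 1.25) suggests the items occur together more often than if they were independent; a lift of exactly 1 would mean independence.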

Application of Association Rule Mining

Association Rule Mining helps in the following areas.

  • It is used in Market Basket Analysis, which involves finding the items that are frequently bought together, in order to boost sales and meet business objectives.

  • In the medical domain, we use Association Rule Mining to find which drugs are suitable for a particular disease.

Algorithms for mining frequent patterns

The prerequisite step in finding association rules is to find the frequent itemsets present in the dataset. The algorithms below are used for that.

Apriori Algorithm

The Apriori Algorithm is used to find frequent itemsets in the dataset. It relies on the Apriori property: if an itemset is infrequent, then all its supersets must also be infrequent. If \{beer\} is found to be infrequent, we can expect \{beer, pizza\} to be infrequent as well, so it never needs to be counted. Once the frequent itemsets have been found, we can derive the association rules from them.
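A minimal sketch of the Apriori candidate-generation-and-pruning loop is shown below, applied to the transaction table from earlier with an illustrative minimum-support threshold of 0.6. The function and variable names are not from any library:

```python
from itertools import combinations

# Transactions reproduced from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def apriori(transactions, min_support):
    """Return all itemsets whose support is at least `min_support`."""
    items = sorted({i for t in transactions for i in t})
    # Level 1: frequent single items.
    frequent = [frozenset([i]) for i in items
                if support(frozenset([i]), transactions) >= min_support]
    all_frequent = list(frequent)
    k = 2
    while frequent:
        # Generate size-k candidates by joining frequent (k-1)-itemsets.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        prev = set(frequent)
        candidates = [c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))]
        # Keep only candidates that meet the support threshold.
        frequent = [c for c in candidates if support(c, transactions) >= min_support]
        all_frequent.extend(frequent)
        k += 1
    return all_frequent

# With this data and threshold, four 1-itemsets and four 2-itemsets are frequent.
for itemset in apriori(transactions, 0.6):
    print(set(itemset), support(itemset, transactions))
```

Note how the pruning step uses the Apriori property: \{Coke\} is infrequent at level 1, so no candidate containing Coke is ever generated or counted at level 2.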

FP-growth Algorithm

The FP-growth Algorithm also finds the frequent itemsets, but it holds some advantages over the Apriori Algorithm. It compresses the transactions into a tree structure (the FP-tree) and mines frequent patterns from it, scanning the database only twice, whereas Apriori scans it once for every candidate-itemset size. This makes it efficient and scalable.
