The Apriori algorithm is used for mining frequent itemsets and deriving association rules from a transactional database. It relies on two parameters, “support” and “confidence.” Support is the frequency with which an itemset occurs in the database; the confidence of a rule is the conditional probability that a transaction containing the rule’s antecedent also contains its consequent.
The items in a transaction form an itemset. The algorithm begins by identifying the frequent individual items (items whose frequency is greater than or equal to the given support) in the database and then extends them to larger and larger frequent itemsets.
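The Apriori algorithm uses the downward closure property, i.e., all subsets of a frequent itemset are also frequent, but the converse may not be true. Equivalently, any itemset that has an infrequent subset cannot be frequent, which lets the algorithm discard such candidates without counting them.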
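As a minimal sketch of how the downward closure property can be used to discard candidates, the Python snippet below checks whether a candidate itemset has any infrequent subset one item smaller. The function name has_infrequent_subset and the item labels A to D are hypothetical and chosen only for illustration; they do not come from the example in this article.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_smaller):
    """Return True if any (k-1)-item subset of `candidate` is not frequent."""
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )

# Hypothetical frequent 2-itemsets (illustrative data only).
frequent_2 = {
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"B", "C"}),
    frozenset({"B", "D"}),
}

# {A, B, C}: every 2-item subset is frequent, so it stays a candidate.
print(has_infrequent_subset(frozenset({"A", "B", "C"}), frequent_2))  # False
# {A, B, D}: the subset {A, D} is not frequent, so it can be discarded.
print(has_infrequent_subset(frozenset({"A", "B", "D"}), frequent_2))  # True
```

A candidate that passes this check is not guaranteed to be frequent; its support still has to be counted against the database.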
The following are the main steps of the algorithm:
Calculate the support of the itemsets of size k = 1 in the transactional database (recall that support is the frequency of occurrence of an itemset). This is called generating the candidate set.
Prune the candidate set by eliminating itemsets with a support less than the given threshold.
Join the frequent itemsets to form candidate sets of size k + 1, and repeat the above steps until no more frequent itemsets can be formed. This happens when every newly formed set has a support less than the given support.
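The steps above can be sketched in Python roughly as follows. This is a simplified illustration rather than an optimized implementation: the function name apriori, the parameter min_support (an absolute count), and the sample transactions are all assumptions made for this sketch and are not the data used in the example that follows.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset whose
    support count is at least `min_support`."""
    transactions = [frozenset(t) for t in transactions]

    # Step 1: candidate itemsets of size k = 1.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Step 2: prune candidates below the support threshold.
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        # Step 3: join frequent k-itemsets into candidates of size k + 1.
        k += 1
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Keep only candidates whose (k-1)-subsets are all frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

result = apriori(transactions, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With this sample data, each single item and each pair meets the threshold of 3, while the three-item set appears in only two transactions and is dropped, which ends the loop.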
Let’s go over an example to see the algorithm in action. Suppose that the given support is 3 and the required confidence is 80%.
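Now let’s create the association rules. This is where the given confidence is required. For a rule $X \rightarrow Y$, the confidence is calculated as $\text{confidence}(X \rightarrow Y) = \frac{\text{support}(X \cup Y)}{\text{support}(X)}$.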
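To make the confidence calculation concrete, here is a small Python sketch that uses the standard definition: the support of the antecedent and consequent together, divided by the support of the antecedent alone. The helper names support_count and confidence, and the transactions over items A, B, and C, are hypothetical and are not the article's example data.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent.
    Assumes the antecedent occurs in at least one transaction."""
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)

# Hypothetical transactions (illustrative data only).
sample = [
    frozenset({"A", "B"}),
    frozenset({"A", "B", "C"}),
    frozenset({"B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"B"}),
]

min_confidence = 0.8
for lhs, rhs in [({"A"}, {"B"}), ({"B"}, {"A"})]:
    conf = confidence(lhs, rhs, sample)
    verdict = "kept" if conf >= min_confidence else "rejected"
    print(sorted(lhs), "->", sorted(rhs), conf, verdict)
```

Note that the two directions of a rule generally have different confidences: here A occurs in three transactions and always with B, whereas B occurs in all five, so only the rule from A to B clears an 80% threshold.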
The following rules can be obtained from the frequent itemsets of size two (frequent 2-itemsets):
Since our required confidence is 80%, only rules 1 and 4 are included in the result. Therefore, it can be concluded that customers who bought item 2 (I2) always bought item 3 (I3) with it, and customers who bought item 4 (I4) always bought item 3 (I3) with it.