The Apriori algorithm
The most famous algorithm for association rule learning is Apriori. It was proposed by Agrawal and Srikant in 1994. The input of the algorithm is a dataset of transactions, where each transaction is a set of items. The output is a collection of association rules whose support and confidence both exceed specified thresholds. The name comes from the Latin phrase a priori (literally, "from what is before") because of one smart observation behind the algorithm: if an item set is infrequent, then we can be sure in advance that all its supersets are also infrequent. Adding items to a set can never increase the number of transactions that contain it, so there is no need to count the support of any extension of an infrequent item set.
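To make support, confidence, and the pruning observation concrete, here is a minimal sketch in Python. The toy `transactions` list and the helper names `support` and `confidence` are assumptions made for illustration; they do not come from the original paper or from any particular library.

```python
# Hypothetical toy dataset: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent together) divided by support of antecedent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Support of a single item and of a pair that extends it.
bread = support({"bread"}, transactions)                 # 3 of 4 transactions
bread_milk = support({"bread", "milk"}, transactions)    # 2 of 4 transactions

# Extending an item set can only keep or lower its support, which is
# why a superset of an infrequent item set can never be frequent.
assert bread_milk <= bread

# Confidence of the rule {bread} -> {milk}.
rule_conf = confidence({"bread"}, {"milk"}, transactions)
```

With a minimum support of, say, 0.5, the item set {eggs} (support 0.25) is infrequent, so every item set that contains eggs can be skipped without counting it.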
You can implement Apriori with the following steps:
- Count the support of all item sets of length 1; that is, calculate the frequency of every item in the dataset.
- Drop the item sets whose support is lower than the threshold.
- Store all the remaining item sets.
- Extend each stored item set by one element with all possible extensions. This step is known as candidate generation.
- Calculate the support value of each...