Apriori algorithm explained with examples

Tooba Jamal
4 min read · Sep 27, 2021

Have you ever wondered how Amazon suggests items you might like? How does it associate those things with items in your cart or your shopping history? Well, recommendations like these can be made with the help of the Apriori algorithm.

Association rule mining

Association rule mining refers to finding rules that predict the occurrence of an item in a transaction based on the occurrences of other items.

The definition above is quite confusing, no?

Let’s dive deeper into the Apriori algorithm to make things simpler. But before moving ahead, here are some terms you will need to be familiar with to follow along smoothly.

Itemset: A collection of one or more items

Support count: Frequency of occurrence of an itemset. It is denoted by σ.

Support: Fraction of transactions that contain an itemset. It is denoted by s.

Confidence: For a rule A → B, it measures how often A ∪ B occurs in transactions that contain A. It is denoted by c.
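
In formula form, for an itemset X over N transactions and a rule A → B:

s(X) = σ(X) / N

c(A → B) = σ(A ∪ B) / σ(A)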

I know it is still confusing but I promise you that it will be at your fingertips in a couple of moments.

Let’s take the example of an online grocery store with five transactions, each a basket of items such as Bread, Milk, Diaper, and Beer.

Now let’s measure the support count, support, and confidence of {Milk, Diaper} → Beer.

The support count is the number of transactions that contain Milk, Diaper, and Beer together. Transactions 3 and 4 have all three of them. Right? So the support count of {Milk, Diaper} → Beer is two.

Remember that support is the fraction of transactions that contain an itemset? Our itemset here is {Milk, Diaper, Beer}, and the fraction is found by dividing its support count by the total number of transactions. Milk, Diaper, and Beer occur together in two of the five transactions, so our support is s = 2/5 = 0.4, or 40%.

For confidence, we measure how often A ∪ B occurs in transactions that contain A. For us, A is {Milk, Diaper} and B is {Beer}, so A ∪ B is {Milk, Diaper, Beer}. We already know that A ∪ B occurs in two transactions. A (Milk and Diaper) occurs together in transactions 3, 4, and 5. Therefore the confidence is 2/3: out of the three transactions that contain A, two also contain A ∪ B. Thus, c ≈ 0.67.
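
If you prefer to see the arithmetic as code, here is a minimal Python sketch. The transaction list is a hypothetical stand-in that matches the numbers above (transactions 3 and 4 contain Milk, Diaper, and Beer; transaction 5 contains Milk and Diaper but not Beer); the first two baskets are made up purely for illustration.

```python
# Hypothetical transactions consistent with the worked example above;
# only baskets 3-5 are pinned down by the text, 1 and 2 are made up.
transactions = [
    {"Bread", "Milk"},                    # 1
    {"Bread", "Diaper", "Beer", "Eggs"},  # 2
    {"Milk", "Diaper", "Beer", "Coke"},   # 3
    {"Bread", "Milk", "Diaper", "Beer"},  # 4
    {"Bread", "Milk", "Diaper", "Coke"},  # 5
]

def support_count(itemset, transactions):
    """sigma: number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

A = {"Milk", "Diaper"}   # antecedent of the rule A -> B
B = {"Beer"}             # consequent
AB = A | B               # A union B = {Milk, Diaper, Beer}

sigma = support_count(AB, transactions)              # 2
support = sigma / len(transactions)                  # 2 / 5 = 0.4
confidence = sigma / support_count(A, transactions)  # 2 / 3 ~ 0.67

print(f"support count = {sigma}, support = {support}, confidence = {confidence:.2f}")
```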

But what exactly is the use of support and confidence?

These measure how likely items are to occur together in a transaction. Suppose you are given the task of association rule mining for the online grocery store and asked to find items that occur together more than 60% of the time. You use the Apriori algorithm and conclude that customers who buy eggs also buy bread 65% of the time, and customers who buy snacks also buy cold drinks 70% of the time. Based on these conclusions, customers who buy eggs will be shown bread as a suggestion, and customers who buy snacks will be shown cold drinks.
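
In practice you rarely code this from scratch. Below is a rough sketch using the mlxtend library, which ships apriori and association_rules helpers. The baskets are made up, and exact API details can vary between library versions, so treat it as illustrative rather than definitive.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Made-up grocery baskets, purely for illustration.
baskets = [
    ["Eggs", "Bread", "Milk"],
    ["Eggs", "Bread"],
    ["Snacks", "Cold Drink"],
    ["Snacks", "Cold Drink", "Bread"],
    ["Eggs", "Milk"],
]

# One-hot encode the baskets into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)

# Frequent itemsets with support >= 40%, then rules with confidence >= 60%.
frequent_itemsets = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)

print(rules[["antecedents", "consequents", "support", "confidence"]])
```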

The algorithm

Step 1: Calculate the support of each item in the given transactions and eliminate the items with support less than the given threshold.

Step 2: Make pairs of the items with support greater than or equal to a given threshold. Calculate the support of each pair and eliminate those with support less than the threshold.

Step 3: Make triple itemsets out of the selected pairs. Calculate their support and eliminate those with support less than the threshold.

Step 4: Make associations out of the selected triple itemsets. Calculate the support and confidence of each association and select the associations with confidence greater than the minimum confidence threshold. The associations selected in the end are likely to occur in your transactions.

For our grocery example, this leaves associations like these: customers who buy milk and diapers together are likely to buy bread, customers who buy bread and diapers together are likely to buy milk, and customers who buy milk and bread together are likely to buy diapers.
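
To make the four steps concrete, here is a minimal pure-Python sketch of the level-wise procedure, using the same hypothetical transactions as earlier. It stops at 3-item sets and only generates rules with a single item on the right-hand side, so it is a simplification rather than a full Apriori implementation.

```python
# Same hypothetical transactions as before; only baskets 3-5 come from the text.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
MIN_SUPPORT = 0.4      # keep itemsets appearing in at least 40% of the baskets
MIN_CONFIDENCE = 0.6   # keep rules that hold at least 60% of the time

def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Step 1: frequent single items.
items = set().union(*transactions)
frequent = [frozenset([i]) for i in items if support({i}) >= MIN_SUPPORT]

# Steps 2 and 3: build pairs, then triples, from the surviving itemsets,
# dropping anything below the support threshold at each level.
for size in (2, 3):
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == size}
    frequent = [c for c in candidates if support(c) >= MIN_SUPPORT]

# Step 4: turn the frequent triples into rules A -> {item} and keep the
# ones whose confidence clears the threshold.
for itemset in frequent:
    for item in itemset:
        A = itemset - {item}
        confidence = support(itemset) / support(A)
        if confidence >= MIN_CONFIDENCE:
            print(f"{set(A)} -> {item} "
                  f"(support={support(itemset):.2f}, confidence={confidence:.2f})")
```

Running it on these baskets prints, among others, the three rules mentioned above, each with support 0.4 and confidence about 0.67.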
