Apriori algorithm (Items frequently bought together): A basic explanation of how it works

Aug 13

Disclaimer: I’m not the first to crack this, so the ideas here may well have been expressed elsewhere in simpler or more complicated ways. The original post I did was a bit short on the explanation and focused more on building product recommendations in Orange Machine Learning. That article also references other blog posts / videos that I used in trying to understand. I’ll try to be as basic as I can in explaining this [ELI5 version]; this is obviously based on what I understood. Let’s try!

Edit: Post has been updated to use * symbol to imply an AND condition between two products instead of + symbol, based on Disqus comment.

Let’s say you have a website that sells three items: Item X, Item Y, Item Z. As it happens, users sometimes purchase only X, sometimes just Y or just Z, and sometimes a combination: X * Y, X * Z or Y * Z. Which products should you recommend?

In all explanations of the apriori algorithm, you’ll come across three main metrics: Support, Confidence and Lift.

What are these metrics and how do they work together in building recommendations?

I’ll first try to explain these metrics (using X, Y, Z) and then bring it all together with a worked example.

Support: With all the product combinations that R/Python (or any other platform) needs to evaluate, Support is the starting point. This metric measures what % of total transactions contain the two (or more) items being purchased together. This seems fair. You wouldn’t prioritise showing X and Y together if you knew that these items were purchased together, say, only 5% of the time, compared to X * Z, which were purchased together 10% of the time. The higher the Support metric, the more often the items actually appear together, so the more evidence you have that the relationship is real. In the apriori algorithm, you also need to decide on a minimum support level. The lower the minimum support you set, the more flexible you’re being with the parameters in finding recommendations. You could say, I only want to show recommendations when the support is at least 30%, which is fine, except that it becomes way harder to find any product combinations that were purchased this frequently. In Orange ML, it’s as simple as using the minimum support slider + refresh to see if Orange ML could find recommendations that met the criteria.
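To make the minimum support idea concrete, here is a minimal Python sketch. The baskets are made up purely for illustration, and the function names `pair_support` and `frequent_pairs` are my own, not part of Orange ML or any other library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
baskets = [
    {"X", "Z"},
    {"X", "Z"},
    {"X", "Y", "Z"},
    {"X"},
    {"Y"},
    {"Y", "Z"},
]

def pair_support(baskets):
    """Return {pair: share of all baskets that contain both items}."""
    counts = Counter()
    for basket in baskets:
        # sorted() keeps each pair in a stable order, e.g. ("X", "Z").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

def frequent_pairs(baskets, min_support):
    """Keep only the pairs whose support meets the chosen minimum."""
    return {p: s for p, s in pair_support(baskets).items() if s >= min_support}

print(frequent_pairs(baskets, 0.30))  # X*Z and Y*Z survive
print(frequent_pairs(baskets, 0.60))  # nothing survives
```

Raising the threshold from 30% to 60% wipes out every pair in this toy data, which is the "way harder to find combinations" effect described above.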

Confidence: Once you know Support, you then calculate the Confidence metric. This metric divides the Support metric by the frequency of the antecedent. Let’s go with X * Z as an example. X would be the antecedent (product in basket) while Z would be the consequent (the recommendation).

So, the Confidence metric is Support(X * Z) / freq(X). As the percentage of transactions that contain X * Z can never be higher than the percentage of transactions that contain X, Confidence always falls between 0 and 1 and tells you what share of the baskets containing X also contain Z. The higher the Confidence metric, the better. Again, you can set a minimum Confidence level in the algorithm. So, if you set a very high Confidence level, you might not get any recommendations.

Lift: Once you know the Confidence metric, you then divide it by the frequency of item Z to see how much better the pairing does than chance. You’d want a Lift metric > 1, and any sort that you do on the Lift column would be in descending order. The higher the Lift, the better results you can expect by showing these two items together. The denominator is there as a correction: the higher the frequency of Z, the more popular the product is on its own anyway, so a high Confidence alone would not tell you much. A Lift above 1 means Z turns up in baskets with X more often than its overall popularity would predict.
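Here is a small Python sketch of Confidence and Lift computed from raw counts. The function names and the numbers (a shop with 100 transactions) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def confidence(n_both, n_antecedent):
    """Share of baskets with the antecedent that also hold the consequent."""
    return n_both / n_antecedent

def lift(n_both, n_antecedent, n_consequent, n_total):
    """Confidence divided by the consequent's overall frequency."""
    return confidence(n_both, n_antecedent) / (n_consequent / n_total)

# Hypothetical shop: 100 transactions, 40 contain A, 50 contain B.

# Case 1: 30 baskets contain both A and B.
print(confidence(30, 40))    # 0.75
print(lift(30, 40, 50, 100)) # 1.5, B shows up with A more than its popularity predicts

# Case 2: only 20 baskets contain both.
print(confidence(20, 40))    # 0.5
print(lift(20, 40, 50, 100)) # 1.0, exactly what you'd expect if A and B were unrelated
```

In the second case, half of all baskets contain B and half of the A baskets contain B, so knowing that A is in the basket tells you nothing extra. That is why a Lift of 1 is the break-even point.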

Let’s work with an example. The example below looks at six transactions on our store, where X, Y and Z were sold.

[Image: sample data for items purchased together (apriori algorithm)]

N = 6 as there are six transactions.

4 transactions contain X. Therefore, freq(X) = 0.67 [4/6]

3 transactions contain Y. Therefore, freq(Y) = 0.50 [3/6]

4 transactions contain Z. Therefore, freq(Z) = 0.67 [4/6]
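The counting step can be sketched in Python as below. The original sample table is an image, so the `baskets` list here is a hypothetical stand-in that I built only to match the counts quoted in this post (4 transactions with X, 3 with Y, 4 with Z); it is not necessarily the actual data.

```python
from collections import Counter

# Hypothetical stand-in for the sample data; only the totals are
# known to match the post, not the individual rows.
baskets = [
    {"X", "Z"},
    {"X", "Z"},
    {"X", "Y", "Z"},
    {"X"},
    {"Y"},
    {"Y", "Z"},
]

n = len(baskets)

# Count how many baskets each item appears in.
item_counts = Counter(item for basket in baskets for item in basket)

# Frequency = share of all transactions that contain the item.
freq = {item: count / n for item, count in item_counts.items()}

for item in sorted(freq):
    print(f"freq({item}) = {freq[item]:.2f} [{item_counts[item]}/{n}]")
```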

So, let’s say X is the antecedent (the product already in the basket). Y and Z would be the possible consequents, so the candidates to compare are X * Y and X * Z.

X * Y scenario:

Support: (X * Y)/N = 0.17 [1 transaction that contains both X and Y divided by 6 total transactions]

Confidence: Support/freq(X) = 0.25 [0.17 Support divided by freq(X), 0.67]

Lift: Confidence/freq(Y) = 0.50 [0.25 Confidence divided by freq(Y), 0.50]

X * Z scenario:

Support: (X * Z)/N = 0.50 [3 transactions that contain both X and Z divided by 6 total transactions]

Confidence: Support/freq(X) = 0.75 [0.50 Support divided by freq(X), 0.67]

Lift: Confidence/freq(Z) = 1.13 [0.75 Confidence divided by freq(Z), 0.67]

Now, when you compare Lift between the X * Y and X * Z scenarios, the latter has a roughly 2.3X better Lift metric (1.13 vs 0.50) and, unlike X * Y, sits above 1. It would therefore be a much better combination to show.
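If you’d like to check the arithmetic yourself, here is a short Python sketch that recomputes both scenarios from the counts quoted in this post. The helper name `rule_metrics` is my own.

```python
# Counts taken from the worked example in this post.
N = 6
count = {"X": 4, "Y": 3, "Z": 4}
pair_count = {("X", "Y"): 1, ("X", "Z"): 3}

def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    support = pair_count[(antecedent, consequent)] / N
    confidence = support / (count[antecedent] / N)
    lift = confidence / (count[consequent] / N)
    return support, confidence, lift

for consequent in ("Y", "Z"):
    s, c, l = rule_metrics("X", consequent)
    print(f"X -> {consequent}: support={s:.2f} confidence={c:.2f} lift={l:.3f}")
```

Without the intermediate rounding, the Lift for X with Z is 1.125 and the Lift for X with Y is 0.5, so the ratio is 2.25, which rounds to the 2.3X quoted here.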

That’s pretty much it. The apriori algorithm runs this at scale across all the product combinations: it keeps only the combinations that meet your chosen minimum Support and Confidence levels, and you can then rank what is left by the Lift metric.
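For the curious, here is a rough Python sketch of the "at scale" part: how apriori avoids checking every possible combination. It only grows a combination if all of its smaller parts already met the minimum support. This is a simplified teaching version under my own assumptions (the function name and the toy baskets are hypothetical), not what Orange ML or any particular library runs internally.

```python
from itertools import combinations

def apriori_frequent_itemsets(baskets, min_support):
    """Level-wise search: grow itemsets one item at a time,
    keeping only those that meet the minimum support."""
    n = len(baskets)

    def support(itemset):
        # Share of baskets that contain every item in the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: single items that meet the minimum support.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Candidates: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical toy baskets, invented for illustration.
baskets = [{"X", "Z"}, {"X", "Z"}, {"X", "Y", "Z"}, {"X"}, {"Y"}, {"Y", "Z"}]

for itemset, s in sorted(apriori_frequent_itemsets(baskets, 0.30).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

With a 30% minimum support on this toy data, X * Y drops out at the pair stage, so the three-item combination X * Y * Z is never even counted. That shortcut is what lets the algorithm cope with large catalogues.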