E-commerce Product Recommendations: Apriori algorithm - Market Basket Analysis for Product Recommendations Via Orange Platform
I stumbled upon Orange after somebody mentioned it on Reddit and decided to give it a chance because it is GUI-based [even though most of the work happens in Python in the backend and a workflow cannot be exported as a Python script].
The Apriori algorithm is available in Python and R [I’ll do a separate post on R]. This post is about using Orange for the first time and making sense of the recommendations it provides.
In trying to understand the apriori algorithm, I found this link very helpful: https://infocenter.informationbuilders.com/wf80/index.jsp?topic=%2Fpubdocs%2FRStat16%2Fsource%2Ftopic49.htm
Manuel Curral’s YT video doing a quick walk-through of Association rules in Orange was also immensely helpful: https://www.youtube.com/watch?v=k--7n2e47Eo&t=237s
You need to download Orange from here: https://orange.biolab.si/
Once inside Orange, head over to Options > Add-ons > Associate > Install. This adds the apriori library to your Orange installation.
You should now see Associate in your left navigation pane.
Setup:
Once in, you should be able to drag and drop a file. Here, we’ll use a sample dataset via a Google Sheets link: https://docs.google.com/spreadsheets/d/1Qfd5RI6Xju_765_Whq4MY3o62Vss0_0x2JioCXDp-44/edit#gid=0
Note: Check your data info to be sure the columns have been captured as categorical. In case they’re not, go back to the data import and change the field type from numerical to categorical. This is required to run the analysis.
You can then use Data Info to get more info on the dataset. Along with this, you can also open the Data Table tab to get more details on the data.
Before we view the association rules, let’s understand the 3 main concepts of Support, Confidence and Lift in our test transaction data. As mentioned earlier, this link helped me a lot in understanding this. https://infocenter.informationbuilders.com/wf80/index.jsp?topic=%2Fpubdocs%2FRStat16%2Fsource%2Ftopic49.htm
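If total transactions are N and you’re trying to understand the association between item X and Y, then [here freq(X) and freq(Y) are the fractions of all transactions that contain X and Y respectively, and X+Y is the number of transactions that contain both]:
Support: (X+Y)/N >> What percentage of total transactions contain both item X and item Y
Confidence: Support/freq(X) >> Of the transactions that contain X, what % also contain Y
Lift: Confidence/freq(Y) >> How much more likely Y is to be bought when X is in the basket than Y is in general. A Lift above 1 means X and Y are bought together more often than chance alone would suggest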
In the above test transaction data, this would translate into:
N = 4 [Total transactions]
freq(X) = 0.75 >> item X was purchased in 3 transactions out of total 4
freq(Y) = 0.50 >> item Y was purchased in 2 transactions out of total 4
X+Y = 2 >> 2 transactions where X AND Y were purchased
Therefore, Support = 0.5 >> (X+Y)/N >> 2 / 4
Confidence = 0.667 >> Support/freq(X) >> 0.50 / 0.75
Lift = 1.333 >> Confidence / freq(Y) >> 0.667/0.50
The calculations are in the second sheet here as well: https://docs.google.com/spreadsheets/d/1Qfd5RI6Xju_765_Whq4MY3o62Vss0_0x2JioCXDp-44/edit#gid=1106419383
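To double-check the arithmetic outside a spreadsheet, here is a minimal Python sketch of the same three metrics. The four baskets below are hypothetical: they are made up only to match the counts in this post (X in 3 of 4 transactions, Y in 2, both in 2) and are not the rows from the Google sheet. The function names are my own, not anything from Orange.

```python
# Hypothetical transactions, chosen only to match the counts in the post:
# X appears in 3 of 4 baskets, Y in 2, and X and Y together in 2.
transactions = [
    {"X", "Y"},
    {"X", "Y"},
    {"X"},
    {"Z"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions with the antecedent, the share that also has the consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by how common the consequent is overall."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


s = support({"X", "Y"}, transactions)
c = confidence({"X"}, {"Y"}, transactions)
l = lift({"X"}, {"Y"}, transactions)
print(f"Support={s:.3f} Confidence={c:.3f} Lift={l:.3f}")
# prints: Support=0.500 Confidence=0.667 Lift=1.333
```

These match the Support, Confidence and Lift values worked out by hand above.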
In Orange, you can use the sliders to set a higher minimum Support and Confidence. You need to be careful with this because all 3 metrics are derived from the same counts, so tightening one threshold also removes rules you might have wanted for another. If your minimum Support or Confidence levels are too high, you might not see any recommendations at all.
I would recommend leaving the checkbox next to Find Rules unchecked; otherwise Orange starts finding rules while you’re still adjusting the slider(s) for min. support and confidence. Leave it unchecked, decide on your filters and then click on Find Rules.
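To see why thresholds that are too high leave you with nothing, here is a rough Python sketch of what the two sliders do. It is a brute-force check over single-item rules, not how Orange or Apriori actually searches, and the baskets and the find_pair_rules helper are hypothetical.

```python
from itertools import permutations

# Hypothetical baskets, not the dataset used in this post.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"milk"},
]


def support(items, baskets):
    """Fraction of baskets that contain every item in items."""
    return sum(1 for b in baskets if set(items) <= b) / len(baskets)


def find_pair_rules(baskets, min_support, min_confidence):
    """Brute-force all single-item rules A -> B that clear both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        sup = support({a, b}, baskets)
        if sup < min_support:
            continue  # the pair is too rare to keep
        conf = sup / support({a}, baskets)
        if conf >= min_confidence:
            rules.append((a, b, sup, conf))
    return rules


loose = find_pair_rules(baskets, min_support=0.25, min_confidence=0.5)
strict = find_pair_rules(baskets, min_support=0.75, min_confidence=0.9)
print(len(loose), "rules with loose thresholds")
print(len(strict), "rules with strict thresholds")
```

With these made-up baskets, no pair of items shows up in 75% of transactions, so the strict setting returns an empty list. That is the same thing you see in Orange when the sliders are pushed too far.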
You can also use the settings to decide on the number of items that you want to see in Antecedent and Consequent. So, if you want a minimum of 3 items purchased to show the best 4th item, you’d have min 3 in Antecedent and 1 in the Consequent.
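As a rough illustration of the Antecedent and Consequent settings, the Python sketch below lists rules with exactly 3 items on the left and 1 on the right, ranked by Confidence. It is a simplification: it uses exact sizes rather than a minimum, and the baskets and the rules_with_sizes helper are hypothetical, not Orange's own API.

```python
from itertools import combinations

# Hypothetical baskets, not the dataset used in this post.
baskets = [
    {"phone", "case", "charger", "earbuds"},
    {"phone", "case", "charger", "earbuds"},
    {"phone", "case", "charger"},
    {"phone", "case"},
]


def support(items, baskets):
    """Fraction of baskets that contain every item in items."""
    return sum(1 for b in baskets if set(items) <= b) / len(baskets)


def rules_with_sizes(baskets, antecedent_size, consequent_size=1):
    """Return (antecedent, consequent, confidence) for rules of the given sizes."""
    items = sorted(set().union(*baskets))
    rules = []
    for antecedent in combinations(items, antecedent_size):
        ante_support = support(antecedent, baskets)
        if ante_support == 0:
            continue  # this combination was never bought together
        remaining = [i for i in items if i not in antecedent]
        for consequent in combinations(remaining, consequent_size):
            both = support(set(antecedent) | set(consequent), baskets)
            if both > 0:
                rules.append((antecedent, consequent, both / ante_support))
    # Highest confidence first
    return sorted(rules, key=lambda r: r[2], reverse=True)


top = rules_with_sizes(baskets, antecedent_size=3)
for antecedent, consequent, conf in top:
    print(antecedent, "->", consequent, round(conf, 3))
```

Each printed rule reads as "a customer who bought these 3 items also bought this 4th item", with Confidence telling you how often that held in the data.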
This was my first experience with Orange and it was pleasantly easy to use. Will be doing other posts on this platform in the future.