E-commerce Product Recommendations: Apriori algorithm - Market Basket Analysis for Product Recommendations Via Orange Platform

I stumbled upon Orange after somebody mentioned it on Reddit and decided to give it a chance because it is GUI-based [even though most of the work happens in Python behind the scenes, and the workflow cannot be exported as a Python script].

The Apriori algorithm is available in Python and R [I’ll do a separate post on R]. This post is about using Orange for the first time and making sense of the recommendations it provides.

In trying to understand the apriori algorithm, I found this link very helpful: https://infocenter.informationbuilders.com/wf80/index.jsp?topic=%2Fpubdocs%2FRStat16%2Fsource%2Ftopic49.htm

Manuel Curral’s YT video doing a quick walk-through of Association rules in Orange was also immensely helpful: https://www.youtube.com/watch?v=k--7n2e47Eo&t=237s

You need to download and install Orange from here: https://orange.biolab.si/

Once inside Orange, head over to Options > Add-ons > Associate > Install. This adds the Associate add-on, which provides the Frequent Itemsets and Association Rules widgets, to your Orange installation.

You should now see Associate in your left pane navigation.

Setup:

Apriori algorithm Market Basket Analysis Orange Platform.JPG


Once in, you should be able to drag and drop a file. Here, we’ll use a sample dataset from a Google Sheet: https://docs.google.com/spreadsheets/d/1Qfd5RI6Xju_765_Whq4MY3o62Vss0_0x2JioCXDp-44/edit#gid=0

Each row represents an individual transaction. The 1’s signify that the specific item was purchased in that particular transaction, ? signifies otherwise.
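If your purchases start out as plain lists of items, here is a minimal sketch of how they could be turned into this 1 / ? layout. The item names and the to_basket_rows helper are my own placeholders for illustration, not the contents of the sample sheet.

```python
# Hypothetical baskets; item names are placeholders, not the sample sheet's data.
transactions = [
    ["item-x", "item-y"],
    ["item-x"],
    ["item-x", "item-y", "item-z"],
    ["item-z"],
]


def to_basket_rows(transactions):
    """Return (header, rows): one column per item, '1' if purchased, '?' otherwise."""
    items = sorted({item for basket in transactions for item in basket})
    rows = [
        ["1" if item in basket else "?" for item in items]
        for basket in transactions
    ]
    return items, rows


header, rows = to_basket_rows(transactions)

# Print as comma-separated text that could be pasted into a spreadsheet.
print(",".join(header))
for row in rows:
    print(",".join(row))
```

Each printed line after the header is one transaction, which is the same shape as the sheet above.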

Data info orange platform.JPG
Changing data type to categorical.JPG

Note: Check your data info to be sure the columns have been captured as categorical. In case they’re not, go back to the data import and change the field type from numerical to categorical. This is required to run the analysis.

You can then use Data Info to get more information on the dataset. You can also open the Data Table to inspect the individual rows.

Before we view the association rules, let’s understand the 3 main concepts of Support, Confidence and Lift in our test transaction data. As mentioned earlier, this link helped me a lot in understanding this. https://infocenter.informationbuilders.com/wf80/index.jsp?topic=%2Fpubdocs%2FRStat16%2Fsource%2Ftopic49.htm

If total transactions are N and you’re trying to understand the association between item X and Y, then:

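Support: freq(X+Y)/N >> What fraction of total transactions contain both item X and item Y [freq(X+Y) is the number of such transactions]

Confidence: Support/freq(X) >> Of the transactions that contain X, what fraction also contain Y [freq(X) is the fraction of all transactions that contain X]

Lift: Confidence/freq(Y) >> How much more likely Y is to be purchased when X is in the basket, compared to how often Y is purchased overall [freq(Y) is the fraction of all transactions that contain Y]. A Lift above 1 means X and Y are bought together more often than you would expect by chance.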

In the above test transaction data, this would translate into:

N = 4 [Total transactions]

freq(X) = 0.75 >> item X was purchased in 3 transactions out of total 4

freq(Y) = 0.50 >> item Y was purchased in 2 transactions out of total 4

freq(X+Y) = 2 >> 2 transactions where X AND Y were both purchased

Therefore, Support = 0.5 >> freq(X+Y)/N >> 2 / 4

Confidence = 0.667 >> Support/freq(X) >> 0.50 / 0.75

Lift = 1.333 >> Confidence / freq(Y) >> 0.667/0.50

Here item-x is the Antecedent (LHS) and item-y is the Consequent (RHS).
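To double-check the arithmetic, here is a small Python sketch that recomputes the three metrics. The four baskets are made up so that they match the counts above (X in 3 transactions, Y in 2, both in 2); they are not copied from the sheet, and rule_metrics is just a helper name I picked.

```python
# Hypothetical baskets chosen to match the counts in the worked example:
# item-x appears in 3 of 4 transactions, item-y in 2, and both together in 2.
transactions = [
    {"item-x", "item-y"},
    {"item-x"},
    {"item-x", "item-y", "item-z"},
    {"item-z"},
]


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    # Fraction of transactions containing the antecedent / the consequent.
    freq_x = sum(antecedent <= t for t in transactions) / n
    freq_y = sum(consequent <= t for t in transactions) / n
    # Fraction of transactions containing both sides of the rule.
    support = sum((antecedent | consequent) <= t for t in transactions) / n
    confidence = support / freq_x  # assumes the antecedent occurs at least once
    lift = confidence / freq_y     # assumes the consequent occurs at least once
    return support, confidence, lift


support, confidence, lift = rule_metrics(transactions, {"item-x"}, {"item-y"})
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
# prints: support=0.500 confidence=0.667 lift=1.333
```

Swapping the two arguments gives the metrics for the reverse rule (item-y as the Antecedent). Support and Lift stay the same, but Confidence changes.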

The calculations are in the second sheet here as well: https://docs.google.com/spreadsheets/d/1Qfd5RI6Xju_765_Whq4MY3o62Vss0_0x2JioCXDp-44/edit#gid=1106419383

In Orange, you can use the sliders to set a higher minimum Support and Confidence. You need to be careful with this because the 3 metrics are linked. If your minimum Support or Confidence levels are too high, you might not see any recommendations.
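To get a feel for what the sliders do, here is a rough sketch that lists every possible rule by brute force and keeps only those above the two minimums. This is not how Orange does it internally; the find_rules function and the baskets are my own assumptions for illustration, and the approach only makes sense for tiny datasets.

```python
from itertools import combinations

# Hypothetical baskets, not the sample sheet's data.
transactions = [
    {"item-x", "item-y"},
    {"item-x"},
    {"item-x", "item-y", "item-z"},
    {"item-z"},
]


def find_rules(transactions, min_support, min_confidence):
    """Brute-force rule search: returns (antecedent, consequent, support, confidence)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def frac(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            support = frac(itemset)
            if support == 0 or support < min_support:
                continue
            # Try every way of splitting the itemset into LHS and RHS.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    antecedent = frozenset(lhs)
                    consequent = itemset - antecedent
                    confidence = support / frac(antecedent)
                    if confidence >= min_confidence:
                        rules.append(
                            (sorted(antecedent), sorted(consequent), support, confidence)
                        )
    return rules


loose = find_rules(transactions, min_support=0.25, min_confidence=0.5)
strict = find_rules(transactions, min_support=0.75, min_confidence=0.9)
print(len(loose), "rules with loose minimums")
print(len(strict), "rules with strict minimums")
```

With the strict settings no pair of items reaches 75% Support in these baskets, so nothing comes back. That is the same empty result you would see in Orange when the sliders are set too high.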

Association rules Orange platform.JPG

I would recommend leaving the checkbox next to Find Rules unchecked. When it is checked, Orange starts finding rules again every time you adjust the slider(s) for min. Support and Confidence. Leave it unchecked, decide on your filters and then click on Find Rules.

You can also use the settings to decide on the number of items that you want to see in Antecedent and Consequent. So, if you want a minimum of 3 items purchased to show the best 4th item, you’d have min 3 in Antecedent and 1 in the Consequent.
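The "3 items in, best 4th item out" idea can be sketched in plain Python as follows. Again, this is only an illustration under my own assumptions: the grocery baskets, the best_next_item helper and its parameters are made up, and Orange's own filtering may work differently.

```python
from itertools import combinations

# Hypothetical baskets, made up for illustration.
transactions = [
    {"bread", "butter", "jam", "milk"},
    {"bread", "butter", "jam", "milk"},
    {"bread", "butter", "jam", "tea"},
    {"bread", "milk"},
    {"butter", "jam", "tea"},
]


def best_next_item(transactions, antecedent_size=3, min_support=0.2):
    """For each antecedent of the given size, pick the single consequent
    item with the highest confidence. Returns {antecedent: (item, confidence)}."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    best = {}
    for lhs in combinations(items, antecedent_size):
        antecedent = set(lhs)
        lhs_count = sum(antecedent <= t for t in transactions)
        if lhs_count == 0:
            continue  # this combination was never bought together
        for item in items:
            if item in antecedent:
                continue
            both = sum((antecedent | {item}) <= t for t in transactions)
            if both / n < min_support:
                continue
            confidence = both / lhs_count
            if lhs not in best or confidence > best[lhs][1]:
                best[lhs] = (item, confidence)
    return best


recommendations = best_next_item(transactions)
item, confidence = recommendations[("bread", "butter", "jam")]
print(f"bread + butter + jam -> {item} (confidence {confidence:.3f})")
```

In these baskets, bread, butter and jam were bought together 3 times, and milk was in 2 of those baskets, so milk comes out as the suggested 4th item.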

This was my first experience with Orange and it was pleasantly easy to use. Will be doing other posts on this platform in the future.
