Analytics Log - Adil Khan


How To Create a Sankey Diagram With Google Analytics Data In R Studio

Let’s say you want to rebuild the behavior flow report from Google Analytics in R Studio. This is only a basic tutorial; enhancements such as a direct Google Analytics API data pull and a multi-level Sankey will be covered in a separate post.

What we need for this:

The basic script comes from the plotly package:

https://plot.ly/r/sankey-diagram/

Google Analytics data on landing pages by channel. Example:

Here, the channel nodes are listed in rows [nodes 0-4] and the landing pages in columns [nodes 5-8]. These node IDs are never written out in the R code; they are implied by position, with the first element of the node array given position 0.


Full code:

See this content in the original post
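A minimal sketch of how the node array and its implied IDs might look. Only "Organic Search" and the four landing page names appear in this post; the other four channel names are assumed here for illustration, so swap in your own.

```r
# Channels come first (nodes 0-4), then landing pages (nodes 5-8).
# ASSUMPTION: every channel name except "Organic Search" is a placeholder.
channels <- c("Organic Search", "Direct", "Referral", "Social", "Paid Search")
landing_pages <- c("Homepage", "Products", "Services", "Contact")

node_labels <- c(channels, landing_pages)

# The Sankey trace refers to nodes by zero-based position, while R vectors
# are one-based, so subtract 1 to get the ID each label will have.
node_ids <- setNames(seq_along(node_labels) - 1, node_labels)
print(node_ids)
```

Printing `node_ids` is just a quick way to check which number belongs to which label before wiring up the links.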

Once you know the node IDs, you just need to connect each source [channels] to its targets [landing pages] and assign values.

See this content in the original post

In Source, the first row covers Organic Search, or node 0, while in Target, the first row lists the landing pages that get traffic from Organic Search. These line up with the first row in Values, [400, 40, 15, 2], meaning Organic Search had 400 sessions starting on the Homepage, 40 on Products, 15 on Services and 2 on Contact. The same pattern is then repeated for the other channels.
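The first row described above can be sketched as follows. The helper `channel_links` is hypothetical (it is not from the plotly package or the original code); it just repeats a channel's node ID once per landing page. Only the Organic Search numbers come from this post.

```r
# Zero-based node IDs of the landing pages:
# Homepage, Products, Services, Contact.
landing_page_ids <- 5:8

# Hypothetical helper: builds the source/target/value rows for one channel
# from its zero-based node ID and its sessions per landing page.
channel_links <- function(channel_id, sessions, targets = landing_page_ids) {
  stopifnot(length(sessions) == length(targets))
  data.frame(
    source = rep(channel_id, length(targets)),
    target = targets,
    value  = sessions
  )
}

# Organic Search is node 0; these four values are the ones quoted above.
links <- channel_links(0, c(400, 40, 15, 2))
print(links)
```

For the remaining channels you would call the helper again with node IDs 1 to 4 and your own session counts, then stack the results with `rbind()`.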

Once the code executes, you can publish the diagram to a web page and share it with others. For example, I created this one using the code below:

http://rpubs.com/madilkhan/sankey-diagram-channel-landing-page-data

I will write a separate blog post on how to pull GA data directly from the API and convert it to a Sankey diagram [after I learn how], and another on a multi-level Sankey diagram [Level 1 - Landing page, Level 2 - Next page path].


Full code below:

See this content in the original post
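For readers who cannot open the original, here is a rough sketch of how the pieces might fit together with plotly's Sankey trace. It is not the code behind the RPubs page: only the Organic Search links are filled in, the other channel names are assumed placeholders, and the title text is made up.

```r
# ASSUMPTION: channel names other than "Organic Search" are placeholders.
node_labels <- c("Organic Search", "Direct", "Referral", "Social", "Paid Search",
                 "Homepage", "Products", "Services", "Contact")

# Links for Organic Search (node 0) to the four landing pages (nodes 5-8),
# using the session counts quoted earlier in this post.
link_source <- rep(0, 4)
link_target <- 5:8
link_value  <- c(400, 40, 15, 2)

# Only build the chart if plotly is installed.
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(
    type = "sankey",
    orientation = "h",
    node = list(label = node_labels, pad = 15, thickness = 20),
    link = list(source = link_source, target = link_target, value = link_value)
  )
  fig <- plotly::layout(fig, title = "Sessions by channel and landing page")

  # Show the chart only in an interactive session such as R Studio.
  if (interactive()) print(fig)
}
```

Add the source, target and value entries for your other channels to the three link vectors, keeping them the same length and in the same order.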