How To Create a Sankey Diagram With Google Analytics Data In RStudio
Let’s say you want to rebuild the behavior flow report from Google Analytics in RStudio. This is only a basic tutorial; enhancements such as pulling data directly from the Google Analytics API and multi-level Sankey diagrams will be covered in a separate post.
What we need for this:
The basic script is from a package: plotly
https://plot.ly/r/sankey-diagram/
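Google Analytics data on landing page by channel. Example (sessions, with node numbers in brackets):
Channel               Homepage [5]   Products [6]   Services [7]   Contact [8]
Organic Search [0]    400            40             15             2
Direct [1]            100            120            50             30
Referral [2]          75             12             12             5
Social [3]            124            11             11             4
Paid Search [4]       120            12             15             0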
Here, the channels are in rows [nodes 0-4] and the landing pages are in columns [nodes 5-8]. These node numbers never appear explicitly in the R code; they are implied by position. The first element of the node label array is node 0, the second is node 1, and so on.
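To make the implied numbering visible, here is a minimal base R sketch that lists each label next to its zero-based node ID. The helper name node_id is hypothetical and is not part of plotly.

```r
# Node labels in the same order as in the plot_ly() call
labels <- c("Organic Search", "Direct", "Referral", "Social", "Paid Search",
            "Homepage", "Products", "Services", "Contact")

# R vectors are 1-based, but Sankey node IDs start at 0, so subtract 1
node_ids <- setNames(seq_along(labels) - 1, labels)

# Hypothetical helper: look up the node ID for one or more labels
node_id <- function(x) unname(node_ids[x])

print(node_ids)
node_id(c("Organic Search", "Contact"))
```

Looking IDs up by label like this avoids counting positions by hand when the list of channels or pages changes.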
Node definition:
node = list(
  label = c("Organic Search", "Direct", "Referral", "Social", "Paid Search",
            "Homepage", "Products", "Services", "Contact")
)
Once you know the node IDs, you just need to connect each source [channel] to a target [landing page] and assign a value to each connection.
link = list(
  # All channels become the sources, so nodes 0-4
  # (each channel is repeated once per landing page)
  source = c(0,0,0,0, 1,1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4,4),
  # Landing pages become the targets, so nodes 5-8
  target = c(5,6,7,8, 5,6,7,8, 5,6,7,8, 5,6,7,8, 5,6,7,8),
  # Assign a value to each source-target pair
  # Example: node 0 to node 5 (Organic Search to Homepage) has value = 400
  value = c(400,40,15,2, 100,120,50,30, 75,12,12,5, 124,11,11,4, 120,12,15,0)
)
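Since the three vectors are matched position by position, it can help to check that they line up before plotting. The sketch below assumes one source entry per target entry (four per channel) and uses hypothetical variable names (link_source, link_target, link_value, links); it then prints the links with readable labels.

```r
labels <- c("Organic Search", "Direct", "Referral", "Social", "Paid Search",
            "Homepage", "Products", "Services", "Contact")

# Assumed layout: four links per channel, one for each landing page
link_source <- c(0,0,0,0, 1,1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4,4)
link_target <- c(5,6,7,8, 5,6,7,8, 5,6,7,8, 5,6,7,8, 5,6,7,8)
link_value  <- c(400,40,15,2, 100,120,50,30, 75,12,12,5, 124,11,11,4, 120,12,15,0)

# All three vectors must have the same length
stopifnot(length(link_source) == length(link_target),
          length(link_target) == length(link_value))

# Every ID must refer to an existing node (0 to number of labels - 1)
stopifnot(all(c(link_source, link_target) %in% (seq_along(labels) - 1)))

# Translate the zero-based IDs back to labels (R indexing is 1-based)
links <- data.frame(
  source = labels[link_source + 1],
  target = labels[link_target + 1],
  value  = link_value
)
head(links, 4)
```

If the vectors differ in length, the first stopifnot() stops with an error, which is easier to diagnose than a chart with missing or misplaced flows.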
In source, the first row covers Organic Search (node 0), while in target, the first row lists the landing pages that receive traffic from Organic Search. These are paired with the first row in value, [400, 40, 15, 2], meaning Organic Search had 400 sessions starting on the Homepage, 40 on Products, 15 on Services and 2 on Contact. The same pattern repeats for the other channels.
Once the code executes, you can publish the chart to a web page and share it with others. For example, I created this one using the code below:
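Rather than typing the rows by hand, the same three vectors can be derived from a channel-by-landing-page table. This is a minimal base R sketch, not the method used in this post; the names sessions, link_source, link_target and link_value are hypothetical, and the numbers are the ones from the example.

```r
channels <- c("Organic Search", "Direct", "Referral", "Social", "Paid Search")
pages    <- c("Homepage", "Products", "Services", "Contact")

# Rows are channels, columns are landing pages
sessions <- matrix(
  c(400,  40, 15,  2,
    100, 120, 50, 30,
     75,  12, 12,  5,
    124,  11, 11,  4,
    120,  12, 15,  0),
  nrow = length(channels), byrow = TRUE,
  dimnames = list(channels, pages)
)

n_ch <- nrow(sessions)  # number of channels (nodes 0 to n_ch - 1)
n_pg <- ncol(sessions)  # number of landing pages (nodes n_ch onwards)

# Each channel ID is repeated once per landing page
link_source <- rep(seq_len(n_ch) - 1, each = n_pg)
# Landing page IDs start right after the channel IDs and cycle per channel
link_target <- rep(n_ch + seq_len(n_pg) - 1, times = n_ch)
# t() makes as.vector() read the table row by row
link_value  <- as.vector(t(sessions))

# Node labels in ID order: channels first, then landing pages
labels <- c(channels, pages)
```

These vectors can then be passed as source, target and value in the link list, and labels as the node labels.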
http://rpubs.com/madilkhan/sankey-diagram-channel-landing-page-data
I will write a separate blog post on how to pull GA data directly from the API and convert it to a Sankey diagram [after I learn how], and also on multi-level Sankey diagrams [Level 1 - Landing page, Level 2 - Next page path].
Full code below:
# Install plotly once, then load it
install.packages("plotly")
library(plotly)

# Create a basic Sankey diagram
p <- plot_ly(
  type = "sankey",
  orientation = "h",
  # Each element is a node here: Organic Search is node 0,
  # Direct is node 1, ..., Contact is node 8
  node = list(
    label = c("Organic Search", "Direct", "Referral", "Social", "Paid Search",
              "Homepage", "Products", "Services", "Contact"),
    # Assign colours to each channel
    color = c("green", "blue", "yellow", "pink", "purple"),
    pad = 15,
    thickness = 20,
    line = list(
      color = "black",
      width = 0.5
    )
  ),
  link = list(
    # All channels become the sources, so nodes 0-4
    # (each channel is repeated once per landing page)
    source = c(0,0,0,0, 1,1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4,4),
    # Landing pages become the targets, so nodes 5-8
    target = c(5,6,7,8, 5,6,7,8, 5,6,7,8, 5,6,7,8, 5,6,7,8),
    # Assign a value to each source-target pair
    # Example: node 0 to node 5 (Organic Search to Homepage) has value = 400
    value = c(400,40,15,2, 100,120,50,30, 75,12,12,5, 124,11,11,4, 120,12,15,0)
  )
) %>%
  layout(
    title = "Website Top Landing Pages by Channel",
    font = list(
      size = 10
    )
  )

p

# Create a shareable link to your chart:
# click Publish in the Viewer tab and set up an RPubs account
# Example: http://rpubs.com/madilkhan/sankey-diagram-channel-landing-page-data