Analyze Bounce Rate, Page Load Time By Devices In R Studio
You could pick any two or more variables from your Google Analytics data and see how they behave. For this blog post, I chose to cover page load time and its impact on bounce rate, pivoted by device category.
I'll walk you through the main snippets of the code before pasting it all at the end.
Basic steps before accessing the data:
Loading the googleAnalyticsR library [or installing it first]
Authenticating as a new user and getting the API access token
Checking the list of accounts to find the Google Analytics View
Assigning the View ID to a variable called ….viewID
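A minimal sketch of these four steps, assuming the googleAnalyticsR package and a Google account with access to the property. The numeric view ID is a placeholder, not a real one.

```r
# install.packages("googleAnalyticsR")  # run once if the package is missing
library(googleAnalyticsR)

# Opens a browser window to authenticate and caches an API access token
ga_auth()

# Lists the accounts, properties and views this user can access;
# the viewId column holds the value needed for queries
account_list <- ga_account_list()
head(account_list)

# Placeholder: replace with the viewId of the view you want to query
viewID <- 123456789
```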
Self-explanatory, but here goes: I pulled in data from 01 Jan 2018 to 15 Jul 2019, with bounceRate and avgPageLoadTime as metrics and date and deviceCategory as dimensions.
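The pull described above could look roughly like this with google_analytics() from googleAnalyticsR. The view ID is a placeholder, and max = -1 (fetch every row instead of the default first 100) is an assumption about what you would want here.

```r
library(googleAnalyticsR)

ga_auth()               # authenticate first
viewID <- 123456789     # placeholder: use your own view ID

gaData <- google_analytics(
  viewID,
  date_range = c("2018-01-01", "2019-07-15"),
  metrics    = c("bounceRate", "avgPageLoadTime"),
  dimensions = c("date", "deviceCategory"),
  max        = -1       # assumption: fetch all rows, not just the first 100
)

head(gaData)
```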
Before you go on and create graphs, check the summary:
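A quick sketch of that check. If gaData from the API pull is not in your session, the snippet builds a small made-up stand-in with the same column names so it still runs; those toy values are not the data from this post.

```r
if (!exists("gaData")) {
  # Made-up stand-in so the snippet runs without API access
  set.seed(42)
  gaData <- data.frame(
    deviceCategory  = sample(c("desktop", "mobile", "tablet"), 300,
                             replace = TRUE, prob = c(0.7, 0.2, 0.1)),
    bounceRate      = runif(300, 0, 100),
    avgPageLoadTime = rexp(300, rate = 1 / 10)
  )
}

# Min, quartiles, median, mean and max for each numeric column
summary(gaData)

# Number of rows per device category
table(gaData$deviceCategory)
```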
What particularly stands out is that the median avgPageLoadTime was 6.966 [meaning 50% of the values are below it and 50% are above it], but the range is wide: min = 0, max = 487 [some issue, I guess]. When you plot this, the axis can end up very long, with most of the points concentrated in a narrow range.
Also, since we’ll be charting by device category, it’s good to get a sense of things by device category.
You can start by running a geom_bar ggplot to see the distribution of counts by device category. It looks like most of the data relates to desktop.
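A sketch of that bar chart with ggplot2. The fallback data frame is a made-up stand-in that is only used when gaData is not already in your session.

```r
library(ggplot2)

if (!exists("gaData")) {
  # Made-up stand-in so the snippet runs without API access
  set.seed(42)
  gaData <- data.frame(
    deviceCategory  = sample(c("desktop", "mobile", "tablet"), 300,
                             replace = TRUE, prob = c(0.7, 0.2, 0.1)),
    bounceRate      = runif(300, 0, 100),
    avgPageLoadTime = rexp(300, rate = 1 / 10)
  )
}

# geom_bar() counts the rows in each device category
p <- ggplot(gaData, aes(x = deviceCategory)) +
  geom_bar()
print(p)
```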
If you switch the plot to a geom_boxplot, you can see the spread in the data. A more detailed example of geom_bar in RStudio is covered in a separate post, or you can look up geom_boxplot using the site search.
So, the box covers the middle 50% of the values, with the line inside it marking the median bounce rate, while the top whisker reaches up to the highest value that isn't treated as an outlier. As we don't have many observations for tablet, its median bounce rate is close to 100%.
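The box plot described above can be sketched as follows. As before, the fallback data frame is a made-up stand-in, so its boxes will not match the ones discussed in this post.

```r
library(ggplot2)

if (!exists("gaData")) {
  # Made-up stand-in so the snippet runs without API access
  set.seed(42)
  gaData <- data.frame(
    deviceCategory  = sample(c("desktop", "mobile", "tablet"), 300,
                             replace = TRUE, prob = c(0.7, 0.2, 0.1)),
    bounceRate      = runif(300, 0, 100),
    avgPageLoadTime = rexp(300, rate = 1 / 10)
  )
}

# One box per device category, showing the spread of bounce rates
p <- ggplot(gaData, aes(x = deviceCategory, y = bounceRate)) +
  geom_boxplot()
print(p)
```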
Let’s create a geom_point [scatter plot] to show the bounce rates + load times by device categories
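A sketch of that scatter plot, assuming one panel per device category via facet_wrap and load time on the x axis. The fallback data frame is a made-up stand-in.

```r
library(ggplot2)

if (!exists("gaData")) {
  # Made-up stand-in so the snippet runs without API access
  set.seed(42)
  gaData <- data.frame(
    deviceCategory  = sample(c("desktop", "mobile", "tablet"), 300,
                             replace = TRUE, prob = c(0.7, 0.2, 0.1)),
    bounceRate      = runif(300, 0, 100),
    avgPageLoadTime = rexp(300, rate = 1 / 10)
  )
}

# Load time on x, bounce rate on y, one panel per device category
p <- ggplot(gaData, aes(x = avgPageLoadTime, y = bounceRate)) +
  geom_point() +
  facet_wrap(~ deviceCategory)
print(p)
```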
If you check the x axis in the above geom_point, you’ll notice that the scale goes up to 500 [since we saw in the summary(gaData) output that the range is really wide]. As a result, most of the values sit very close to each other in the first interval, 0-100 [makes sense], so you can’t tell much about what’s going on here.
Let’s give it another shot by making the scales “free”, that is, dependent on the device category shown in each facet, and see if it helps.
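One way to do this is the scales argument of facet_wrap. The sketch below frees both axes; scales = "free_x" would free only the x axis. The fallback data frame is a made-up stand-in.

```r
library(ggplot2)

if (!exists("gaData")) {
  # Made-up stand-in so the snippet runs without API access
  set.seed(42)
  gaData <- data.frame(
    deviceCategory  = sample(c("desktop", "mobile", "tablet"), 300,
                             replace = TRUE, prob = c(0.7, 0.2, 0.1)),
    bounceRate      = runif(300, 0, 100),
    avgPageLoadTime = rexp(300, rate = 1 / 10)
  )
}

# scales = "free" lets every panel pick its own x and y ranges
p <- ggplot(gaData, aes(x = avgPageLoadTime, y = bounceRate)) +
  geom_point() +
  facet_wrap(~ deviceCategory, scales = "free")
print(p)
```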
If you look at the x axis scale above, it’s now changed depending on the device category. Slightly better but we still have a lot of that desktop data between 0 - 50 sec load time.
Let’s try and subset the data to see if we can improve this further.
We can now create a geom_point using the subset of data.
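A sketch of the subset and the new scatter plot. The 50-second cut-off is an assumption picked for illustration, based on where most of the desktop points sit; the name gaSubset is hypothetical too. The fallback data frame is a made-up stand-in.

```r
library(ggplot2)

if (!exists("gaData")) {
  # Made-up stand-in so the snippet runs without API access
  set.seed(42)
  gaData <- data.frame(
    deviceCategory  = sample(c("desktop", "mobile", "tablet"), 300,
                             replace = TRUE, prob = c(0.7, 0.2, 0.1)),
    bounceRate      = runif(300, 0, 100),
    avgPageLoadTime = rexp(300, rate = 1 / 10)
  )
}

# Assumed cut-off: keep rows with an average load time of 50 seconds or less
gaSubset <- subset(gaData, avgPageLoadTime <= 50)

p <- ggplot(gaSubset, aes(x = avgPageLoadTime, y = bounceRate)) +
  geom_point() +
  facet_wrap(~ deviceCategory)
print(p)
```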
If you now want to add a smoothed trend line to the scatter plot, you can add geom_smooth and layer it with the points. Setting a low alpha on the points makes the dense areas stand out.
Ok, so there you have it: a few ways of showing how bounce rate relates to site load time.
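A sketch of the layered plot. By default geom_smooth picks its own method (loess for smaller data sets, a GAM for larger ones). The alpha of 0.3 and the 50-second cut-off are assumptions for illustration, and the fallback data frame is a made-up stand-in.

```r
library(ggplot2)

if (!exists("gaData")) {
  # Made-up stand-in so the snippet runs without API access
  set.seed(42)
  gaData <- data.frame(
    deviceCategory  = sample(c("desktop", "mobile", "tablet"), 300,
                             replace = TRUE, prob = c(0.7, 0.2, 0.1)),
    bounceRate      = runif(300, 0, 100),
    avgPageLoadTime = rexp(300, rate = 1 / 10)
  )
}

p <- ggplot(subset(gaData, avgPageLoadTime <= 50),   # assumed cut-off
            aes(x = avgPageLoadTime, y = bounceRate)) +
  geom_point(alpha = 0.3) +   # semi-transparent points: darker means denser
  geom_smooth() +             # smoothed trend line with a confidence band
  facet_wrap(~ deviceCategory)
print(p)
```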
Note: If you know how to restrict the x axis to a custom limit within facet_wrap, please let me know in the comments or with a reply on Twitter.
Thanks!