Analyze Bounce Rate, Page Load Time By Devices In R Studio
You could pick any two or more variables from your Google Analytics data and see how they behave. For this blog post, I chose to cover page load time and its impact on bounce rate, split by device category.
I’ll walk you through the main snippets of the code before pasting it all at the end.
Basic steps before accessing the data:
Loading the googleAnalyticsR library [or installing it first]
Authenticating as a new user and getting the API access token
Checking the list of accounts to find the Google Analytics View
Assigning the View ID to a variable called viewId
#Pull data from GA
gaData <- google_analytics(viewId,
  date_range=c("2018-01-01","2019-07-15"),
  metrics=c("bounceRate","avgPageLoadTime"),
  dimensions=c("date","deviceCategory"))
Self-explanatory, but here goes: I’ve pulled in data from 01 Jan 2018 to 15 Jul 2019, with bounceRate and avgPageLoadTime as metrics and date and deviceCategory as dimensions.
Before you go on and create graphs, check the summary:
summary(gaData)
What particularly stands out is that the median avgPageLoadTime was 6.966 seconds [meaning 50% of the values are below it and 50% are above it], but the range is wide: min = 0, max = 487 [some issue, I guess]. If you chart this as is, the axis ends up very long, with most of the points squeezed into a small part of it.
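To see why a single extreme value stretches the axis while barely moving the median, here is a small base R sketch. The numbers are made up for illustration [they are not the GA data], and load_times and share_under_20 are just names I picked.

```r
# Made-up page load times in seconds (illustrative only, not the GA pull)
load_times <- c(0, 3.2, 5.1, 6.5, 7.0, 7.4, 8.8, 10.3, 487)

median(load_times)  # middle value, barely affected by the 487 outlier
mean(load_times)    # pulled upward by the outlier
range(load_times)   # min and max, which decide how long the axis gets

# Share of values at or below 20 seconds
share_under_20 <- mean(load_times <= 20)
share_under_20
```

If most of the values sit under a threshold like this, it is a hint that plotting only that part of the range will be easier to read.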
Also, since we’ll be charting by device category, it’s good to get a sense of things by device category.
You can start by running a geom_bar in ggplot to see the count of rows [one per date and device] by device category. It seems most of the data is going to be for desktop.
#geom_bar to see the counts by devices
#ggplot2 needs to be loaded first
library(ggplot2)
ggplot(data=gaData) +
  geom_bar(mapping=aes(x=deviceCategory))
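If you want the numbers behind the bars, table() gives the same counts as text. This is a rough sketch on a made-up data frame called demo that stands in for gaData. Keep in mind these are counts of rows, not sessions.

```r
# Hypothetical rows standing in for gaData (values are made up)
demo <- data.frame(
  deviceCategory = c("desktop", "desktop", "desktop", "mobile", "mobile", "tablet"),
  bounceRate = c(55, 60, 58, 72, 68, 100)
)

# Count rows per device category, the same tally geom_bar draws
device_counts <- table(demo$deviceCategory)
device_counts
```

With the real pull you would run table(gaData$deviceCategory) instead.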
If you switch the geom to geom_boxplot, you can see the spread of the data.
#boxplot to see the range of data via boxplot
ggplot(data=gaData) +
  geom_boxplot(mapping=aes(x=deviceCategory, y=bounceRate))
So, the box covers the middle 50% of the values, with the line inside it marking the median bounce rate. The whiskers reach out to the furthest values within 1.5 times the height of the box, and anything beyond that is drawn as a separate point. As we don’t have too many observations for tablet, its median bounce rate is close to 100%.
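To check what a boxplot is actually drawing, base R has boxplot.stats(), which returns the five numbers behind the box and whiskers plus any outliers. The bounce rates below are made up. Also, geom_boxplot computes its quartiles slightly differently from base R’s hinges, so treat this as a sketch of the idea and not an exact match.

```r
# Made-up bounce rates (percent), with one extreme value
bounce <- c(40, 45, 50, 52, 55, 58, 60, 62, 65, 100)

stats <- boxplot.stats(bounce)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
stats$stats

# Values beyond the whiskers, drawn as individual points
stats$out
```

Here the upper whisker stops at the largest value within 1.5 box heights of the box, and the extreme value is reported separately in stats$out.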
Let’s create a geom_point [scatter plot] to show bounce rates against load times by device category.
#1 row to show all different device categories
#blue is manual aesthetics
ggplot(data=gaData) +
  geom_point(mapping=aes(x=avgPageLoadTime, y=bounceRate), color="blue") +
  facet_wrap(~deviceCategory, nrow=1)
If you check the x axis in the above geom_point, you’ll notice that the scale goes up to 500 [since we saw in the summary(gaData) output that the range is really wide]. As a result, most of the points are bunched together in the first interval, 0-100 [makes sense], and you can’t tell much about what’s going on.
Let’s give it another shot by making the scales “free”, so that each facet gets its own scale for the device category it shows, and see if it helps.
#1 row to show all different bounceRates by device categories
#blue is manual aesthetics
#free scales applied to facet to have diff. scale per device
ggplot(data=gaData) +
  geom_point(mapping=aes(x=avgPageLoadTime, y=bounceRate), color="blue") +
  facet_wrap(~deviceCategory, nrow=1, scales="free")
If you look at the x axis scale above, it’s now changed depending on the device category. Slightly better but we still have a lot of that desktop data between 0 - 50 sec load time.
Let’s try and subset the data to see if we can improve this further.
#see summary data where load time < 20
summary(subset(gaData, avgPageLoadTime < 20))
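It’s worth checking how many rows the subset throws away before charting it. A small sketch, again on a made-up demo data frame: subset() keeps only rows where the condition is TRUE, so rows with a missing load time are dropped too.

```r
# Hypothetical rows standing in for gaData (values are made up)
demo <- data.frame(
  deviceCategory = c("desktop", "mobile", "tablet", "desktop"),
  avgPageLoadTime = c(4.2, 25.0, NA, 19.9),
  bounceRate = c(50, 80, 100, 61)
)

# Keep rows with a load time under 20 seconds
kept <- subset(demo, avgPageLoadTime < 20)

# The same filter written with plain indexing, NA handled explicitly
same <- demo[!is.na(demo$avgPageLoadTime) & demo$avgPageLoadTime < 20, ]

# Number of rows removed by the filter
dropped <- nrow(demo) - nrow(kept)
dropped
```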
We can now create a geom_point using the subset of data.
#Use subset in ggplot where avgPageLoadTime < 20
#ref: https://www.reddit.com/r/RStudio/comments/93qc0x/how_to_graph_a_subset_of_data_in_ggplot2/
ggplot(subset(gaData, avgPageLoadTime <20)) +
  geom_point(mapping=aes(x=avgPageLoadTime, y=bounceRate), color="blue") +
  facet_wrap(~deviceCategory, nrow=1, scales="fixed")
If you now want to add a smoothed trend line to the scatter plot, you can add geom_smooth [which fits a loess curve by default for fewer than 1,000 observations] and layer it with the points. The alpha in the code below makes the points semi-transparent, so the darker areas show where the data is dense.
#Use subset in ggplot geom_smooth + scatterplot in background
#where avgPageLoadTime < 20
#ref: https://www.reddit.com/r/RStudio/comments/93qc0x/how_to_graph_a_subset_of_data_in_ggplot2/
#alpha is set outside aes() as it is a fixed value, not a mapped variable
ggplot(subset(gaData, avgPageLoadTime <20)) +
  geom_smooth(mapping=aes(x=avgPageLoadTime, y=bounceRate), color="blue") +
  geom_point(aes(x=avgPageLoadTime, y=bounceRate), alpha=0.5, colour="red") +
  facet_wrap(~deviceCategory, nrow=1, scales="fixed")
Ok, so there you have it, a few ways of showing how bounce rate relates to site load time.
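For reference, when you don’t pass a method, geom_smooth picks the smoother for you: loess for fewer than 1,000 observations, otherwise a GAM. To get a feel for what the loess line represents, here is a base R sketch that fits the same kind of curve by hand. The data is simulated [made up with rnorm], and the column name smoothed is my own.

```r
# Simulated stand-in data: bounce rate rising with load time, plus noise
set.seed(1)
demo <- data.frame(avgPageLoadTime = seq(1, 19, length.out = 40))
demo$bounceRate <- 50 + 1.5 * demo$avgPageLoadTime + rnorm(40, sd = 5)

# Fit a local regression (loess) of bounce rate on load time
fit <- loess(bounceRate ~ avgPageLoadTime, data = demo)

# Fitted values: the points the smoothed line passes through
demo$smoothed <- predict(fit)
head(demo)
```

The smoothed column follows the overall trend and ignores most of the point-to-point noise, which is what the blue line in the chart is doing per device category.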
Note: If you do know how to reduce the x axis to a custom limit, within the facet_wrap, please do let me know via comments or as a reply in the below Tweet.
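One option that may do this is coord_cartesian(xlim = ...), which zooms the x axis without removing any rows from the data. This is a sketch on a made-up demo data frame, not something I’ve tuned for the GA pull, and with fixed scales the same limit applies to every facet.

```r
library(ggplot2)

# Made-up stand-in for gaData (illustrative values only)
demo <- data.frame(
  deviceCategory = rep(c("desktop", "mobile", "tablet"), each = 4),
  avgPageLoadTime = c(2, 6, 15, 487, 3, 8, 12, 60, 4, 9, 18, 30),
  bounceRate = c(45, 55, 60, 90, 60, 70, 75, 85, 80, 90, 100, 100)
)

# Zoom the x axis to 0-20 seconds; rows outside the window stay in the data
p <- ggplot(data = demo) +
  geom_point(mapping = aes(x = avgPageLoadTime, y = bounceRate), color = "blue") +
  facet_wrap(~deviceCategory, nrow = 1) +
  coord_cartesian(xlim = c(0, 20))
p
```

The difference from subset() is that anything computed from the data, such as a geom_smooth line, still uses all the rows. Only the visible window changes.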
Thanks!
#Nothing of this script is original. It is a combined product of reading:
#R for Data Science book by H. Wickham and G. Grolemund,
#Mark Edmondson (GA R package creator),
#Dartistics.com,
#GA dev support page on R, GA dimensions explorer,
#Searching for particular steps on StackOverflow,
#Searching for particular steps on /r/rstudio &/r/rprogramming,
#and then, applying the above and learning in the process
#--

#load GA R Library
library(googleAnalyticsR)

#load ggplot2 for the charts below
library(ggplot2)

#Authenticate GA R API as new user
ga_auth(new_user=TRUE)

#Check GA Views that you have access to
ga_account_list()

#GA View that you want to use, replace the ID below with your own
viewId <- 41377551

#Help with GA API query
?google_analytics

#dimension names from
#https://developers.google.com/analytics/devguides/reporting/core/dimsmets#q=loadtime&mode=web&cats=user,session,traffic_sources,adwords,goal_conversions,platform_or_device,geo_network,system,social_activities,page_tracking,content_grouping,internal_search,site_speed,app_tracking,event_tracking,ecommerce,social_interactions,user_timings,exceptions,content_experiments,custom_variables_or_columns,time,doubleclick_campaign_manager,audience,adsense,publisher,ad_exchange,doubleclick_for_publishers_backfill,doubleclick_for_publishers,lifetime_value_and_cohorts,channel_grouping,doubleclick_bid_manager,doubleclick_search

#Pull data from GA
gaData <- google_analytics(viewId,
  date_range=c("2018-01-01","2019-07-15"),
  metrics=c("bounceRate","avgPageLoadTime"),
  dimensions=c("date","deviceCategory"))

#Get a sense of the raw data in GA pull
gaData

#check summary of data
summary(gaData)

#geom_bar to see the counts by devices
ggplot(data=gaData) +
  geom_bar(mapping=aes(x=deviceCategory))

#boxplot to see the range of data via boxplot
ggplot(data=gaData) +
  geom_boxplot(mapping=aes(x=deviceCategory, y=bounceRate))

#boxplot to see the range of data with deviceCategory as Facet wrap
#Covered in Chap1 of "R for Data Science" by H. Wickham and G. Grolemund
ggplot(data=gaData) +
  geom_boxplot(mapping=aes(x=avgPageLoadTime, y=bounceRate)) +
  facet_wrap(~deviceCategory, nrow=1)

#First geom_point
ggplot(data=gaData) +
  geom_point(mapping=aes(x=avgPageLoadTime, y=bounceRate), color="blue")

#1 row to show all different device categories
#blue is manual aesthetics
ggplot(data=gaData) +
  geom_point(mapping=aes(x=avgPageLoadTime, y=bounceRate), color="blue") +
  facet_wrap(~deviceCategory, nrow=1)

#1 row to show all different bounceRates by device categories
#blue is manual aesthetics
#free scales applied to facet to have diff. scale per device
ggplot(data=gaData) +
  geom_point(mapping=aes(x=avgPageLoadTime, y=bounceRate), color="blue") +
  facet_wrap(~deviceCategory, nrow=1, scales="free")

?facet_wrap

#see summary data where load time < 20
summary(subset(gaData, avgPageLoadTime < 20))

#Use subset in ggplot where avgPageLoadTime < 20
#ref: https://www.reddit.com/r/RStudio/comments/93qc0x/how_to_graph_a_subset_of_data_in_ggplot2/
ggplot(subset(gaData, avgPageLoadTime <20)) +
  geom_point(mapping=aes(x=avgPageLoadTime, y=bounceRate), color="blue") +
  facet_wrap(~deviceCategory, nrow=1, scales="fixed")

#Use subset in ggplot geom_smooth where avgPageLoadTime < 20
#ref: https://www.reddit.com/r/RStudio/comments/93qc0x/how_to_graph_a_subset_of_data_in_ggplot2/
ggplot(subset(gaData, avgPageLoadTime <20)) +
  geom_smooth(mapping=aes(x=avgPageLoadTime, y=bounceRate), color="blue") +
  facet_wrap(~deviceCategory, nrow=1, scales="fixed")

#Use subset in ggplot geom_smooth + scatterplot in background
#where avgPageLoadTime < 20
#ref: https://www.reddit.com/r/RStudio/comments/93qc0x/how_to_graph_a_subset_of_data_in_ggplot2/
#alpha is set outside aes() as it is a fixed value, not a mapped variable
ggplot(subset(gaData, avgPageLoadTime <20)) +
  geom_smooth(mapping=aes(x=avgPageLoadTime, y=bounceRate), color="blue") +
  geom_point(aes(x=avgPageLoadTime, y=bounceRate), alpha=0.5, colour="red") +
  facet_wrap(~deviceCategory, nrow=1, scales="fixed")