How To Switch To Linear Regression In a Ggplot Geom_Smooth - R Programming

A while back, I wrote a post showing the relationship between Bounce Rate and Avg. Page Load Time using a ggplot geom_smooth + facet_wrap. The plot came out like this:

[Figure: geom_smooth with geom_point in ggplot2]

I needed to reuse this type of data and wondered whether the plot could be improved. Sometimes, showing less is better.

  • In the above viz, the points at Load Time = 0 and Bounce Rate = 100 make it a bit harder to read.

  • The curve can be somewhat confusing.

  • The grey shaded area adds to the confusion.

Overall, there’s a lot happening with the viz. Let’s see if we can try to make it better.

First, let’s drop the extreme values: rows where page load time is 10 or more, rows where load time = 0, and rows where Bounce Rate is 0 or 100. This is done with the subset() function, which takes the existing full dataset and keeps only the rows that meet all 4 conditions below.

ggplot(subset(gaData, 
avgPageLoadTime > 0 & 
  avgPageLoadTime <10 & 
  bounceRate >0 & 
  bounceRate <100))
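To see what those four conditions do without pulling any GA data, here is a minimal sketch on a tiny made-up data frame. The name demo and all of its values are hypothetical stand-ins for gaData, chosen so that each condition drops at least one row.

```r
# Hypothetical stand-in for gaData (made-up values, not real GA data)
demo <- data.frame(
  avgPageLoadTime = c(0, 2.5, 4.1, 12.0, 3.3, 6.8),
  bounceRate      = c(55, 100, 48.2, 61, 0, 72.5),
  deviceCategory  = c("desktop", "mobile", "tablet",
                      "desktop", "mobile", "tablet")
)

# Same four conditions as in the post
filtered <- subset(demo,
                   avgPageLoadTime > 0 &
                     avgPageLoadTime < 10 &
                     bounceRate > 0 &
                     bounceRate < 100)

nrow(demo)      # rows before filtering
nrow(filtered)  # rows that satisfy all four conditions
filtered
```

Only the rows that pass every condition survive, so a row with a perfectly normal load time is still dropped if its bounce rate is 0 or 100.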

Next, build a ggplot geom_point with Avg. Page Load Time on the X axis and Bounce Rate on the Y axis, split by Device Category via facet_wrap. On top of this, add a geom_smooth to show the trend. The line is drawn in red, and scales="fixed" keeps the same axes across all three device categories; since the subset already drops load times of 10 or more, the X axis tops out at 10 in every panel.

Most importantly, use method="lm" to fit a linear model instead of LOESS, and se=FALSE to hide the grey area. Note that R is case-sensitive, so the argument is se, not SE. Solution via StackOverflow.

geom_smooth(mapping=
              aes(x=avgPageLoadTime,
                  y=bounceRate),
            colour="red",se=FALSE,method="lm")+
  geom_point(mapping=
               aes(x=avgPageLoadTime,
                   y=bounceRate),
             colour="blue",alpha=0.5) +
  facet_wrap(~deviceCategory,
             nrow=1,scales="fixed")

The grey area [se=TRUE] is the 95% confidence interval around the fitted line [the range where the true trend line is expected to lie, not the range where individual values will fall]. https://ggplot2.tidyverse.org/reference/geom_smooth.html [level=0.95 by default, can be increased to 0.99]. Switching to se=FALSE hides it.
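One thing worth checking for yourself: for method="lm", the grey band is a confidence interval for the fitted line, which is much narrower than the spread of the individual points. Below is a small base R sketch on made-up numbers (x, y and the value 5 are all hypothetical) comparing a confidence interval with a prediction interval from the same model.

```r
# Made-up data: a noisy linear relationship (hypothetical values)
set.seed(42)
x <- runif(200, min = 0.5, max = 9.5)
y <- 40 + 3 * x + rnorm(200, sd = 10)

fit <- lm(y ~ x)

# Interval for the fitted line at x = 5 (this is the kind of band geom_smooth shades)
conf <- predict(fit, data.frame(x = 5), interval = "confidence", level = 0.95)
# Interval for a single new observation at x = 5
pred <- predict(fit, data.frame(x = 5), interval = "prediction", level = 0.95)

conf_width <- conf[, "upr"] - conf[, "lwr"]
pred_width <- pred[, "upr"] - pred[, "lwr"]

# Share of the observed points that actually fall inside the confidence band
band  <- predict(fit, interval = "confidence", level = 0.95)
share <- mean(y >= band[, "lwr"] & y <= band[, "upr"])

conf_width
pred_width
share
```

The confidence band comes out far narrower than the prediction interval, and only a small share of the points land inside it, so the grey area should not be read as "95% of the data lives here".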

Once you combine the above code, you’ll get output like the below for your GA data. It looks quite different from the LOESS version. A linear regression line is sometimes the better choice. I’ll explore the LOESS method in more detail, and compare it with linear regression, in a separate post.
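A straight line also has the advantage that you can put a number on it. As a rough sketch, fitting lm() separately per device category gives the slope of each red line, since geom_smooth with method="lm" fits y ~ x within each facet. The demo data here is synthetic and the name slopes is my own; treat it as an illustration rather than part of the original script.

```r
# Synthetic stand-in for the filtered gaData (hypothetical values)
set.seed(1)
demo <- data.frame(
  avgPageLoadTime = runif(150, min = 0.5, max = 9.5),
  deviceCategory  = rep(c("desktop", "mobile", "tablet"), each = 50)
)
demo$bounceRate <- 40 + 2 * demo$avgPageLoadTime + rnorm(150, sd = 8)

# One linear model per device category; keep only the slope
slopes <- sapply(split(demo, demo$deviceCategory), function(d) {
  coef(lm(bounceRate ~ avgPageLoadTime, data = d))[["avgPageLoadTime"]]
})

# Change in bounce rate (percentage points) per extra second of load time
slopes
```

On real GA data you would pass the same subset used for the plot instead of demo.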

[Figure: Linear equation in geom_smooth instead of LOESS]

Full script below.

#install and load packages
install.packages("googleAuthR")
install.packages("googleAnalyticsR")
install.packages("ggplot2")
library(googleAuthR)
library(googleAnalyticsR)
library(ggplot2)

ga_auth()
ga_account_list()
#Your GA view ID goes here. Change from 1234567
viewId <- 1234567 

#2018-2020 dataset
gaData <- google_analytics(viewId = viewId,
date_range = c("2018-01-01","2020-05-10"),
metrics = c("bounceRate","avgPageLoadTime"),
dimensions = c("date","deviceCategory"),
anti_sample = TRUE)

head(gaData)
summary(gaData)
View(gaData)
#create subset of gaData where 0 < load time < 10 and 0 < bounce rate < 100
#linear model (method="lm") instead of the LOESS curve
#https://stackoverflow.com/questions/15633714/adding-a-regression-line-on-a-ggplot
#se = FALSE removes the 95% confidence band around the line
#subset the data to show the rows that meet
#the 4 conditions below
ggplot(subset(gaData,
              avgPageLoadTime > 0 &
                avgPageLoadTime < 10 &
                bounceRate > 0 &
                bounceRate < 100)) +
  geom_smooth(mapping=aes(x=avgPageLoadTime,
                          y=bounceRate),
              colour="red",se=FALSE,method="lm") +
  geom_point(mapping=aes(x=avgPageLoadTime,
                         y=bounceRate),
             colour="blue",alpha=0.5) +
  facet_wrap(~deviceCategory,
             nrow=1,scales="fixed")