How to Switch to Linear Regression in a ggplot geom_smooth - R Programming
So, a while back, I published a post showing the relationship between Bounce Rate and Avg. Page Load Time using a ggplot geom_smooth + facet_wrap. The plot came out like this:
I recently had to reuse this type of data and wondered whether the plot could be improved. Sometimes, showing less is better.
In the viz above, the points at Load Time = 0 and Bounce Rate = 100 make it a bit harder to read.
The curve can be somewhat confusing.
The grey shaded area adds to the clutter.
Overall, there's a lot happening in the viz. Let's see if we can make it better.
First, let's remove the extreme values. That means dropping rows where page load time > 10, rows where load time = 0, and rows where Bounce Rate is 0 or 100. This is done with the subset() function, which takes the existing full dataset and keeps only the rows that meet all four conditions.
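Here is a minimal sketch of that filtering step using base R's subset(). The original dataset isn't shown, so the data frame and its column names (device_category, avg_load_time, bounce_rate) are hypothetical stand-ins for a GA export. Adjust them to match your own columns.

```r
# Hypothetical GA-style data; column names are assumptions, not the original export's
ga_data <- data.frame(
  device_category = c("desktop", "mobile", "tablet", "desktop", "mobile", "tablet"),
  avg_load_time   = c(0, 3.2, 12.5, 4.1, 2.7, 6.0),
  bounce_rate     = c(45, 100, 60, 0, 52.3, 38.9),
  stringsAsFactors = FALSE
)

# Keep only the rows that meet all four conditions
ga_clean <- subset(
  ga_data,
  avg_load_time <= 10 &   # drop extreme load times above 10 seconds
    avg_load_time != 0 &  # drop zero load times
    bounce_rate != 0 &    # drop 0% bounce rates
    bounce_rate != 100    # drop 100% bounce rates
)

print(ga_clean)
```

Only the mobile row (2.7 sec, 52.3%) and the tablet row (6.0 sec, 38.9%) survive. Every other row breaks at least one condition.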
Next, we create a ggplot with geom_point, showing Avg. Page Load Time on the X axis and Bounce Rate on the Y axis, split by Device Category via facet_wrap. On top of this, we add a geom_smooth to show the trend, draw the line in red, and fix the X axis at a maximum of 10 across all three device categories.
Most importantly, use method = "lm" to fit a linear model instead of LOESS, and se = FALSE (lowercase, since R arguments are case-sensitive) to hide the grey area. Solution via StackOverflow.
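Below is a sketch of the plot described above. It uses simulated data in place of the real GA export, with the same hypothetical column names. The geom_smooth() call carries the two changes that matter here: method = "lm" and se = FALSE. The formula = y ~ x argument is already the default and is spelled out only to silence ggplot2's informational message.

```r
library(ggplot2)

# Simulated stand-in for the cleaned GA data; column names are hypothetical
set.seed(1)
ga_clean <- data.frame(
  device_category = rep(c("desktop", "mobile", "tablet"), each = 40),
  avg_load_time   = runif(120, min = 0.5, max = 10)
)
ga_clean$bounce_rate <- 30 + 3 * ga_clean$avg_load_time + rnorm(120, sd = 8)

p <- ggplot(ga_clean, aes(x = avg_load_time, y = bounce_rate)) +
  geom_point() +
  # Straight least-squares line per panel, in red, with no grey confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "red") +
  # Same fixed 0-10 X axis for all three device panels
  scale_x_continuous(limits = c(0, 10)) +
  facet_wrap(~ device_category) +
  labs(x = "Avg. Page Load Time (sec)", y = "Bounce Rate (%)")

print(p)
```

Dropping method = "lm" brings back the default smoother, which is LOESS for groups under 1,000 points. Dropping se = FALSE brings back the grey band.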
The grey area (se = TRUE, the default) is a 95% confidence interval around the fitted line. It shows the uncertainty in the estimated trend, not the range within which individual values will fall. https://ggplot2.tidyverse.org/reference/geom_smooth.html [level = 0.95 by default; it can be raised, e.g. to 0.99]. Setting se = FALSE hides it.
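To see what level controls, here is a hedged sketch on simulated single-panel data. It builds the same linear smooth twice, once at the default level = 0.95 and once at level = 0.99. It then uses ggplot2's layer_data() to pull out the computed band (ymin, ymax) from each. The 99% band comes out wider at every point, because a higher confidence level needs a larger t multiplier on the same standard errors.

```r
library(ggplot2)

# Small simulated dataset (hypothetical); one panel is enough to compare levels
set.seed(2)
df <- data.frame(x = runif(50, min = 0.5, max = 10))
df$y <- 30 + 3 * df$x + rnorm(50, sd = 8)

base <- ggplot(df, aes(x, y)) + geom_point()

# se = TRUE is the default, so the band is drawn; `level` sets its confidence level
p95 <- base + geom_smooth(method = "lm", formula = y ~ x)                # level = 0.95
p99 <- base + geom_smooth(method = "lm", formula = y ~ x, level = 0.99)

# Band width at each of the 80 points where the smooth is evaluated
band95 <- with(layer_data(p95, 2), ymax - ymin)
band99 <- with(layer_data(p99, 2), ymax - ymin)
print(summary(band99 / band95))
```

With 50 points and a straight-line fit (48 residual degrees of freedom), the ratio is the same everywhere: qt(0.995, 48) / qt(0.975, 48).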
Once you combine the code above, you'll get output like the chart below for your Google Analytics (GA) data. It is quite different from the LOESS version, and a linear regression line can sometimes be the clearer choice. I'll explore the LOESS method and how it compares with linear regression in a separate post.
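It helps to know what the red line encodes. Within each facet, geom_smooth(method = "lm") fits the straight line bounce_rate ~ avg_load_time. Its slope therefore reads as the change in bounce rate per extra second of load time. The following minimal sketch fits the same model per device with base R's lm(). It uses a tiny made-up dataset with hypothetical column names, and the values are exactly linear so the slopes are easy to check.

```r
# Made-up data (hypothetical columns): bounce rate rises exactly 3 points per
# extra second on desktop and 5 points per extra second on mobile
ga_clean <- data.frame(
  device_category = rep(c("desktop", "mobile"), each = 5),
  avg_load_time   = rep(c(1, 2, 3, 4, 5), times = 2)
)
ga_clean$bounce_rate <- ifelse(ga_clean$device_category == "desktop",
                               40 + 3 * ga_clean$avg_load_time,
                               50 + 5 * ga_clean$avg_load_time)

# One straight-line fit per device: the same y ~ x model that
# geom_smooth(method = "lm") draws in each facet
fits <- lapply(split(ga_clean, ga_clean$device_category),
               function(d) lm(bounce_rate ~ avg_load_time, data = d))

# Rows = devices; columns = intercept and slope (bounce-rate points per second)
coefs <- t(sapply(fits, coef))
print(coefs)
```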
Full script below.