Analytics Log - Adil Khan


How To Switch To Linear Regression In a Ggplot Geom_Smooth - R Programming

A while back, I wrote a post showing the relationship between Bounce Rate and Avg. Page Load Time using a ggplot geom_smooth + facet_wrap. The plot came out like this:

I had to reuse this type of data and wondered whether the plot could be improved. Sometimes, showing less can be better.

  • In the above viz, the points at Load Time = 0 and Bounce Rate = 100 make it a bit harder to read.

  • The curve can be somewhat confusing.

  • The grey shaded area can add to the confusion.

Overall, there’s a lot happening with the viz. Let’s see if we can try to make it better.

First, let’s remove the extreme values where page load time > 10, the ones where load time = 0, and the ones where Bounce Rate was 0 or 100. This is done with the subset() function, which takes the existing full dataset and keeps only the rows that meet all four conditions.
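
As a rough illustration of that filtering step, here is a minimal sketch on synthetic data. The data frame name (ga_data) and the column names (deviceCategory, avgPageLoadTime, bounceRate) are assumptions made for this example, not the names from the original script, and the sketch treats a Bounce Rate of 100 as the upper extreme to drop, in line with the bullet list above.

```r
# Synthetic stand-in for a GA export; all names here are hypothetical.
set.seed(42)
ga_data <- data.frame(
  deviceCategory  = rep(c("desktop", "mobile", "tablet"), each = 50),
  avgPageLoadTime = round(runif(150, min = 0, max = 15), 2),
  bounceRate      = round(runif(150, min = 0, max = 100), 1)
)

# Force a few of the extreme rows that the post wants to drop.
ga_data$avgPageLoadTime[c(1, 60)] <- 0
ga_data$bounceRate[c(2, 70)]      <- 100
ga_data$bounceRate[c(3, 80)]      <- 0

# Keep only the rows that meet all four conditions.
ga_clean <- subset(
  ga_data,
  avgPageLoadTime <= 10 &
    avgPageLoadTime != 0 &
    bounceRate != 0 &
    bounceRate != 100
)

# Compare row counts before and after filtering.
print(c(before = nrow(ga_data), after = nrow(ga_clean)))
```

Inside subset(), column names can be used directly, so there is no need to write ga_data$ in front of each one.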

We are now creating a ggplot geom_point where I want to show the Avg. Page Load Time on the X axis and the Bounce Rate on the Y axis, split by Device Category via facet_wrap. On top of this, add a geom_smooth to show the trend. Show the line in red, and fix the X axis scale at a max of 10 across all three device categories.

Mainly, use method = "lm" to fit a linear model instead of LOESS, and se = FALSE to hide the grey area. Solution via StackOverflow.
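
A minimal sketch of such a plot, assuming the ggplot2 package is installed. The data here is synthetic, and the names ga_clean, deviceCategory, avgPageLoadTime and bounceRate are hypothetical stand-ins rather than the ones from the original script. Using scale_x_continuous() to cap the axis is also just one way of doing it.

```r
library(ggplot2)

# Synthetic, already-filtered data; all names are assumptions for this sketch.
set.seed(42)
ga_clean <- data.frame(
  deviceCategory  = rep(c("desktop", "mobile", "tablet"), each = 40),
  avgPageLoadTime = runif(120, min = 0.5, max = 10)
)
# Bounce rate loosely rising with load time, clamped to stay inside (0, 100).
ga_clean$bounceRate <- pmin(
  99,
  pmax(1, 30 + 4 * ga_clean$avgPageLoadTime + rnorm(120, sd = 10))
)

p <- ggplot(ga_clean, aes(x = avgPageLoadTime, y = bounceRate)) +
  geom_point() +
  # method = "lm" fits a straight line; se = FALSE hides the grey band.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "red") +
  # One panel per device category, sharing the same axes.
  facet_wrap(~ deviceCategory) +
  # Fix the X axis at 0 to 10 in every panel.
  scale_x_continuous(limits = c(0, 10)) +
  labs(x = "Avg. Page Load Time", y = "Bounce Rate")

print(p)
```

Note that the argument is lowercase se. R is case-sensitive, so SE = FALSE would not be recognised as the same setting.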

The grey area (se = TRUE, the default) is the 95% confidence interval around the fitted line. It shows the uncertainty in the estimated trend, not a zone that the individual values will fall within. See https://ggplot2.tidyverse.org/reference/geom_smooth.html (level = 0.95 by default; it can be increased, e.g. to 0.99). Switching to se = FALSE hides it.
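
To get a feel for what that band represents, here is a small base R sketch on synthetic data. It fits a linear model with lm() and asks predict() for confidence intervals at two levels. As far as I understand, this is the same kind of interval geom_smooth draws for method = "lm". The variable names are made up for the example.

```r
# Synthetic data; names are hypothetical.
set.seed(1)
load_time   <- runif(100, min = 0.5, max = 10)
bounce_rate <- 30 + 4 * load_time + rnorm(100, sd = 10)

fit  <- lm(bounce_rate ~ load_time)
grid <- data.frame(load_time = c(2, 5, 8))

# Confidence interval for the fitted line at each grid point.
ci95 <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
ci99 <- predict(fit, newdata = grid, interval = "confidence", level = 0.99)

# Width of the band at each level; the 99% band is the wider one.
width95 <- ci95[, "upr"] - ci95[, "lwr"]
width99 <- ci99[, "upr"] - ci99[, "lwr"]
print(cbind(grid, width95, width99))
```

In ggplot terms, the wider band would correspond to something like geom_smooth(method = "lm", level = 0.99).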

Once you combine the above code, you’ll get output like the one below for your GA data. It looks quite different from the LOESS method. Showing a linear regression line can sometimes be the better choice. I will explore the LOESS method in more detail, and compare it with linear regression, in a separate post.

The full script is in the original post.