Exploratory Data Analysis Of Google Analytics Data In R Studio


I came across a nice e-book that covers working with Google Analytics data in RStudio. This blog post uses (and modifies) the code from: https://michalbrys.gitbooks.io/r-google-analytics/content/chapter4/exploratory_data_analysis.html

The aim of this code is to produce summary metrics for the underlying dataset before drilling down further.

In this R script, we're using 2018 year-to-date (YTD) data (1 Jan to 15 Sep) with dimension = date and metric = sessions.

If you plot the data, you can immediately see some variability in the daily sessions.
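To make that variability easier to read, here is a minimal sketch that plots a small, made-up sample shaped like the gadata data frame (one row per date, with date and sessions columns). The 7-day trailing moving average, computed with stats::filter(), is my own addition rather than part of the original script. It smooths out day-to-day noise so any underlying trend stands out.

```r
# Hypothetical sample data (made-up numbers, NOT the post's GA data),
# shaped like the google_analytics() output: one row per date
gadata <- data.frame(
  date = seq(as.Date("2018-01-01"), by = "day", length.out = 10),
  sessions = c(0, 3, 7, 11, 9, 0, 5, 24, 8, 6)
)

# Daily sessions as a line chart (type = "l")
plot(gadata$date, gadata$sessions, type = "l",
     xlab = "Date", ylab = "Sessions")

# 7-day trailing moving average: each value is the mean of that day and
# the 6 days before it; the first 6 values are NA (not enough history)
ma7 <- as.numeric(stats::filter(gadata$sessions, rep(1 / 7, 7), sides = 1))

# Overlay the smoothed series in red (NA values are simply not drawn)
lines(gadata$date, ma7, col = "red")
```

stats::filter() is written with its namespace because packages such as dplyr mask filter() with an unrelated function.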

You can then run min() and max() on the sessions column to find the range.

[Screenshots: calculating the minimum and maximum website sessions in R]
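A compact alternative, sketched here on a made-up sample rather than the post's data: range() returns the minimum and maximum together, and which.min()/which.max() return the row of the first minimum or maximum, so you can look up its date in one step.

```r
# Hypothetical sample data (made-up numbers, NOT the post's GA data)
gadata <- data.frame(
  date = seq(as.Date("2018-01-01"), by = "day", length.out = 10),
  sessions = c(0, 3, 7, 11, 9, 0, 5, 24, 8, 6)
)

# range() returns c(min, max) in a single call
session_range <- range(gadata$sessions)

# which.min()/which.max() give the row index of the FIRST min/max,
# which is then used to pick the matching date
min_date <- gadata$date[which.min(gadata$sessions)]
max_date <- gadata$date[which.max(gadata$sessions)]

print(session_range)
print(c(min_date, max_date))
```

If several days tie for the minimum or maximum, which.min() and which.max() only return the first one; the subset(gadata, gadata$sessions==max(gadata$sessions)) query in the full code returns all of them.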

Let's say we want to find the dates where sessions = 0. We pass the condition sessions == 0 to the subset() function.

#days with 0 sessions
subset(gadata, gadata$sessions==0)
[Screenshot: days with 0 sessions in the website data]
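Because subset() evaluates its condition inside the data frame, the gadata$ prefix is optional. The sketch below, again on hypothetical sample data, uses that shorter form and also shows a plain logical index that returns just the dates instead of whole rows.

```r
# Hypothetical sample data (made-up numbers, NOT the post's GA data)
gadata <- data.frame(
  date = seq(as.Date("2018-01-01"), by = "day", length.out = 10),
  sessions = c(0, 3, 7, 11, 9, 0, 5, 24, 8, 6)
)

# subset() looks up `sessions` inside gadata, so no gadata$ prefix is needed
zero_days <- subset(gadata, sessions == 0)

# If you only need the dates, index the date column with a logical vector
zero_dates <- gadata$date[gadata$sessions == 0]

print(zero_days)
print(zero_dates)
```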

You can then count these days by wrapping the subset in the nrow() function. This conditional row count only includes rows where sessions = 0.

#conditional row count only 
#where sessions were 0 for that particular date
nrow(subset(gadata, gadata$sessions==0))
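Under the hood, sessions == 0 is a logical vector, so summing it counts the TRUE values. This sketch, on made-up sample data, checks that both approaches give the same count; the sum() version skips building an intermediate data frame.

```r
# Hypothetical sample data (made-up numbers, NOT the post's GA data)
gadata <- data.frame(
  date = seq(as.Date("2018-01-01"), by = "day", length.out = 10),
  sessions = c(0, 3, 7, 11, 9, 0, 5, 24, 8, 6)
)

# Approach from the post: filter the rows, then count them
n_zero_subset <- nrow(subset(gadata, sessions == 0))

# Alternative: TRUE counts as 1 and FALSE as 0 when summed
n_zero_sum <- sum(gadata$sessions == 0)

print(c(n_zero_subset, n_zero_sum))
```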

To quickly get summary statistics for this data, run the summary() function on it.

Combining this with the earlier queries, you now know that the minimum of 0 sessions was on 1st Jan (surprise, surprise) and the maximum was on 13th Sep, which hints that traffic may be increasing. The mean (7.2) and median (7) are comparable, and 50% of the website's daily session counts fall between 3 and 11 sessions, i.e. between the 1st and 3rd quartiles (the interquartile range).
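To pull these statistics out individually (for example, to reuse them in a report), each can be computed on its own. This is a sketch on hypothetical sample data, so the values differ from the post's. quantile() and IQR() use R's default quantile method (type 7), which is also what summary() uses.

```r
# Hypothetical sample data (made-up numbers, NOT the post's GA data)
gadata <- data.frame(
  date = seq(as.Date("2018-01-01"), by = "day", length.out = 10),
  sessions = c(0, 3, 7, 11, 9, 0, 5, 24, 8, 6)
)

# Min, 1st Qu., Median, Mean, 3rd Qu., Max for the sessions column
print(summary(gadata$sessions))

# The same figures, computed one by one
mean_sessions   <- mean(gadata$sessions)
median_sessions <- median(gadata$sessions)
sd_sessions     <- sd(gadata$sessions)

# 1st and 3rd quartiles; the middle 50% of days fall between them
quartiles <- quantile(gadata$sessions, probs = c(0.25, 0.75))

# Interquartile range = 3rd quartile - 1st quartile
iqr <- IQR(gadata$sessions)

print(quartiles)
print(c(mean = mean_sessions, median = median_sessions,
        sd = sd_sessions, iqr = iqr))
```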

Full code below:

library("googleAuthR")
library("googleAnalyticsR")

#authenticate, then list the accounts and views you can access
ga_auth()
ga_account_list()

#replace with your own view ID
ga_id <- 1234567

gadata <- google_analytics(viewId = ga_id,
                           date_range = c("2018-01-01", "2018-09-15"),
                           metrics = c("sessions"),
                           dimensions = c("date"),
                           anti_sample = TRUE)

#code modified from
#https://github.com/michalbrys/R-Google-Analytics/blob/master/2_eda.R
gadata
#min and max sessions received 
min(gadata$sessions)
max(gadata$sessions)

#days with 0 sessions
subset(gadata, gadata$sessions==0)
#conditional row count only where sessions 
#were 0 for that particular date
nrow(subset(gadata, gadata$sessions==0))

  

#summary data
summary(gadata)

#when did max traffic hit the site?
#(24 is the max value returned by max() above)
subset(gadata, gadata$sessions==24)

#combining max number of sessions with
#the date it was received
subset(gadata, gadata$sessions==max(gadata$sessions))

mean(gadata$sessions)
median(gadata$sessions)
sd(gadata$sessions)

#the type parameter sets how values are drawn in the graph:
#"p" for points, "l" for lines
#source: https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/plot.html
plot(gadata$date, gadata$sessions, type="l")

#days where traffic was greater than the mean
subset(gadata, gadata$sessions > mean(gadata$sessions))
#number of such days
nrow(subset(gadata, gadata$sessions > mean(gadata$sessions)))
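The above-mean query can also be turned into a count and a share of all days. A minimal sketch on made-up sample data; the variable names are illustrative only.

```r
# Hypothetical sample data (made-up numbers, NOT the post's GA data)
gadata <- data.frame(
  date = seq(as.Date("2018-01-01"), by = "day", length.out = 10),
  sessions = c(0, 3, 7, 11, 9, 0, 5, 24, 8, 6)
)

# mean(sessions) is evaluated inside gadata, over the whole column
above_mean <- subset(gadata, sessions > mean(sessions))

# Count of above-mean days and their share of all days in the range
n_above     <- nrow(above_mean)
share_above <- n_above / nrow(gadata)

print(above_mean)
print(c(days = n_above, share = share_above))
```

With a skewed series (one big spike and several zero days), fewer than half the days can sit above the mean, which is one reason to compare it against the median as well.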