How To Create Google Analytics Segments [Conditional Data Pulls] In R Studio
Alright! Another day, another R related question.
In the previous examples that I looked at using GA + R, the conditional data pull was based on dates only. Links to the previous posts:
http://analyticslog.com/blog/how-to-find-outliers-in-boxplots-via-r-programming
http://analyticslog.com/blog/how-to-analyze-day-of-week-google-analytics-data-in-r-studio
So, what if we wanted to create a conditional data pull, i.e. apply segments while pulling the info in R Studio? How would we go about it?
I had just begun searching for this topic when the good people at the Digital Analytics Power Hour podcast provided the link on Twitter: http://www.dartistics.com/googleanalytics/simple-dynamic-segment.html
Creating dynamic Google Analytics segments in R Studio
This blog post will heavily use [steal] the code from Dartistics, but I will try to apply a different condition to my segment definition as a test, see what pops up and, mainly, go through the steps and see if I understand anything from the code.
Main bits from the code relating to the query [full code at bottom of post]
So, here's the bit that defines the segment_element:
my_segment_element <- segment_element("landingPagePath", operator = "REGEXP", type = "DIMENSION", expressions = "GTM|Facebook")
This uses the GA Reporting API v4 syntax to define the segment. In this case, we are looking at the landingPagePath [landing page] dimension with a RegEx match, i.e. landingPagePath matches GTM OR Facebook. Straightforward, and as close as it gets to what you'd do in the GA interface.
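To see what the alternation in expressions = "GTM|Facebook" does, here is a minimal base R sketch. The paths are made up for illustration, and grepl() only approximates GA's matching (case handling, for instance, may differ), so treat it as a rough check of the pattern rather than of the segment itself.

```r
# Hypothetical landing page paths (made up, not from the actual GA data)
paths <- c("/blog/GTM-sending-undefined-value-to-GA",
           "/blog/Facebook-ad-manager-vs-insights",
           "/blog/how-to-find-outliers-in-boxplots")

# TRUE where the path contains "GTM" or "Facebook"
matches <- grepl("GTM|Facebook", paths)

# Keep only the paths that the pattern matches
matched_paths <- paths[matches]
print(matched_paths)
```

Only the first two paths survive, since the third contains neither "GTM" nor "Facebook".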
There are 3 parts to fully defining the segment.
Executing ?segment_element brings up this in the Help tab.
segment_element is the lowest hierarchy of segment creation, for which you will also need:
segment_define : AND combination of segmentFilters
segment_vector_simple or segment_vector_sequence
segment_element that are combined in OR lists for segment_vectors_*
So, segment_element sits within segment_vector_simple, which sits within segment_define [which is used while defining the data to pull].
Next up, ?segment_vector_simple
Usage
segment_vector_simple(segment_elements)
Arguments
segment_elements
A list of OR lists of segment_element
ok...so it's just wrapping segment_element inside the vector. Reading the comments from Dartistics code now.
# Create a segment vector that has just one element. See ?segment_vector_simple() for details. Note
# that the element is wrapped in a list(). This is how you would include multiple elements in the
# definition.
...
So, segment_vector_simple() lets you add 1 or more segment elements in the definition....in this case, it's just one element...
Last up, ?segment_define
Defines the segment to be a set of SegmentFilters which are combined together with a logical AND operation. segment_define is in the hierarchy of segment creation that also includes segment_vector_simple() and segment_element()
Usage
segment_define(segment_filters, not_vector = NULL)
...
Honestly, I didn't get why the code has list in it...
my_segment_definition <- segment_define(list(my_segment_vector))
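One possible reading of the list() calls, going by the help text quoted above: segment_define takes a list of segment vectors that are ANDed together, and segment_vector_simple takes a list of OR lists, so a single item still has to be wrapped in a list. The base R sketch below mimics that nesting with plain predicate functions. Everything in it (is_gtm, is_facebook, is_blog, matches_vector, matches_definition) is a hypothetical stand-in made up for illustration, not googleAnalyticsR code.

```r
# Stand-ins for segment elements: plain predicate functions on a landing page path.
# These are hypothetical helpers, not googleAnalyticsR objects.
is_gtm      <- function(path) grepl("GTM", path)
is_facebook <- function(path) grepl("Facebook", path)
is_blog     <- function(path) grepl("^/blog/", path)

# A "vector": a list of OR lists. Elements inside one inner list are ORed,
# and the inner lists are ANDed with each other.
vector_simple <- list(list(is_gtm, is_facebook),  # GTM OR Facebook
                      list(is_blog))              # AND path starts with /blog/

# A "definition": a list of vectors, all of which must match (AND).
definition <- list(vector_simple)

# TRUE if every OR list in the vector has at least one matching element
matches_vector <- function(vec, path) {
  all(vapply(vec, function(or_list) {
    any(vapply(or_list, function(el) el(path), logical(1)))
  }, logical(1)))
}

# TRUE if every vector in the definition matches
matches_definition <- function(def, path) {
  all(vapply(def, matches_vector, logical(1), path = path))
}

in_segment  <- matches_definition(definition, "/blog/GTM-undefined-value")
out_segment <- matches_definition(definition, "/about/Facebook")
print(c(in_segment, out_segment))
```

With just one element and one vector, as in this post, the nesting collapses to list(list(element)) and list(vector), which would explain the shape of the Dartistics code.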
Moving on...
my_segment <- segment_ga4("Landing Page matches RegEx FB|GTM", session_segment = my_segment_definition)
Ok, so this bit of the code is creating a variable called my_segment where the name of the segment in this case is "Landing Page matches RegEx FB|GTM" and the details of the segment are in the variable my_segment_definition.
# <whew>!!!
[line 50 of code....yes!!]
ga_data <- google_analytics(viewId = 12345678, date_range = c(start_date, end_date), metrics = "pageviews", dimensions = "landingPagePath", segments = my_segment)
This pulls the dimensions and metrics only for sessions where the segment conditions are met.
head(ga_data) looks [almost] fine
Now, let's try plotting the data.
Tried running this code but got an error:
barchart <- ggplot(ga_data, aes(x=landingPagePath, y=pageviews)) + geom_bar()
barchart
Error: stat_count() must not be used with a y aesthetic.
What does that mean?
?geom_bar()
There are two types of bar charts: geom_bar makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights). If you want the heights of the bars to represent values in the data, use geom_col instead. geom_bar uses stat_count by default
So, geom_bar uses count...which is what we don't want.
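To make the difference concrete, here is a small base R sketch of "count the rows" versus "use the values". The data frame is a made-up stand-in for ga_data, and table() is only an analogy for what stat_count tallies, not ggplot2's actual code.

```r
# Hypothetical stand-in for ga_data: one row per landing page
ga_data <- data.frame(
  landingPagePath = c("/blog/gtm-post", "/blog/facebook-post"),
  pageviews = c(120, 45)
)

# Roughly what a count-based bar chart works with:
# the number of rows per landing page (1 each here)
row_counts <- table(ga_data$landingPagePath)

# What we actually want as bar heights: the pageviews values themselves
bar_heights <- setNames(ga_data$pageviews, ga_data$landingPagePath)

print(row_counts)
print(bar_heights)
```

Since each landing page appears in exactly one row, counting rows would give every bar a height of 1, which says nothing about pageviews.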
Found a solution on Stackoverflow tackling this.
barchart <- ggplot(ga_data, aes(x=landingPagePath, y=pageviews)) + geom_bar(stat="identity")
barchart
This one executes!
Went back and re-read the geom_bar() help and found out about geom_col(). From the geom_bar help: If you want the heights of the bars to represent values in the data, use geom_col instead.
barchart <- ggplot(ga_data, aes(x=landingPagePath, y=pageviews)) + geom_col()
barchart
The chart renders, but the x axis labels overlap completely. This led to another search query: how to tilt x axis labels in ggplot2. Again, StackOverflow:
https://stackoverflow.com/questions/1330989/rotating-and-spacing-axis-labels-in-ggplot2
barchart + theme(axis.text.x = element_text(angle = 25, hjust = 1))
X axis labels rotated by 25 degrees.
Ok, so we can see that two posts stand out in the last 30 days: "GTM sending undefined value to GA" and "Difference in eng. rate between Ad Manager/FBInsights".
Notice that the first x axis label is cropped. I will search around for a solution and cover it in a separate blog post.
Full code below with the main chunk from Dartistics + my changes to segment_element and playing around with geom_bar / geom_col
# Load the necessary libraries. These libraries aren't all necessarily required for every
# example, but, for simplicity's sake, we're going ahead and including them in every example.
# The "typical" way to load these is simply with "library([package name])." But, the handy
# thing about using the approach below -- which uses the pacman package -- is that it will
# check that each package exists and actually install any that are missing before loading
# the package.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(googleAnalyticsR,  # How we actually get the Google Analytics data
               tidyverse,         # Includes dplyr, ggplot2, and others; very key!
               devtools,          # Generally handy
               googleVis,         # Useful for some of the visualizations
               scales)            # Useful for some number formatting in the visualizations

# Authorize GA. Depending on if you've done this already and a .ga-httr-oauth file has
# been saved or not, this may pop you over to a browser to authenticate.
ga_auth(token = ".ga-httr-oauth")

# Set the view ID and the date range. The view ID is hardcoded here (12345678 is a
# placeholder, so swap in your own). Sys.getenv() expects the name of an environment
# variable as a string, so passing the number to it would throw an error. The start
# and end date are currently set to choose the last 30 days, but those can be
# hardcoded as well.
view_id <- 12345678
start_date <- Sys.Date() - 30  # 30 days ago
end_date <- Sys.Date() - 1     # Yesterday

# Create a segment element object. See ?segment_element() for details.
?segment_element
my_segment_element <- segment_element("landingPagePath",
                                      operator = "REGEXP",
                                      type = "DIMENSION",
                                      expressions = "GTM|Facebook")

# Create a segment vector that has just one element. See ?segment_vector_simple() for details. Note
# that the element is wrapped in a list(). This is how you would include multiple elements in the
# definition.
?segment_vector_simple
my_segment_vector <- segment_vector_simple(list(list(my_segment_element)))

# Define the segment with just the one segment vector in it. See ?segment_define() for details.
?segment_define
my_segment_definition <- segment_define(list(my_segment_vector))

# Create the actual segment object that we're going to use in the query. See ?segment_ga4()
# for details.
?segment_ga4
my_segment <- segment_ga4("Landing Page matches RegEx FB|GTM",
                          session_segment = my_segment_definition)

# <whew>!!!

# Pull the data. See ?google_analytics() for additional parameters. Depending on what
# you're expecting back, you probably would want to use an "order" argument to get the
# results in descending order. But, we're keeping this example simple.
ga_data <- google_analytics(viewId = view_id,
                            date_range = c(start_date, end_date),
                            metrics = "pageviews",
                            dimensions = "landingPagePath",
                            segments = my_segment)

# Go ahead and do a quick inspection of the data that was returned. This isn't required,
# but it's a good check along the way.
head(ga_data)

# First attempt at barchart, but printing it gives the error
# "stat_count() must not be used with a y aesthetic"
barchart <- ggplot(ga_data, aes(x = landingPagePath, y = pageviews)) + geom_bar()
barchart
?geom_bar()

# Second attempt at barchart, but x axis labels overlap completely
barchart <- ggplot(ga_data, aes(x = landingPagePath, y = pageviews)) + geom_bar(stat = "identity")
barchart

# Third attempt at barchart, rotating the x axis labels
barchart + theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Another way of bypassing the stat_count() issue
?geom_col()
barchart <- ggplot(ga_data, aes(x = landingPagePath, y = pageviews)) + geom_col()
barchart
barchart + theme(axis.text.x = element_text(angle = 25, hjust = 1))