How To Create Google Analytics Segments [Conditional Data Pulls] In R Studio

Alright! Another day, another R-related question.

In the previous examples that I looked at using GA + R, the conditional data pull was based on dates only. Links to the previous posts:

http://analyticslog.com/blog/how-to-find-outliers-in-boxplots-via-r-programming

http://analyticslog.com/blog/how-to-analyze-day-of-week-google-analytics-data-in-r-studio

So, what if we wanted to create a conditional data pull, i.e. apply segments while pulling the data in R Studio? How would we go about it?

Had just begun searching for this topic when the good people at Digital Analytics Power Hour podcast provided the link on Twitter: http://www.dartistics.com/googleanalytics/simple-dynamic-segment.html 

Creating dynamic Google Analytics segments in R Studio

This blog post will heavily use [steal] the code from Dartistics, but I will try to apply a different condition to my segment definition as a test, see what pops up and, mainly, go through the steps and see if I understand anything from the code.

Main bits from the code relating to the query [Full code at bottom of post]

So, here's the bit that defines the segment_element

my_segment_element <- segment_element("landingPagePath", 
                                      operator = "REGEXP",
                                      type = "DIMENSION",
                                      expressions = "GTM|Facebook")


 

This uses the Google Analytics Reporting API v4 syntax to define the segment. In this case, we are looking at the landingPagePath [landing page] dimension, using a RegEx match where the landingPagePath matches GTM OR Facebook. Straightforward, and as close as it gets to what you'd do in the GA interface.
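To get a feel for what the "GTM|Facebook" expression would catch, here is a rough local sketch in base R. The landing page paths are made up for illustration, and grepl() is only a stand-in: GA applies the RegEx on its own servers, and its case handling may differ from this (segment_element has a caseSensitive argument for that).

```r
# Hypothetical landing page paths, made up for illustration.
landing_pages <- c(
  "/blog/GTM-sending-undefined-value-to-GA",
  "/blog/Facebook-engagement-rate",
  "/blog/how-to-find-outliers-in-boxplots",
  "/about"
)

# "GTM|Facebook" means: contains "GTM" OR contains "Facebook".
# grepl() does a partial match, so the pattern can sit anywhere in the path.
keep <- grepl("GTM|Facebook", landing_pages)

# The paths that would fall inside the segment under these assumptions.
landing_pages[keep]
```

Only the paths containing one of the two words survive the filter, which is the same idea as the segment above.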

There are 3 parts to fully defining the segment.

Executing ?segment_element brings up this in the Help tab.

segment_element is the lowest hierarchy of segment creation, for which you will also need:

segment_define : AND combination of segmentFilters

segment_vector_simple or segment_vector_sequence

segment_element that are combined in OR lists for segment_vectors_*

So, segment_element sits inside segment_vector_simple, which sits inside segment_define [which in turn goes into the segment used while pulling the data]

Next up, ?segment_vector_simple

Usage

segment_vector_simple(segment_elements)
Arguments

segment_elements    
A list of OR lists of segment_element
 

ok...so it's just wrapping segment_element inside the vector. Reading the comments from Dartistics code now.

# Create a segment vector that has just one element. See ?segment_vector_simple() for details. Note
# that the element is wrapped in a list(). This is how you would include multiple elements in the
# definition.

...

So, segment_vector_simple() lets you add 1 or more segment elements in the definition....in this case, it's just one element...

Last one, ?segment_define

Defines the segment to be a set of SegmentFilters which are combined together with a logical AND operation. segment_define is in the hierarchy of segment creation that also includes segment_vector_simple() and segment_element()

Usage

segment_define(segment_filters, not_vector = NULL)

...

Honestly, I didn't get why the code has list in it...

my_segment_definition <- segment_define(list(my_segment_vector))
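A possible explanation, going by the help text quoted above: segment_define takes an AND combination of segment filters, and segment_vector_simple takes a list of OR lists. So list() is simply how you would pass more than one item, even when there is only one. The sketch below mimics that "outer list = AND, inner list = OR" idea in plain base R. It is not how googleAnalyticsR works internally; the helper matches_filters(), the predicate functions and the paths are all hypothetical.

```r
# Hypothetical landing pages (not real GA data).
pages <- c("/blog/GTM-undefined-value", "/blog/Facebook-insights", "/about")

# Simple predicates standing in for segment elements.
is_gtm      <- function(x) grepl("GTM", x)
is_facebook <- function(x) grepl("Facebook", x)
is_blog     <- function(x) grepl("^/blog/", x)

# Outer list = AND, inner lists = OR:
# (GTM OR Facebook) AND (blog)
filters <- list(list(is_gtm, is_facebook), list(is_blog))

# Hypothetical helper: OR the predicates within each inner list,
# then AND the results across the outer list.
matches_filters <- function(x, filters) {
  and_parts <- lapply(filters, function(or_list) {
    Reduce(`|`, lapply(or_list, function(f) f(x)))
  })
  Reduce(`&`, and_parts)
}

in_segment <- matches_filters(pages, filters)
pages[in_segment]
```

With just one element, the nesting looks redundant, which is probably why the list() calls seem odd in this example.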
 

Moving on...

my_segment <- segment_ga4("Landing Page matches RegEx FB|GTM",
                          session_segment = my_segment_definition)

 

Ok, so this bit of the code creates a variable called my_segment, where the name of the segment is "Landing Page matches RegEx FB|GTM" and the details of the segment are in the variable my_segment_definition.

# <whew>!!!
[line 50 of code....yes!!]

ga_data <- google_analytics(viewId = 12345678,
                            date_range = c(start_date, end_date),
                            metrics = "pageviews",
                            dimensions = "landingPagePath",
                            segments = my_segment)


This pulls the dimensions and metrics only for sessions where the segment conditions are met.

head(ga_data) looks [almost] fine

[Screenshot: head(ga_data) in R Studio]

 

Now, let's try plotting the data.

Tried running this code but got an error

barchart <- ggplot(ga_data, aes(x=landingPagePath, y=pageviews)) + geom_bar()
barchart                    
Error: stat_count() must not be used with a y aesthetic.


What does that mean?

?geom_bar()

There are two types of bar charts: geom_bar makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights). If you want the heights of the bars to represent values in the data, use geom_col instead. geom_bar uses stat_count by default

So, geom_bar uses count...which is what we don't want.
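To see the count-versus-value difference without touching GA, here is a small sketch on made-up data. The data frame mock_data and its values are hypothetical; layer_data() is the ggplot2 helper that returns the data a layer would actually draw.

```r
library(ggplot2)

# Hypothetical data, one row per landing page (made up, not real GA output).
mock_data <- data.frame(
  landingPagePath = c("/a-gtm-post", "/b-facebook-post", "/c-other-post"),
  pageviews = c(120, 80, 15)
)

# geom_bar() with only an x aesthetic counts rows per landing page.
# Each page appears once here, so every bar gets the same count.
p_count <- ggplot(mock_data, aes(x = landingPagePath)) + geom_bar()
bar_counts <- layer_data(p_count)$count

# geom_col() uses the y values themselves as the bar heights.
p_values <- ggplot(mock_data, aes(x = landingPagePath, y = pageviews)) + geom_col()
col_heights <- layer_data(p_values)$y

bar_counts
col_heights
```

Since the GA pull already returns one row per landing page with the pageviews summed up, counting rows tells us nothing; we want the values plotted as they are.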

Found a solution on Stackoverflow tackling this.

https://stackoverflow.com/questions/39679057/r-ggplot2-stat-count-must-not-be-used-with-a-y-aesthetic-error-in-bar-graph

barchart <- ggplot(ga_data, aes(x=landingPagePath, 
                                y=pageviews)) + geom_bar(stat="identity")
barchart 

                   
 

This one executes!

Went back and re-read the geom_bar() help and found out about geom_col(). From the geom_bar help: If you want the heights of the bars to represent values in the data, use geom_col instead.

barchart <- ggplot(ga_data, 
                   aes(x=landingPagePath,
                       y=pageviews)) +
                            geom_col()
barchart

Notice the clean x axis labels!

This led to another search query. How to tilt x axis labels in ggplot2...Again, StackOverflow

https://stackoverflow.com/questions/1330989/rotating-and-spacing-axis-labels-in-ggplot2

barchart + theme(axis.text.x = element_text(angle = 25, hjust = 1))

[Plot: x axis labels rotated by 25 degrees]

Ok, so we can see that two posts stand out in the last 30 days: "GTM sending undefined value to GA" and "Difference in eng. rate between Ad Manager/FBInsights".

Notice that the x axis label is cropped for the first label. Will try and search around for a solution for this and post in a separate blog post.                 

 

Full code below with the main chunk from Dartistics + my changes to segment_element and playing around with geom_bar / geom_col

# Load the necessary libraries. These libraries aren't all necessarily required for every
# example, but, for simplicity's sake, we're going ahead and including them in every example.
# The "typical" way to load these is simply with "library([package name])." But, the handy
# thing about using the approach below -- which uses the pacman package -- is that it will
# check that each package exists and actually install any that are missing before loading
# the package.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(googleAnalyticsR,  # How we actually get the Google Analytics data
               tidyverse,         # Includes dplyr, ggplot2, and others; very key!
               devtools,          # Generally handy
               googleVis,         # Useful for some of the visualizations
               scales)            # Useful for some number formatting in the visualizations

# Authorize GA. Depending on if you've done this already and a .ga-httr-oauth file has
# been saved or not, this may pop you over to a browser to authenticate.
ga_auth(token = ".ga-httr-oauth")

# Set the view ID and the date range. The view ID is hardcoded here as a placeholder;
# swap in your own, or read it from an environment variable of your choosing,
# e.g. Sys.getenv("GA_VIEW_ID"). Note that Sys.getenv() needs a character name, not a number.
# The start and end date are currently set to cover the last 30 days, but those can be
# hardcoded as well.
view_id <- 12345678
start_date <- Sys.Date() - 30        # 30 days back from today
end_date <- Sys.Date() - 1           # Yesterday

# Create a segment element object. See ?segment_element() for details.
?segment_element
my_segment_element <- segment_element("landingPagePath", 
                                      operator = "REGEXP",
                                      type = "DIMENSION",
                                      expressions = "GTM|Facebook")

# Create a segment vector that has just one element. See ?segment_vector_simple() for details. Note
# that the element is wrapped in a list(). This is how you would include multiple elements in the
# definition.
?segment_vector_simple
my_segment_vector <- segment_vector_simple(list(list(my_segment_element)))

# Define the segment with just the one segment vector in it. See ?segment_define() for details.
?segment_define
my_segment_definition <- segment_define(list(my_segment_vector))

# Create the actual segment object that we're going to use in the query. See ?segment_ga4()
# for details.
?segment_ga4
my_segment <- segment_ga4("Landing Page matches RegEx FB|GTM",
                          session_segment = my_segment_definition)

# <whew>!!!

# Pull the data. See ?google_analytics() for additional parameters. Depending on what
# you're expecting back, you probably would want to use an "order" argument to get the
# results in descending order. But, we're keeping this example simple. Note that
# my_segment is passed to the segments argument as-is.
ga_data <- google_analytics(viewId = 41377551,
                            date_range = c(start_date, end_date),
                            metrics = "pageviews",
                            dimensions = "landingPagePath",
                            segments = my_segment)

# Go ahead and do a quick inspection of the data that was returned. This isn't required,
# but it's a good check along the way. 
head(ga_data)

# First attempt at barchart. Printing it fails with the error
# "stat_count() must not be used with a y aesthetic", so it is commented out here.
# barchart <- ggplot(ga_data, aes(x=landingPagePath, y=pageviews)) + geom_bar()
# barchart
?geom_bar()
#Second attempt at barchart but x axis labels overlap completely                   
barchart <- ggplot(ga_data, aes(x=landingPagePath, y=pageviews)) + geom_bar(stat="identity")
barchart                    
#Third attempt at barchart, rotating the x axis label
barchart + theme(axis.text.x = element_text(angle = 45, hjust = 1))

#Another way of bypassing the stat_count() issue
?geom_bar()
?geom_col()
barchart <- ggplot(ga_data, aes(x=landingPagePath, y=pageviews)) + geom_col()
barchart
barchart + theme(axis.text.x = element_text(angle = 25, hjust = 1))                    

               
