Analytics Log - Adil Khan

How To Create Google Analytics Segments [Conditional Data Pulls] In R Studio

Alright! Another day, another R-related question.

In the previous examples that I looked at using GA + R, the conditional data pull was based on dates only. Links to the previous posts:

http://analyticslog.com/blog/how-to-find-outliers-in-boxplots-via-r-programming

http://analyticslog.com/blog/how-to-analyze-day-of-week-google-analytics-data-in-r-studio

So, what if we wanted to create a conditional data pull, i.e. apply segments while pulling the data into R Studio? How would we go about it?

I had just begun searching for this topic when the good people at the Digital Analytics Power Hour podcast provided the link on Twitter: http://www.dartistics.com/googleanalytics/simple-dynamic-segment.html

Creating dynamic Google Analytics segments in R Studio

This blog post will heavily use [steal] the code from Dartistics, but I will try to apply a different condition to my segment definition as a test, see what pops up and, mainly, go through the steps to see if I understand the code.

Main bits from the code relating to the query [Full code at bottom of post]

So, here's the bit that defines the segment_element

See this content in the original post


 

This uses the GA Reporting API v4 syntax to define the segment. In this case, we are looking at the landingPagePath [landing page] dimension, using a RegEx match where the landingPagePath matches GTM OR Facebook. Straightforward, and close to what you'd do in the GA interface.
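As a rough sketch of what such an element definition might look like with googleAnalyticsR (the regex below is a hypothetical stand-in, not the exact pattern from the original code):

```r
library(googleAnalyticsR)

# Assumption: "gtm|facebook" is an illustrative pattern only; the real
# pattern used in the post is not reproduced here.
my_segment_element <- segment_element(
  "landingPagePath",       # the dimension being tested
  operator = "REGEXP",     # match with a regular expression
  type = "DIMENSION",      # this is a dimension, not a metric
  expressions = "gtm|facebook"
)
```

Building the element does not call the API, so this part can be tried without authenticating.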

There are 3 parts to fully defining the segment.

Executing ?segment_element brings up this in the Help tab.

segment_element is the lowest hierarchy of segment creation, for which you will also need:

segment_define : AND combination of segmentFilters

segment_vector_simple or segment_vector_sequence

segment_element that are combined in OR lists for segment_vectors_*

So, segment_element sits within segment_vector_simple, which sits within segment_define [which is used while defining the data to pull]

Next up, ?segment_vector_simple

Usage

segment_vector_simple(segment_elements)
Arguments

segment_elements    
A list of OR lists of segment_element
 

ok...so it's just wrapping segment_element inside the vector. Reading the comments from Dartistics code now.

# Create a segment vector that has just one element. See ?segment_vector_simple() for details. Note
# that the element is wrapped in a list(). This is how you would include multiple elements in the
# definition.

...

So, segment_vector_simple() lets you add 1 or more segment elements in the definition....in this case, it's just one element...
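To make the "1 or more" part concrete, here is a minimal sketch, assuming two hypothetical elements (the patterns are made up for illustration). Going by the help text ("A list of OR lists of segment_element"), elements placed in the same inner list should be combined with OR:

```r
library(googleAnalyticsR)

# Two hypothetical elements; the patterns are illustrative only.
gtm_pages <- segment_element("landingPagePath", operator = "REGEXP",
                             type = "DIMENSION", expressions = "gtm")
fb_pages  <- segment_element("landingPagePath", operator = "REGEXP",
                             type = "DIMENSION", expressions = "facebook")

# One element: still wrapped as a list containing one OR list.
one_element_vector <- segment_vector_simple(list(list(gtm_pages)))

# Two elements in the same inner list: GTM pages OR Facebook pages.
either_vector <- segment_vector_simple(list(list(gtm_pages, fb_pages)))
```

With a single regex such as "gtm|facebook" the OR lives inside the pattern, so one element is enough; the second form is only needed when the conditions are on different dimensions or operators.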

Then, ?segment_define

Defines the segment to be a set of SegmentFilters which are combined together with a logical AND operation. segment_define is in the hierarchy of segment creation that also includes segment_vector_simple() and segment_element()

Usage

segment_define(segment_filters, not_vector = NULL)

...

Honestly, I didn't get why the code has list in it...

my_segment_definition <- segment_define(list(my_segment_vector))
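One possible reading of the help text: segment_define() expects a collection of segment filters that it combines with AND, so even a single segment vector has to be handed over inside a list(). The sketch below assumes that reading; the second vector (organic medium) is a hypothetical addition just to show what two filters would look like.

```r
library(googleAnalyticsR)

# Hypothetical elements for illustration.
landing_element <- segment_element("landingPagePath", operator = "REGEXP",
                                   type = "DIMENSION", expressions = "gtm|facebook")
organic_element <- segment_element("medium", operator = "EXACT",
                                   type = "DIMENSION", expressions = "organic")

landing_vector <- segment_vector_simple(list(list(landing_element)))
organic_vector <- segment_vector_simple(list(list(organic_element)))

# One filter: the list() has a single entry, as in the code above.
one_filter_definition <- segment_define(list(landing_vector))

# Two filters: both entries must hold (logical AND).
two_filter_definition <- segment_define(list(landing_vector, organic_vector))
```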
 

Moving on...

See this content in the original post

 

Ok, so this bit of the code creates a variable called my_segment, where the name of the segment is "Landing page matches Regex GTM|FB" and the details of the segment are in the variable my_segment_definition.

# <whew>!!!
[line 50 of code....yes!!]

See this content in the original post


This pulls the dimensions and metrics only where the segment conditions are met.
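Putting the pieces together, the whole chain from element to data pull might look roughly like the sketch below. Everything specific in it is an assumption: the regex, the "sessions" metric, the 30-day window, the helper names build_landing_segment() and pull_segment_data(), and the view ID in the usage comment. It uses google_analytics(); older versions of googleAnalyticsR exposed the same v4 pull as google_analytics_4().

```r
library(googleAnalyticsR)

# Hypothetical helper: builds a session segment from a landing page regex.
build_landing_segment <- function(pattern) {
  element    <- segment_element("landingPagePath", operator = "REGEXP",
                                type = "DIMENSION", expressions = pattern)
  vector     <- segment_vector_simple(list(list(element)))
  definition <- segment_define(list(vector))
  segment_ga4("Landing page matches Regex GTM|FB",
              session_segment = definition)
}

# Hypothetical helper: pulls sessions by landing page for the last 30 days,
# restricted to the given segment. Needs an authenticated session.
pull_segment_data <- function(view_id, segment) {
  google_analytics(
    viewId     = view_id,
    date_range = c(Sys.Date() - 30, Sys.Date() - 1),
    metrics    = "sessions",
    dimensions = "landingPagePath",
    segments   = segment
  )
}

# Usage (requires ga_auth() and a real view ID; "123456789" is a placeholder):
# ga_auth()
# ga_data <- pull_segment_data("123456789", build_landing_segment("gtm|facebook"))
# head(ga_data)
```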

head(ga_data) looks [almost] fine

 

Now, let's try plotting the data.

Tried running this code but got an error

See this content in the original post


What does that mean?

?geom_bar()

There are two types of bar charts: geom_bar makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights). If you want the heights of the bars to represent values in the data, use geom_col instead. geom_bar uses stat_count by default

So, geom_bar uses count...which is what we don't want.

Found a solution on Stackoverflow tackling this.

https://stackoverflow.com/questions/39679057/r-ggplot2-stat-count-must-not-be-used-with-a-y-aesthetic-error-in-bar-graph

See this content in the original post

                   
 

This one executes!
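For anyone who wants to reproduce this without GA access, here is a minimal sketch using a made-up data frame in place of ga_data (the column names and numbers are assumptions). The first plot keeps the default counting stat and should fail when built; the second follows the Stack Overflow suggestion of stat = "identity". The exact wording of the error depends on the ggplot2 version.

```r
library(ggplot2)

# Hypothetical stand-in for ga_data.
ga_data <- data.frame(
  landingPagePath = c("/blog/post-a", "/blog/post-b", "/blog/post-c"),
  sessions        = c(120, 45, 80)
)

# Default geom_bar() counts rows per x value, so also mapping y is rejected
# when the plot is built (not when it is defined).
bad_plot <- ggplot(ga_data, aes(x = landingPagePath, y = sessions)) +
  geom_bar()
build_failed <- inherits(try(ggplot_build(bad_plot), silent = TRUE), "try-error")

# stat = "identity" uses the y values themselves as the bar heights.
good_plot <- ggplot(ga_data, aes(x = landingPagePath, y = sessions)) +
  geom_bar(stat = "identity")
```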

Went back and re-read the geom_bar() help and found out about geom_col(). From the geom_bar help: If you want the heights of the bars to represent values in the data, use geom_col instead.

See this content in the original post
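A small sketch of the geom_col() version, again on a made-up data frame (column names and values are assumptions). It also builds the stat = "identity" variant so the two can be compared:

```r
library(ggplot2)

# Hypothetical stand-in for ga_data.
ga_data <- data.frame(
  landingPagePath = c("/blog/post-a", "/blog/post-b", "/blog/post-c"),
  sessions        = c(120, 45, 80)
)

# geom_col() takes the bar heights straight from the y aesthetic.
col_plot <- ggplot(ga_data, aes(x = landingPagePath, y = sessions)) +
  geom_col()

# The same chart written with geom_bar(stat = "identity").
identity_plot <- ggplot(ga_data, aes(x = landingPagePath, y = sessions)) +
  geom_bar(stat = "identity")
```

Both forms should draw the same bars; geom_col() just says it more directly.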

Notice the clean x axis labels!

This led to another search query. How to tilt x axis labels in ggplot2...Again, StackOverflow

https://stackoverflow.com/questions/1330989/rotating-and-spacing-axis-labels-in-ggplot2

See this content in the original post

X axis labels rotated by 25 degrees.
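The rotation itself comes down to one theme() setting. A minimal sketch on made-up data (the data frame is an assumption; the 25 degree angle matches the plot above, and hjust = 1 is the alignment suggested in the Stack Overflow answer):

```r
library(ggplot2)

# Hypothetical stand-in for ga_data.
ga_data <- data.frame(
  landingPagePath = c("/blog/post-a", "/blog/post-b", "/blog/post-c"),
  sessions        = c(120, 45, 80)
)

# axis.text.x controls the tick labels on the x axis (not the axis title).
rotated_plot <- ggplot(ga_data, aes(x = landingPagePath, y = sessions)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 25, hjust = 1))
```

Here hjust = 1 right-aligns each label so its end sits at the tick mark instead of its middle.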

Ok, so we can see that two posts stand out in the last 30 days: "GTM sending undefined value to GA" and "Difference in eng. rate between Ad Manager/FBInsights".

Notice that the x axis label is cropped for the first label. Will try and search around for a solution for this and post in a separate blog post.                 

 

Full code below with the main chunk from Dartistics + my changes to segment_element and playing around with geom_bar / geom_col

See this content in the original post