Sample SQL Queries For Google Analytics BigQuery Public Dataset
Got messing around with BigQuery and thought of doing this post around using GA data in BigQuery. The Google merchandise store data is available for access on BQ and some of these queries should you help you. It uses a bit of what I learned off Udemy and StackOverflow.
A lot of dimensions can be swapped here to suit your needs. I have used samples here to stay within my BQ limits. Let's start with examples right away:
How to query all the distinct browsers that were used during a certain period
Let's revise this query a bit.
How to query a distinct list of browser where browser = Chrome
Here, we use the LIKE operator
How to query all possible data against a particular visitId [Session ID]
How to query using wildcard operator along with LIKE
Here, we use geoNetwork.country LIKE 'United%' - all possible matches AFTER United are included in this match. This yields UAE, UK and USA.
How to query using wildcard operator before and after a word
In the below query, we'll use wildcard operator to find all distinct countries where the name contains Arab.
How to query all hits during second half of the day. This query uses UNNEST function to put all hits from an array to separate rows. This considers the fact that hits make up a session. Here's more details on the StackOverflow question I posted.
https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays
How to query data and SUM it within the query
How to query data and assign a mathematical operator to the field. In this example, we summed the totalTransactionRevenue and used 20% of it as Cost Of Goods Sold. In Big Query, the Revenue value was coming off in scientific notation format. Therefore, the CAST operator uses the nested value of SUM *0.20 and formats it as numeric format.
https://stackoverflow.com/questions/50904865/bigquery-convert-scientific-notation-to-decimal-format
In GA, there are four possible scopes: User, Session, Hit and Product.
Within a session, a user can have multiple transactions. In the above BigQuery, it does not need the UNNEST function as we had to in the hour of day query. The answer I got on StackOverflow wa that revenue is already an aggregated field made up of individual transaction and product revenues. Hence, you don't need to UNNEST it and move it from array to rows.
https://www.w3schools.com/sql/func_sqlserver_cast.asp
These are some of the examples that I could come up with it - thanks to Udemy and StackOverflow. Will create a separte post for more examples that I can find.