Analytics Log - Adil Khan

R Programming: Combine Boxplot and Scatterplot Into Single Visualization

This is an easy combination in ggplot: layer a jitter plot over a boxplot to create a better visualization. The full script is at the end.

For demo purposes, the data is in an Excel file called channelData that contains the channel name, date, sessions, revenue, and revenue per session as columns.

Load the data via the readxl package and you’ll see the below. If you’d prefer to load the file via code, the import code is shown in the bottom right corner.
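If you go the code route, a minimal sketch of the load might look like this. The file name and extension are assumptions (the post only says the file is called channelData), so adjust the path to wherever your copy lives.

```r
library(readxl)

# Hypothetical path: the post names the file "channelData" but not its
# extension or folder.
path <- "channelData.xlsx"

if (file.exists(path)) {
  channelData <- read_excel(path)  # reads the first sheet by default
  str(channelData)                 # check the column names and types
} else {
  message("File not found: ", path)
}
```

Running str() right after the import is a quick way to confirm that the date column came through as a date and the revenue columns as numbers.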

You can then run the below dplyr command to check some summary stats by channel. This groups the data by channel and then provides the min, max, median and IQR. Here’s another blog post that looks at exploratory data analysis in R.
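As a rough sketch of that summary step, the snippet below groups by channel and computes the four stats. The column names (channel, revenuePerSession) and the data itself are made up for illustration, since the real file is not reproduced here.

```r
library(dplyr)

# Synthetic stand-in data, NOT the post's real numbers.
# Column names are assumptions.
set.seed(42)
channelData <- data.frame(
  channel = rep(c("Paid Search", "Display", "Social"), each = 60),
  revenuePerSession = c(runif(60, 0.10, 0.90),
                        runif(60, 0.05, 0.30),
                        runif(60, 0.05, 0.25))
)

# One row per channel with min, max, median and IQR
summaryStats <- channelData %>%
  group_by(channel) %>%
  summarise(
    min_rps    = min(revenuePerSession),
    max_rps    = max(revenuePerSession),
    median_rps = median(revenuePerSession),
    iqr_rps    = IQR(revenuePerSession)
  )

print(summaryStats)
```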

Once you know the summary stats, you can guess how the boxplot will appear, as the IQR (the range between the 1st and 3rd quartiles, which holds the middle 50% of the values) is quite different for Paid Search vs Display and Social.

Go ahead and create the boxplot here.
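A boxplot along these lines could be built as follows. Again, the data is synthetic and the column names are assumptions, so treat it as a sketch rather than the post's exact script.

```r
library(ggplot2)

# Synthetic stand-in data, NOT the post's real numbers.
set.seed(42)
channelData <- data.frame(
  channel = rep(c("Paid Search", "Display", "Social"), each = 60),
  revenuePerSession = c(runif(60, 0.10, 0.90),
                        runif(60, 0.05, 0.30),
                        runif(60, 0.05, 0.25))
)

# One box per channel, revenue per session on the y-axis
boxPlot <- ggplot(channelData, aes(x = channel, y = revenuePerSession)) +
  geom_boxplot()

print(boxPlot)
```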

In the visualization, you can see a few dots beyond the whiskers of the boxplot. Here’s a separate post that addresses how to find outliers in boxplots in R.

Let’s say you wanted to check what a scatter plot would look like with the same data.
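One way to sketch that scatter plot is to swap geom_boxplot for geom_point and keep the same mapping. The data and column names below are illustrative assumptions.

```r
library(ggplot2)

# Synthetic stand-in data, NOT the post's real numbers.
set.seed(42)
channelData <- data.frame(
  channel = rep(c("Paid Search", "Display", "Social"), each = 60),
  revenuePerSession = c(runif(60, 0.10, 0.90),
                        runif(60, 0.05, 0.30),
                        runif(60, 0.05, 0.25))
)

# Every observation drawn as a point at its channel's x position
scatterPlot <- ggplot(channelData, aes(x = channel, y = revenuePerSession)) +
  geom_point()

print(scatterPlot)
```

Because the x-axis is categorical, all points for a channel share one x position.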

This is obviously hard to understand, since the points for each channel sit on a single vertical line and overlap. To get around this issue, let’s use geom_jitter to add a bit of randomness to the data points on the chart.

The alpha in the code adds a bit of transparency to the dots and makes it easier to understand the density.
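Here is a minimal sketch of a jitter plot with transparency. The alpha and width values are assumptions picked for illustration, and the data is synthetic.

```r
library(ggplot2)

# Synthetic stand-in data, NOT the post's real numbers.
set.seed(42)
channelData <- data.frame(
  channel = rep(c("Paid Search", "Display", "Social"), each = 60),
  revenuePerSession = c(runif(60, 0.10, 0.90),
                        runif(60, 0.05, 0.30),
                        runif(60, 0.05, 0.25))
)

jitterPlot <- ggplot(channelData, aes(x = channel, y = revenuePerSession)) +
  geom_jitter(
    alpha  = 0.3,  # assumed value: lower alpha means more transparent dots
    width  = 0.2,  # assumed value: horizontal spread around each channel
    height = 0     # no vertical noise, so the y values stay exact
  )

print(jitterPlot)
```

Setting height = 0 is a deliberate choice here: the jitter only spreads points sideways, so the revenue per session you read off the y-axis is still the true value.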

Now that we have a boxplot and a jitter plot, all that’s left is to layer the jitter plot on top of the boxplot to explain the distribution of values.
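The layering could look something like the sketch below. In ggplot, layers are drawn in the order they are added, so the boxplot goes first and the jitter second. Hiding the boxplot's own outlier dots with outlier.shape = NA is my assumption, not something the post states; it avoids drawing those points twice. Data and parameter values are illustrative.

```r
library(ggplot2)

# Synthetic stand-in data, NOT the post's real numbers.
set.seed(42)
channelData <- data.frame(
  channel = rep(c("Paid Search", "Display", "Social"), each = 60),
  revenuePerSession = c(runif(60, 0.10, 0.90),
                        runif(60, 0.05, 0.30),
                        runif(60, 0.05, 0.25))
)

combinedPlot <- ggplot(channelData, aes(x = channel, y = revenuePerSession)) +
  geom_boxplot(outlier.shape = NA) +               # drawn first (bottom layer)
  geom_jitter(alpha = 0.3, width = 0.2, height = 0) # drawn on top

print(combinedPlot)
```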

Adding ylim = 0.75 caps the y-axis at 0.75, which helps remove some of the outliers.
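The exact call behind that limit is not shown here, so the sketch below covers the two usual ways to cap the y-axis in ggplot; the lower bound of 0 is an assumption. They behave differently: ylim() drops values outside the range before the boxplot stats are computed (with a warning), while coord_cartesian() only zooms the view and leaves the stats untouched. The data is synthetic.

```r
library(ggplot2)

# Synthetic stand-in data, NOT the post's real numbers.
set.seed(42)
channelData <- data.frame(
  channel = rep(c("Paid Search", "Display", "Social"), each = 60),
  revenuePerSession = c(runif(60, 0.10, 0.90),
                        runif(60, 0.05, 0.30),
                        runif(60, 0.05, 0.25))
)

basePlot <- ggplot(channelData, aes(x = channel, y = revenuePerSession)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.3, width = 0.2, height = 0)

# Option 1: values above 0.75 are removed, so the boxes are recomputed
clippedPlot <- basePlot + ylim(0, 0.75)

# Option 2: zoom only; the boxes still reflect all of the data
zoomedPlot <- basePlot + coord_cartesian(ylim = c(0, 0.75))

print(zoomedPlot)
```

If the goal is just a cleaner picture, the zoom is usually the safer choice, since the medians and quartiles on the chart still match the summary stats.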

Here’s a link to the full script.
