R Programming: Combine Boxplot and Scatterplot Into Single Visualization
This is an easy combination in ggplot: layer a jitter plot over a boxplot to create a better visualization. The full script is at the end.
For this demo, the data is in an Excel file called channelData that contains the channel name, date, sessions, revenue and revenue per session as columns.
Load the data via readxl and you’ll see the below. If you’d prefer to load the file via code, it’s in the bottom right corner.
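If you'd rather load the file from a script, here's a minimal sketch. The file name channelData.xlsx and its location in the working directory are assumptions, since the post only gives the name channelData.

```r
library(readxl)

# Assumed file name and location; adjust to wherever your copy lives.
path <- "channelData.xlsx"

if (file.exists(path)) {
  channelData <- read_excel(path)  # reads the first sheet by default
  str(channelData)                 # check the column names and types
} else {
  message("File not found: ", path)
}
```

The file.exists() check is only there so the snippet fails gracefully if the path is wrong.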
You can then run the below command using dplyr to check some summary stats by channel. This groups the data by channel and then provides the min, max, median and IQR. Here’s another blog post that looks at exploratory data analysis in R.
Once you know the summary stats, you can guess how the boxplot will appear, as the IQR (the range between the 1st and 3rd quartiles, which holds the middle 50% of the values) is quite different for Paid Search vs Display and Social.
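As a rough sketch of that summary, the code below groups by channel and computes the four stats on sessions. The column names channel and sessions are assumptions, and the data frame is a made-up stand-in so the snippet runs on its own; swap in your own channelData.

```r
library(dplyr)

# Made-up stand-in data (not the real channelData), column names assumed.
set.seed(1)
channelData <- data.frame(
  channel  = rep(c("Paid Search", "Display", "Social"), each = 30),
  sessions = c(rpois(30, 900), rpois(30, 300), rpois(30, 200))
)

# One row per channel with min, max, median and IQR of sessions.
channelSummary <- channelData %>%
  group_by(channel) %>%
  summarise(
    min_sessions    = min(sessions),
    max_sessions    = max(sessions),
    median_sessions = median(sessions),
    iqr_sessions    = IQR(sessions)
  )

print(channelSummary)
```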
Go ahead and create the boxplot here.
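Here's one way the boxplot could look in code. It assumes channel on the x axis and sessions on the y axis (the post doesn't say which metric is plotted) and uses made-up stand-in data so it runs by itself.

```r
library(ggplot2)

# Made-up stand-in data (not the real channelData), column names assumed.
set.seed(1)
channelData <- data.frame(
  channel  = rep(c("Paid Search", "Display", "Social"), each = 30),
  sessions = c(rpois(30, 900), rpois(30, 300), rpois(30, 200))
)

# One box per channel; points beyond the whiskers are drawn as dots.
boxPlot <- ggplot(channelData, aes(x = channel, y = sessions)) +
  geom_boxplot()

print(boxPlot)
```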
In the visualization, you can see a few dots beyond the whiskers of the boxplot. Here’s a separate post that addresses how to find outliers in boxplots in R.
Let’s say you wanted to check what a scatter plot would look like with the same data.
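A plain scatter plot of the same mapping might be sketched like this, again with assumed column names and made-up stand-in data.

```r
library(ggplot2)

# Made-up stand-in data (not the real channelData), column names assumed.
set.seed(1)
channelData <- data.frame(
  channel  = rep(c("Paid Search", "Display", "Social"), each = 30),
  sessions = c(rpois(30, 900), rpois(30, 300), rpois(30, 200))
)

# Every point for a channel lands on the same x position.
scatterPlot <- ggplot(channelData, aes(x = channel, y = sessions)) +
  geom_point()

print(scatterPlot)
```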
This is obviously hard to understand, because the points for each channel overlap one another. To get around this issue, let’s use geom_jitter to add a bit of randomness to the data points on the chart.
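A minimal jitter sketch follows. The width of 0.2 is just an illustrative value, and height = 0 is my choice so the sessions values themselves are not shifted; column names and data are assumed stand-ins as before.

```r
library(ggplot2)

# Made-up stand-in data (not the real channelData), column names assumed.
set.seed(1)
channelData <- data.frame(
  channel  = rep(c("Paid Search", "Display", "Social"), each = 30),
  sessions = c(rpois(30, 900), rpois(30, 300), rpois(30, 200))
)

# Spread points horizontally only; y values stay exact.
jitterPlot <- ggplot(channelData, aes(x = channel, y = sessions)) +
  geom_jitter(width = 0.2, height = 0)

print(jitterPlot)
```

If you leave height unset, geom_jitter also nudges points vertically by default, which slightly changes where each value appears on the y axis.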
Now that we have a boxplot and a jitter plot, all that’s left is to layer the jitter plot on top of the boxplot to explain the distribution of values.
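The layering can be sketched as below: geom_boxplot first, then geom_jitter on top. Column names, the stand-in data, and the width and alpha values are all assumptions for illustration, and hiding the boxplot's own outlier dots is an optional choice of mine, not something the post prescribes.

```r
library(ggplot2)

# Made-up stand-in data (not the real channelData), column names assumed.
set.seed(1)
channelData <- data.frame(
  channel  = rep(c("Paid Search", "Display", "Social"), each = 30),
  sessions = c(rpois(30, 900), rpois(30, 300), rpois(30, 200))
)

combinedPlot <- ggplot(channelData, aes(x = channel, y = sessions)) +
  # outlier.shape = NA hides the boxplot's outlier dots so they are
  # not drawn twice once the jittered points are added.
  geom_boxplot(outlier.shape = NA) +
  # Layers draw in order, so the points sit on top of the boxes.
  geom_jitter(width = 0.2, height = 0, alpha = 0.5)

print(combinedPlot)
```

Drop outlier.shape = NA if you'd rather keep the boxplot's outlier markers visible alongside the jittered points.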
Here’s a link to the full script.