Barebone bacon barplot: A minimal plot with R + ggplot2
August 16, 2014
A few months back, a blog post by Darkhorse Analytics illustrated how to enhance a plot by removing unnecessary ink and making other simplifications, animating the process step by step in a gif.
I like the final result: clean, simple, and without unnecessary details. We can make many of the same tweaks to improve plots in R. This post is a walkthrough of a similar process using the awesome ggplot2 package.
First, load the library and create a dataframe with our to-be-plotted data:
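The original code isn't reproduced here, so this is a minimal sketch of the step. The calorie values are illustrative placeholders (assumed, not the post's data), chosen so that the rows run from highest to lowest calories as described later. Since R 4.0.0, `data.frame()` defaults to `stringsAsFactors = FALSE`, so the sketch passes `stringsAsFactors = TRUE` to make `Type_of_Food` a factor, as it would have been in 2014.

```r
library(ggplot2)

# Illustrative calorie values (assumed placeholders, not the post's data).
# Rows are ordered from highest to lowest calories.
data <- data.frame(
  Type_of_Food       = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
  Number_of_Calories = c(607, 542, 533, 296, 260),
  stringsAsFactors   = TRUE  # keep Type_of_Food a factor under R >= 4.0
)

str(data)
```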
As a starting point, let’s try a default qplot, or quickplot, and see what ggplot2 produces out of the box.
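`qplot()` was deprecated in ggplot2 3.4.0, so this sketch builds the equivalent default chart with `ggplot()` and `geom_col()` (the same as `geom_bar(stat = "identity")`). Mapping `fill` to `Type_of_Food` gives each bar its own color plus a legend. The calorie values are illustrative placeholders.

```r
library(ggplot2)

data <- data.frame(
  Type_of_Food       = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
  Number_of_Calories = c(607, 542, 533, 296, 260),  # illustrative values (assumed)
  stringsAsFactors   = TRUE
)

# Default bar chart: one bar per food, height = calories, one fill color per food
p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories, fill = Type_of_Food)) +
  geom_col()

print(p)
```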
Not bad! In general, the defaults in ggplot2 are excellent, and I rarely need to change them when exploring a dataset for everyday analysis. However, when presenting analyses or results to others, simplifying your plots can help tremendously. Simpler plots reduce distractions and guide your audience’s attention to the key points. By removing some unnecessary features and slightly adjusting others, we can highlight what really matters here: bacon.
In the code samples below, I’ll gradually build on the original qplot, one or two lines at a time.
Step 1: Remove unnecessary ink
The legend here is redundant, because each bar is also labeled along the x-axis. Remove the legend by changing the value of legend.position to “none”.
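A minimal sketch of the legend removal, using the same illustrative (assumed) calorie values: `legend.position = "none"` inside `theme()` suppresses the legend while the bars keep their colors.

```r
library(ggplot2)

data <- data.frame(
  Type_of_Food       = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
  Number_of_Calories = c(607, 542, 533, 296, 260),  # illustrative values (assumed)
  stringsAsFactors   = TRUE
)

p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories, fill = Type_of_Food)) +
  geom_col() +
  theme(legend.position = "none")  # the x-axis labels already identify each bar

print(p)
```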
Many features in ggplot2 can be adjusted through theme(). To get an overview of the possibilities, try ?theme.
Likewise, the grey background is unnecessary and can be removed by setting panel.background to element_blank().
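Building on the previous tweak, this sketch (same illustrative, assumed values) adds `panel.background = element_blank()` to the same `theme()` call to drop the grey panel.

```r
library(ggplot2)

data <- data.frame(
  Type_of_Food       = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
  Number_of_Calories = c(607, 542, 533, 296, 260),  # illustrative values (assumed)
  stringsAsFactors   = TRUE
)

p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories, fill = Type_of_Food)) +
  geom_col() +
  theme(
    legend.position  = "none",
    panel.background = element_blank()  # remove the grey panel background
  )

print(p)
```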
We can use the same method to remove several axis elements:
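The post doesn't list which axis elements it removed, so this sketch assumes the tick marks, both axis titles, and the y-axis text. That choice is consistent with the later remarks that the y-axis text is gone and the x-axis labels remain. Calorie values are illustrative placeholders.

```r
library(ggplot2)

data <- data.frame(
  Type_of_Food       = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
  Number_of_Calories = c(607, 542, 533, 296, 260),  # illustrative values (assumed)
  stringsAsFactors   = TRUE
)

p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories, fill = Type_of_Food)) +
  geom_col() +
  theme(
    legend.position  = "none",
    panel.background = element_blank(),
    axis.ticks       = element_blank(),  # assumed: remove tick marks on both axes
    axis.title       = element_blank(),  # assumed: remove both axis titles
    axis.text.y      = element_blank()   # remove the y-axis numbers; x labels stay
  )

print(p)
```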
Step 2: Improve scales
Now we need to improve what’s left. First, let’s use a more sensible ordering on the x-axis. By default, when a factor like data$Type_of_Food is mapped to an x- or y-axis, its categories appear in the order of the factor’s levels. We can inspect the levels by printing the vector or by calling levels(data$Type_of_Food).
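A quick sketch of the check, with the same illustrative (assumed) data. Printing a factor shows its values followed by a `Levels:` line, and `levels()` returns the levels alone.

```r
data <- data.frame(
  Type_of_Food       = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
  Number_of_Calories = c(607, 542, 533, 296, 260),  # illustrative values (assumed)
  stringsAsFactors   = TRUE
)

# Printing the factor lists its values, then its levels
data$Type_of_Food

# The levels alone, in alphabetical order:
# Bacon, Chili Dog, French Fries, Pizza, Potato Chips
levels(data$Type_of_Food)
```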
You can see that the levels are in alphabetical order, which is why the bars appear in that order in our plot. I’d prefer the bars to be ordered by caloric value, from highest to lowest. New factor levels can be set with factor(), supplying the full set of levels, in order, through the levels argument. Because the rows of our existing dataframe are already sorted from highest to lowest calories, the desired order of levels is exactly the order in which the foods appear in data$Type_of_Food.
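A minimal sketch of the reordering, assuming the same illustrative values. `as.character()` turns the current values into a plain character vector that serves as the new level order.

```r
data <- data.frame(
  Type_of_Food       = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
  Number_of_Calories = c(607, 542, 533, 296, 260),  # illustrative values (assumed)
  stringsAsFactors   = TRUE
)

# Rows are already sorted from highest to lowest calories, so the order in
# which the foods appear is exactly the level order we want
data$Type_of_Food <- factor(data$Type_of_Food,
                            levels = as.character(data$Type_of_Food))

levels(data$Type_of_Food)
```

If the rows were not already sorted, base R’s `reorder()` can derive the same order from the data: `reorder(data$Type_of_Food, -data$Number_of_Calories)` sorts the levels by increasing negated calories, which puts the highest-calorie food first.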
Now the order of the levels starts with French Fries (highest calories) and ends with Chili Dog (lowest calories, allegedly). With the levels properly ordered, we can simply rerun the same code:
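A sketch of the rerun with the reordered factor. The factor is built directly with its levels in calorie order, which gives the same result as the `factor()` call described above. The axis removals are the same assumed set as in Step 1, and the values are illustrative.

```r
library(ggplot2)

foods <- c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog")
data <- data.frame(
  Type_of_Food       = factor(foods, levels = foods),  # levels in calorie order
  Number_of_Calories = c(607, 542, 533, 296, 260)      # illustrative values (assumed)
)

# Same plot code as before; only the level order changed
p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories, fill = Type_of_Food)) +
  geom_col() +
  theme(
    legend.position  = "none",
    panel.background = element_blank(),
    axis.ticks       = element_blank(),
    axis.title       = element_blank(),
    axis.text.y      = element_blank()
  )

print(p)
```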
Much better. Next, we can make bacon really shine by simplifying the color scale. The fill argument specifies the variable mapped to each bar’s fill color. Because fill is mapped to data$Type_of_Food, a factor with five levels, we get five different colors across the bars.
I’ll create a new variable in data called barcolor that only takes two values, 0 and 1. Only bacon will have a value of 1.
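A minimal sketch of the new variable, with the same illustrative data. Storing `barcolor` as a factor (an assumption here) makes it map to a discrete fill scale, which `scale_fill_manual()` requires.

```r
foods <- c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog")
data <- data.frame(
  Type_of_Food       = factor(foods, levels = foods),
  Number_of_Calories = c(607, 542, 533, 296, 260)  # illustrative values (assumed)
)

# 1 for bacon, 0 for everything else, stored as a factor (levels "0" and "1")
data$barcolor <- factor(ifelse(data$Type_of_Food == "Bacon", 1, 0))

data
```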
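Now we can map fill to barcolor and use scale_fill_manual() to set which of two colors corresponds to each value of barcolor. Because scale_fill_manual() is a discrete scale, barcolor should be a factor (or be wrapped in factor() inside the mapping); a numeric 0/1 column would be mapped to a continuous fill scale, and scale_fill_manual() would fail.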
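A sketch of the two-color scale. The hex colors are my own choice (assumed), named by the `barcolor` levels so that each value gets an explicit color. The data and the Step 1 theme are the same illustrative setup.

```r
library(ggplot2)

foods <- c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog")
data <- data.frame(
  Type_of_Food       = factor(foods, levels = foods),
  Number_of_Calories = c(607, 542, 533, 296, 260)  # illustrative values (assumed)
)
data$barcolor <- factor(ifelse(data$Type_of_Food == "Bacon", 1, 0))

p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories, fill = barcolor)) +
  geom_col() +
  # Grey for everything else, red for bacon (colors are assumed choices)
  scale_fill_manual(values = c("0" = "#8C8C8C", "1" = "#C0392B")) +
  theme(
    legend.position  = "none",
    panel.background = element_blank(),
    axis.ticks       = element_blank(),
    axis.title       = element_blank(),
    axis.text.y      = element_blank()
  )

print(p)
```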
We might have gone overboard with the earlier ink removal, because we have no way of decoding the underlying calorie values without the y-axis text! Instead, we can label each bar with its calorie value using annotate(), which places text (and other things) at given (x, y) coordinates. The code looks long, but it repeats a similar line five times. In each line, I specify that I’m placing text, the x- and y-coordinates of the text, and the label (the text that I want to appear in the plot).
Each bar is centered at 1, 2, 3, 4, and 5 along the x-axis, so I can use those same values as the x-values when placing the text.
The height of each bar is equal to the corresponding calorie value, or data$Number_of_Calories, and I want the text to appear just below the top of each bar. I found that using the calorie value minus 30 worked pretty well here.
Finally, the label, or actual text that I want to place at each point is the corresponding value of data$Number_of_Calories.
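A sketch of the five `annotate()` calls described above, under the same illustrative data. White text is an assumed choice so that the labels read against the filled bars.

```r
library(ggplot2)

foods <- c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog")
data <- data.frame(
  Type_of_Food       = factor(foods, levels = foods),
  Number_of_Calories = c(607, 542, 533, 296, 260)  # illustrative values (assumed)
)
data$barcolor <- factor(ifelse(data$Type_of_Food == "Bacon", 1, 0))
cal <- data$Number_of_Calories

p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories, fill = barcolor)) +
  geom_col() +
  scale_fill_manual(values = c("0" = "#8C8C8C", "1" = "#C0392B")) +
  # One label per bar: x is the bar's position, y sits 30 below the bar top,
  # and the label is the calorie value itself
  annotate("text", x = 1, y = cal[1] - 30, label = cal[1], colour = "white") +
  annotate("text", x = 2, y = cal[2] - 30, label = cal[2], colour = "white") +
  annotate("text", x = 3, y = cal[3] - 30, label = cal[3], colour = "white") +
  annotate("text", x = 4, y = cal[4] - 30, label = cal[4], colour = "white") +
  annotate("text", x = 5, y = cal[5] - 30, label = cal[5], colour = "white") +
  theme(
    legend.position  = "none",
    panel.background = element_blank(),
    axis.ticks       = element_blank(),
    axis.title       = element_blank(),
    axis.text.y      = element_blank()
  )

print(p)
```

Because `annotate()` accepts vectors, the five calls can also be collapsed into one: `annotate("text", x = 1:5, y = cal - 30, label = cal, colour = "white")`.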
Step 3: Final adjustments
Almost done, just a few tweaks left!
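You’ll often notice that barplots in ggplot2 “hover” just above the x-axis. This happens because ggplot2 pads continuous scales by default (5% of the data range at each end), so the y-axis starts a little below zero. To make the bars sit flat against the axis, set the expand parameter in scale_y_continuous() to zero. I’ve also set the limits of the y-axis to 0 and 650 here.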
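A sketch of the scale adjustment on the two-color plot, with the same illustrative data. The text labels are omitted here to keep the sketch short. `expand = c(0, 0)` removes the padding, and `limits` fixes the axis range.

```r
library(ggplot2)

foods <- c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog")
data <- data.frame(
  Type_of_Food       = factor(foods, levels = foods),
  Number_of_Calories = c(607, 542, 533, 296, 260)  # illustrative values (assumed)
)
data$barcolor <- factor(ifelse(data$Type_of_Food == "Bacon", 1, 0))

p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories, fill = barcolor)) +
  geom_col() +
  scale_fill_manual(values = c("0" = "#8C8C8C", "1" = "#C0392B")) +
  # No padding below 0 (bars sit on the axis); fixed range of 0 to 650
  scale_y_continuous(expand = c(0, 0), limits = c(0, 650)) +
  theme(
    legend.position  = "none",
    panel.background = element_blank(),
    axis.ticks       = element_blank(),
    axis.title       = element_blank(),
    axis.text.y      = element_blank()
  )

print(p)
```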
I like the look of the slimmer bars in the Darkhorse example, so let’s reduce the width of bars with the width parameter.
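A sketch of slimmer bars. `width` in `geom_col()` is a fraction of the space allotted to each category; the 0.5 used here is an assumed value, not necessarily the post's. Data and labels are handled as in the previous sketch (illustrative values, labels omitted).

```r
library(ggplot2)

foods <- c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog")
data <- data.frame(
  Type_of_Food       = factor(foods, levels = foods),
  Number_of_Calories = c(607, 542, 533, 296, 260)  # illustrative values (assumed)
)
data$barcolor <- factor(ifelse(data$Type_of_Food == "Bacon", 1, 0))

p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories, fill = barcolor)) +
  geom_col(width = 0.5) +  # assumed width; the default is 0.9
  scale_fill_manual(values = c("0" = "#8C8C8C", "1" = "#C0392B")) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 650)) +
  theme(
    legend.position  = "none",
    panel.background = element_blank(),
    axis.ticks       = element_blank(),
    axis.title       = element_blank(),
    axis.text.y      = element_blank()
  )

print(p)
```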
Our x-axis labels are too small, so I’ll increase the corresponding size parameter in axis.text.x:
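A sketch of larger x-axis labels via `axis.text.x = element_text(size = ...)`. The size of 14 points is an assumed value, and the data are illustrative.

```r
library(ggplot2)

foods <- c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog")
data <- data.frame(
  Type_of_Food       = factor(foods, levels = foods),
  Number_of_Calories = c(607, 542, 533, 296, 260)  # illustrative values (assumed)
)
data$barcolor <- factor(ifelse(data$Type_of_Food == "Bacon", 1, 0))

p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories, fill = barcolor)) +
  geom_col(width = 0.5) +
  scale_fill_manual(values = c("0" = "#8C8C8C", "1" = "#C0392B")) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 650)) +
  theme(
    legend.position  = "none",
    panel.background = element_blank(),
    axis.ticks       = element_blank(),
    axis.title       = element_blank(),
    axis.text.y      = element_blank(),
    axis.text.x      = element_text(size = 14)  # assumed size, in points
  )

print(p)
```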
Finally, I changed the size, color, and hjust (or horizontal justification) of the title.
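Putting the steps together, here is a sketch of a complete plot under stated assumptions. The calorie values, colors, sizes, and title text are all placeholders, not the post’s. It uses a single vectorized `annotate()` call in place of five separate ones, which produces the same labels. Since ggplot2 2.2.0, titles are left-aligned by default, so `hjust = 0.5` centers this one.

```r
library(ggplot2)

foods <- c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog")
data <- data.frame(
  Type_of_Food       = factor(foods, levels = foods),  # bars in calorie order
  Number_of_Calories = c(607, 542, 533, 296, 260)      # illustrative values (assumed)
)
data$barcolor <- factor(ifelse(data$Type_of_Food == "Bacon", 1, 0))

p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories, fill = barcolor)) +
  geom_col(width = 0.5) +
  scale_fill_manual(values = c("0" = "#8C8C8C", "1" = "#C0392B")) +
  # One vectorized call labels all five bars, 30 units below each bar top
  annotate("text", x = 1:5, y = data$Number_of_Calories - 30,
           label = data$Number_of_Calories, colour = "white") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 650)) +
  ggtitle("Calories in five foods") +  # hypothetical title text
  theme(
    legend.position  = "none",
    panel.background = element_blank(),
    axis.ticks       = element_blank(),
    axis.title       = element_blank(),
    axis.text.y      = element_blank(),
    axis.text.x      = element_text(size = 14),
    # Assumed size and color; hjust = 0.5 centers the title
    plot.title       = element_text(size = 20, colour = "grey30", hjust = 0.5)
  )

print(p)
```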
That’s it! You could polish this plot further with tools like Adobe Illustrator or Inkscape, but I think it’s pretty good as is. There are probably (almost certainly) more efficient ways to make the adjustments shown above, but this walkthrough should get you started in the right direction. Thanks for reading – I’m off to start my chili dog diet.