Barebone bacon barplot: A minimal plot with R + ggplot2

A few months back, a blog post by Darkhorse Analytics illustrated how to enhance a plot by removing unnecessary ink and making other simplifications. The process is animated in this gif:

[Animated GIF: DarkhorseGIF]

I like the final result: clean, simple, and without unnecessary details. We can make many of the same tweaks to improve plots in R. This post is a walkthrough of a similar process using the awesome ggplot2 package.

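First, load the library and create a dataframe with our to-be-plotted data. Since R 4.0.0, data.frame() no longer converts character columns to factors by default, so stringsAsFactors = TRUE is set explicitly here. Type_of_Food needs to be a factor for the levels in Step 2 to print as shown.

library(ggplot2)

data <- data.frame(Type_of_Food = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
                   Number_of_Calories = c(607, 542, 533, 296, 260),
                   stringsAsFactors = TRUE)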

data

  Type_of_Food Number_of_Calories
1 French Fries                607
2 Potato Chips                542
3        Bacon                533
4        Pizza                296
5    Chili Dog                260

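As a starting point, let’s try a default qplot, or quickplot. With ggplot2 2.2.0 or later, bars whose heights are taken directly from the data use geom="col"; the older geom="bar", stat="identity" form no longer works in qplot(). Versions 3.4.0 and later also mark qplot() itself as deprecated, but it still runs.

qplot(data=data, x=Type_of_Food, y=Number_of_Calories, 
      geom="col", fill=factor(Type_of_Food), 
      main="Calories per 100g")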

which gives us:

[Figure: barplot0]

Not bad! In general, the defaults in ggplot2 are excellent. For everyday analysis, I rarely need to mess with defaults when exploring a dataset. However, when presenting analyses or results to others, simplifying your plots can help tremendously. Simpler plots reduce distractions and guide your audience’s attention to the key points. By removing some unnecessary features and slightly adjusting some others, we can highlight what really matters here: bacon.

In the code samples below, I’ll gradually build on the original qplot, one or two lines at a time.

Step 1: Remove unnecessary ink

The legend here is redundant, because each bar is also labeled along the x-axis. Remove the legend by changing the value of legend.position to “none”.

qplot(data=data, x=Type_of_Food, y=Number_of_Calories, 
      geom="col", fill=factor(Type_of_Food), 
      main="Calories per 100g") + 
      theme(legend.position = "none")

[Figure: barplot1a]

Many features of a ggplot can be adjusted through theme(). For an overview of the available options, try ?theme.

Likewise, the grey background is unnecessary and can be removed by setting panel.background to element_blank().

qplot(data=data, x=Type_of_Food, y=Number_of_Calories, 
      geom="col", fill=factor(Type_of_Food), 
      main="Calories per 100g") + 
      theme(legend.position = "none",
            panel.background = element_blank())

[Figure: barplot1b]

We can use the same method to remove several axis elements:

qplot(data=data, x=Type_of_Food, y=Number_of_Calories, 
      geom="col", fill=factor(Type_of_Food), 
      main="Calories per 100g") + 
      theme(legend.position = "none", 
            panel.background = element_blank(),
            axis.ticks = element_blank(), 
            axis.text.y = element_blank(), 
            axis.title = element_blank())

[Figure: barplot1c]

Step 2: Improve scales

Now we need to improve what’s left. First, let’s use a more sensible ordering on the x-axis. By default, when a factor like data$Type_of_Food is mapped to an x- or y-axis, it will be ordered according to the order of the underlying factor levels. We can check this by typing the name of the vector (below) or with levels(data$Type_of_Food).

data$Type_of_Food

[1] French Fries Potato Chips Bacon        Pizza        Chili Dog   
Levels: Bacon Chili Dog French Fries Pizza Potato Chips

You can see that the levels are in alphabetical order, which is why the bars appear in that order in our plot. I’d prefer the bars to be ordered by calorie count, from highest to lowest. New factor levels can be set by calling factor() and passing the new set of levels, in order, to the levels argument. Note that the rows of our dataframe are already sorted from highest to lowest calorie count, so the desired order of levels is exactly the order in which they appear in data$Type_of_Food.

data$Type_of_Food <- factor(data$Type_of_Food, levels=data$Type_of_Food)

data$Type_of_Food

[1] French Fries Potato Chips Bacon        Pizza        Chili Dog   
Levels: French Fries Potato Chips Bacon Pizza Chili Dog
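Passing the levels by hand works here only because the rows happen to be sorted already. A minimal sketch of an alternative that sorts the levels by calorie count directly, using base R's reorder(); the data frame `shuffled` is a hypothetical example, not part of the original post:

```r
# Hypothetical data frame whose rows are NOT sorted by calories.
shuffled <- data.frame(
  Type_of_Food = c("Pizza", "Bacon", "Chili Dog", "French Fries", "Potato Chips"),
  Number_of_Calories = c(296, 533, 260, 607, 542)
)

# reorder() sorts the levels by the mean of the second argument, ascending;
# negating the calories therefore puts the highest-calorie food first.
shuffled$Type_of_Food <- reorder(factor(shuffled$Type_of_Food),
                                 -shuffled$Number_of_Calories)

levels(shuffled$Type_of_Food)
# returns c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog")
```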

Now the order of the levels starts with French Fries (highest calories) and ends with Chili Dog (lowest calorie, allegedly). With the levels properly ordered, we can simply rerun the same code:

qplot(data=data, x=Type_of_Food, y=Number_of_Calories, 
      geom="col", fill=factor(Type_of_Food), 
      main="Calories per 100g") + 
      theme(legend.position = "none",
            panel.background = element_blank(),
            axis.ticks = element_blank(), 
            axis.text.y = element_blank(), 
            axis.title = element_blank())

[Figure: barplot2a]

Much better. Next, we can make bacon really shine by simplifying the color scale. Here, the fill argument specifies the variable mapped to each bar’s fill colors. Because the fill color is mapped to data$Type_of_Food, a factor with five levels, we get five different colors across the bars.

I’ll create a new variable in data called barcolor that only takes two values, 0 and 1. Only bacon will have a value of 1.

data$barcolor <- ifelse(data$Type_of_Food=="Bacon", 1, 0)

data

  Type_of_Food Number_of_Calories barcolor
1 French Fries                607        0
2 Potato Chips                542        0
3        Bacon                533        1
4        Pizza                296        0
5    Chili Dog                260        0

Now we can map fill to factor(barcolor), where wrapping barcolor in factor() gives a discrete fill scale rather than a continuous gradient, and use scale_fill_manual() to choose the color for each of its two values.

qplot(data=data, x=Type_of_Food, y=Number_of_Calories, 
      geom="col", fill=factor(barcolor), 
      main="Calories per 100g") + 
      scale_fill_manual(values=c("grey50", "indianred"), breaks=c(0, 1)) +
      theme(legend.position = "none",
            panel.background = element_blank(),
            axis.ticks = element_blank(), 
            axis.text.y = element_blank(), 
            axis.title = element_blank())

[Figure: barplot2b]
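As a variation on the 0/1 barcolor column, a logical flag combined with named colors makes the color assignment explicit instead of depending on the order of the levels. This sketch switches from qplot() to the equivalent ggplot() + geom_col() call, and the column name is_bacon is a hypothetical choice:

```r
library(ggplot2)

# Same data as in the post, with levels in calorie order.
data <- data.frame(Type_of_Food = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
                   Number_of_Calories = c(607, 542, 533, 296, 260))
data$Type_of_Food <- factor(data$Type_of_Food, levels = data$Type_of_Food)

# Hypothetical logical column: TRUE only for bacon.
data$is_bacon <- data$Type_of_Food == "Bacon"

# Named values tie each color to a level ("FALSE" / "TRUE") explicitly.
p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories, fill = is_bacon)) +
  geom_col() +
  scale_fill_manual(values = c("FALSE" = "grey50", "TRUE" = "indianred")) +
  ggtitle("Calories per 100g") +
  theme(legend.position = "none")

# Fill color assigned to each bar, in x-axis order.
built <- ggplot_build(p)$data[[1]]
as.character(built$fill[order(as.numeric(built$x))])
```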

We might have gone overboard with the earlier ink removal: without the y-axis text, there is no way to decode the underlying calorie values! Instead, we can label each bar with its calorie value using annotate(), which places text (and other things) at given (x, y) coordinates. The code below looks long, but it is the same line repeated five times. Each line specifies that I’m placing text, the x- and y-coordinates of the text, and the label (the text that should appear in the plot).

Each bar is centered at 1, 2, 3, 4, and 5 along the x-axis, so I can use those same values as the x-values when placing the text.

The height of each bar is equal to the corresponding calorie value, or data$Number_of_Calories, and I want the text to appear just below the top of each bar. I found that using the calorie value minus 30 worked pretty well here.

Finally, the label, or actual text that I want to place at each point is the corresponding value of data$Number_of_Calories.

qplot(data=data, x=Type_of_Food, y=Number_of_Calories, 
      geom="col", fill=factor(barcolor), main="Calories per 100g") + 
      scale_fill_manual(values=c("grey50", "indianred"), breaks=c(0, 1))  +
      annotate("text", x=1, y=data$Number_of_Calories[data$Type_of_Food=="French Fries"]-30, label=data$Number_of_Calories[data$Type_of_Food=="French Fries"], colour="white", size=7) +
      annotate("text", x=2, y=data$Number_of_Calories[data$Type_of_Food=="Potato Chips"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Potato Chips"], colour="white", size=7) +
      annotate("text", x=3, y=data$Number_of_Calories[data$Type_of_Food=="Bacon"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Bacon"], colour="white", size=7) +
      annotate("text", x=4, y=data$Number_of_Calories[data$Type_of_Food=="Pizza"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Pizza"], colour="white", size=7) +
      annotate("text", x=5, y=data$Number_of_Calories[data$Type_of_Food=="Chili Dog"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Chili Dog"], colour="white", size=7) +
      theme(legend.position = "none",
            panel.background = element_blank(),
            axis.ticks = element_blank(), 
            axis.text.y = element_blank(), 
            axis.title = element_blank())

[Figure: barplot2c]
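Because each label is just the bar's own calorie value, the five annotate() calls can likely be collapsed into a single geom_text() layer that reads both the label and its position from the data. A sketch of that alternative (not the approach used in this post), written with ggplot() and geom_col() and assuming ggplot2 2.2.0 or later:

```r
library(ggplot2)

data <- data.frame(Type_of_Food = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
                   Number_of_Calories = c(607, 542, 533, 296, 260))
data$Type_of_Food <- factor(data$Type_of_Food, levels = data$Type_of_Food)
data$barcolor <- ifelse(data$Type_of_Food == "Bacon", 1, 0)

p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories)) +
  geom_col(aes(fill = factor(barcolor))) +
  scale_fill_manual(values = c("0" = "grey50", "1" = "indianred")) +
  # One text layer instead of five annotate() calls: x comes from the bar's
  # category, while y and label come from the calorie column.
  geom_text(aes(y = Number_of_Calories - 30, label = Number_of_Calories),
            colour = "white", size = 7) +
  ggtitle("Calories per 100g") +
  theme(legend.position = "none",
        panel.background = element_blank(),
        axis.ticks = element_blank(),
        axis.text.y = element_blank(),
        axis.title = element_blank())

# Positions and labels of the text layer, in x-axis order.
text_layer <- ggplot_build(p)$data[[2]]
text_layer <- text_layer[order(as.numeric(text_layer$x)), ]
text_layer[, c("y", "label")]
```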

Step 3: Final adjustments

Almost done, just a few tweaks left!

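Sometimes you’ll notice that bars in ggplot2 “hover” just above the x-axis. This happens because, by default, ggplot2 pads each continuous axis by 5% of the data range at both ends, so the y-axis starts slightly below zero. To make the bars sit flat against the axis, set the expand argument of scale_y_continuous() to c(0, 0). I’ve also set the y-axis limits to 0 and 650 here, which leaves some room above the tallest bar.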

qplot(data=data, x=Type_of_Food, y=Number_of_Calories, 
      geom="col", fill=factor(barcolor), main="Calories per 100g") + 
      scale_fill_manual(values=c("grey50", "indianred"), breaks=c(0, 1)) +
      scale_y_continuous(expand=c(0, 0), limits=c(0, 650)) +
      annotate("text", x=1, y=data$Number_of_Calories[data$Type_of_Food=="French Fries"]-30, label=data$Number_of_Calories[data$Type_of_Food=="French Fries"], colour="white", size=7) +
      annotate("text", x=2, y=data$Number_of_Calories[data$Type_of_Food=="Potato Chips"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Potato Chips"], colour="white", size=7) +
      annotate("text", x=3, y=data$Number_of_Calories[data$Type_of_Food=="Bacon"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Bacon"], colour="white", size=7) +
      annotate("text", x=4, y=data$Number_of_Calories[data$Type_of_Food=="Pizza"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Pizza"], colour="white", size=7) +
      annotate("text", x=5, y=data$Number_of_Calories[data$Type_of_Food=="Chili Dog"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Chili Dog"], colour="white", size=7) +
      theme(legend.position = "none",
            panel.background = element_blank(),
            axis.ticks = element_blank(), 
            axis.text.y = element_blank(), 
            axis.title = element_blank())

[Figure: barplot3a]
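To see where the hover comes from, one can compare the y-axis range that ggplot2 actually draws with and without the expand setting. This sketch reads it from ggplot_build(); `y_range`, `p_default` and `p_flush` are hypothetical names, and panel_params is an internal structure of ggplot2 3.x that may change between versions:

```r
library(ggplot2)

data <- data.frame(Type_of_Food = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
                   Number_of_Calories = c(607, 542, 533, 296, 260))

p_default <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories)) + geom_col()
p_flush <- p_default + scale_y_continuous(expand = c(0, 0), limits = c(0, 650))

# Hypothetical helper: the y range of the first panel after the plot is built.
y_range <- function(p) ggplot_build(p)$layout$panel_params[[1]]$y.range

y_range(p_default)  # lower end is below 0 because of the default padding
y_range(p_flush)    # exactly 0 to 650
```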

I like the look of the slimmer bars in the Darkhorse example, so let’s reduce the width of bars with the width parameter.

qplot(data=data, x=Type_of_Food, y=Number_of_Calories, 
      geom="col", fill=factor(barcolor), main="Calories per 100g", width=.7) + 
      scale_fill_manual(values=c("grey50", "indianred"), breaks=c(0, 1)) +
      scale_y_continuous(expand=c(0, 0), limits=c(0, 650)) +
      annotate("text", x=1, y=data$Number_of_Calories[data$Type_of_Food=="French Fries"]-30, label=data$Number_of_Calories[data$Type_of_Food=="French Fries"], colour="white", size=7) +
      annotate("text", x=2, y=data$Number_of_Calories[data$Type_of_Food=="Potato Chips"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Potato Chips"], colour="white", size=7) +
      annotate("text", x=3, y=data$Number_of_Calories[data$Type_of_Food=="Bacon"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Bacon"], colour="white", size=7) +
      annotate("text", x=4, y=data$Number_of_Calories[data$Type_of_Food=="Pizza"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Pizza"], colour="white", size=7) +
      annotate("text", x=5, y=data$Number_of_Calories[data$Type_of_Food=="Chili Dog"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Chili Dog"], colour="white", size=7) +
      theme(legend.position = "none",
            panel.background = element_blank(),
            axis.ticks = element_blank(), 
            axis.text.y = element_blank(), 
            axis.title = element_blank())

[Figure: barplot3b]
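The bar width is expressed as a fraction of the distance between neighboring categories; by default it is 90% of the data resolution, which is 0.9 for a categorical axis like this one. A quick sketch that checks the drawn widths, using ggplot() + geom_col() instead of qplot(); `bars` is a hypothetical name:

```r
library(ggplot2)

data <- data.frame(Type_of_Food = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
                   Number_of_Calories = c(607, 542, 533, 296, 260))

p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories)) +
  geom_col(width = 0.7)

# Each bar spans xmin..xmax around its category position (1, 2, ..., 5).
bars <- ggplot_build(p)$data[[1]]
as.numeric(bars$xmax) - as.numeric(bars$xmin)  # 0.7 for every bar
```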

Our x-axis labels are too small, so I’ll increase the corresponding size parameter in axis.text.x:

qplot(data=data, x=Type_of_Food, y=Number_of_Calories, 
      geom="col", fill=factor(barcolor), main="Calories per 100g", width=.7) + 
      scale_fill_manual(values=c("grey50", "indianred"), breaks=c(0, 1)) +
      scale_y_continuous(expand=c(0, 0), limits=c(0, 650)) +
      annotate("text", x=1, y=data$Number_of_Calories[data$Type_of_Food=="French Fries"]-30, label=data$Number_of_Calories[data$Type_of_Food=="French Fries"], colour="white", size=7) +
      annotate("text", x=2, y=data$Number_of_Calories[data$Type_of_Food=="Potato Chips"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Potato Chips"], colour="white", size=7) +
      annotate("text", x=3, y=data$Number_of_Calories[data$Type_of_Food=="Bacon"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Bacon"], colour="white", size=7) +
      annotate("text", x=4, y=data$Number_of_Calories[data$Type_of_Food=="Pizza"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Pizza"], colour="white", size=7) +
      annotate("text", x=5, y=data$Number_of_Calories[data$Type_of_Food=="Chili Dog"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Chili Dog"], colour="white", size=7) +
      theme(legend.position = "none",
            panel.background = element_blank(),
            axis.ticks = element_blank(), 
            axis.text.y = element_blank(), 
            axis.title = element_blank(),
            axis.text.x = element_text(size=15))

[Figure: barplot3c]

Finally, I changed the size, color, and hjust (or horizontal justification) of the title.

qplot(data=data, x=Type_of_Food, y=Number_of_Calories, 
      geom="col", fill=factor(barcolor), main="Calories per 100g", width=.7) + 
      scale_fill_manual(values=c("grey50", "indianred"), breaks=c(0, 1)) +
      scale_y_continuous(expand=c(0, 0), limits=c(0, 650)) +
      annotate("text", x=1, y=data$Number_of_Calories[data$Type_of_Food=="French Fries"]-30, label=data$Number_of_Calories[data$Type_of_Food=="French Fries"], colour="white", size=7) +
      annotate("text", x=2, y=data$Number_of_Calories[data$Type_of_Food=="Potato Chips"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Potato Chips"], colour="white", size=7) +
      annotate("text", x=3, y=data$Number_of_Calories[data$Type_of_Food=="Bacon"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Bacon"], colour="white", size=7) +
      annotate("text", x=4, y=data$Number_of_Calories[data$Type_of_Food=="Pizza"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Pizza"], colour="white", size=7) +
      annotate("text", x=5, y=data$Number_of_Calories[data$Type_of_Food=="Chili Dog"]-30, label=data$Number_of_Calories[data$Type_of_Food=="Chili Dog"], colour="white", size=7) +
      theme(legend.position = "none",
            panel.background = element_blank(),
            axis.ticks = element_blank(), 
            axis.text.y = element_blank(), 
            axis.title = element_blank(),
            axis.text.x = element_text(size=15),
            title = element_text(size=20, colour="grey50", hjust=0))

[Figure: barplot3d]

That’s it! You could polish this plot further with tools like Adobe Illustrator or Inkscape, but I think it’s pretty good as is. There are probably (almost certainly) more efficient ways to make the adjustments shown above, but this walkthrough should get you started in the right direction. Thanks for reading – I’m off to start my chili dog diet.
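As one example of a more compact route, the final plot can be sketched with ggplot() instead of qplot(), a single geom_text() layer instead of the five annotate() calls, and the theme settings stored once in an object so they can be reused across plots. The name theme_barebone is hypothetical, and this sketch uses plot.title rather than title, because title is the parent element that axis and legend titles also inherit from:

```r
library(ggplot2)

data <- data.frame(Type_of_Food = c("French Fries", "Potato Chips", "Bacon", "Pizza", "Chili Dog"),
                   Number_of_Calories = c(607, 542, 533, 296, 260))
data$Type_of_Food <- factor(data$Type_of_Food, levels = data$Type_of_Food)
data$barcolor <- ifelse(data$Type_of_Food == "Bacon", 1, 0)

# Hypothetical reusable theme object collecting all the ink-removal tweaks.
theme_barebone <- theme(legend.position = "none",
                        panel.background = element_blank(),
                        axis.ticks = element_blank(),
                        axis.text.y = element_blank(),
                        axis.title = element_blank(),
                        axis.text.x = element_text(size = 15),
                        # plot.title styles only the main title.
                        plot.title = element_text(size = 20, colour = "grey50", hjust = 0))

p <- ggplot(data, aes(x = Type_of_Food, y = Number_of_Calories)) +
  geom_col(aes(fill = factor(barcolor)), width = 0.7) +
  # Labels sit 30 units below the top of each bar.
  geom_text(aes(y = Number_of_Calories - 30, label = Number_of_Calories),
            colour = "white", size = 7) +
  scale_fill_manual(values = c("0" = "grey50", "1" = "indianred")) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 650)) +
  ggtitle("Calories per 100g") +
  theme_barebone

p
```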