member spotlight: Adam Ribaudo

 
An image of Adam Ribaudo, the September 2020 Member Spotlight.
 

The following is a guest post from Adam Ribaudo (originally posted on his website, noisetosignal.io). Adam is a regular contributor to the SWD community and recently challenged himself to recreate storytelling with data visuals in R. Keep reading to learn more about his project and don’t forget to pop over to the community to chat with him directly as part of our September Member Spotlight.

Each month, we select someone from the community to highlight through our Member Spotlight. These are people who contribute in ways that foster an even more connected and diverse learning environment. Our goal is to help you get to know your fellow community members a little better and learn from one another. 



I’m putting the finishing touches on a project that I started just over a month ago: Re-creating the visualizations of storytelling with data using R. All the code is available on GitHub (linked at the end of this post). If you’re not familiar with the Storytelling with Data (SWD) book, it’s a masterclass in communication through data and has become a must-read for data professionals. The lessons shared in this book have inspired many to prioritize context, reduce clutter, and focus their audience’s attention through color, size, and position. A great example is shown below.

 
A slope graph comparing favorable employee feedback results from 2014 and 2015.
 

The author, Cole Nussbaumer Knaflic, has also built a community portal that allows data visualization enthusiasts to collaborate and critique each other’s work.

Recently, I’ve discussed how R can be an analyst’s best friend. While I firmly believe that to be the case, I also like to challenge that notion. Are there areas where R falls apart? Can it do everything an analyst needs to do? It was through this critical lens that I thought to put R to the test and see if I could re-create all of the charts in SWD. The results are in and I’d love to share what I learned along the way.

This project wouldn’t have been possible had the book’s author not provided the raw data and original charts (built in Excel, no less!). Before finding them, I had considered estimating each chart’s data points by sight, which would have been insane! It demonstrates, however, how excited I was to get started.

The beauty of working with R’s ggplot2 library (the “gg” stands for “grammar of graphics”) is that any plot can be deconstructed into its constituent parts, each of which has corresponding functions. Those parts are:

  • geometry (ex: line, bar, point, text)

  • scale (ex: x-axis, y-axis, color, fill)

  • mapping of data to scales (ex: car type -> x-axis)

  • theme (ex: Title font, caption color)
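As a minimal sketch of how those four parts map onto ggplot2 functions, the example below uses the mpg dataset that ships with ggplot2 rather than any data from the book. The title and caption text are made up for illustration.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in here for any tidy data frame.
p <- ggplot(mpg, aes(x = class, y = hwy)) +   # mapping: car type -> x-axis
  geom_point(color = "grey40") +              # geometry: points
  scale_y_continuous(limits = c(0, 50)) +     # scale: y-axis range
  labs(title = "Highway mileage by car type", # illustrative title text
       caption = "Source: ggplot2::mpg") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"),       # theme: title font
        plot.caption = element_text(color = "grey50"))  # theme: caption color
```

Calling print(p) renders the chart. Each line can be swapped independently, for example replacing geom_point() with geom_boxplot() changes the geometry without touching the mapping, scale, or theme.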

This meant that, over time, it became second nature to look at an SWD chart and pick out the corresponding ggplot2 functions that would be necessary. Here’s some of the before and after output:

A side-by-side comparison of a scatterplot from Cole Nussbaumer Knaflic’s storytelling with data and Adam’s version created in R.
A side-by-side comparison of a line chart from Cole Nussbaumer Knaflic’s storytelling with data and Adam’s version created in R.

Lessons Learned

If you can dream it, you can build it in ggplot2

The results speak for themselves. That my collaborator (whom I’ll introduce later) and I were able to recreate each of the plots with near-pixel-perfection shows the power and flexibility of ggplot2. The breadth and diversity of chart types covered in SWD provide compelling evidence that ggplot2 can tackle anything you throw at it. This is often true when using standard ggplot2 functions, but especially true when you start to consider companion libraries such as ggtext and grid that can handle edge cases and unique plotting requirements. That said…

Just because you can dream it doesn’t mean that you *should* build it in ggplot2

It became clear after manually positioning annotation text for the 40th time that some tasks are best left to WYSIWYG presentation software (like PowerPoint) rather than code. If an element can’t be mapped back to data and is difficult to arrange in code, it may be better to leave it for a downstream tool. The broader question one must keep in mind is: Can my chart accept new data points, and how much effort must I expend to update it? In the case of annotations, you can easily retype and rearrange a line of text in PowerPoint if the data changes, so handling them downstream passes this test.
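To make the distinction concrete, here is a rough sketch contrasting a hand-placed annotation with a label that is mapped back to data. The tickets data frame, its numbers, and the label text are hypothetical and invented for illustration; they are not taken from the book or the project repository.

```r
library(ggplot2)

# Hypothetical monthly counts, invented purely for illustration.
tickets <- data.frame(
  month    = 1:6,
  received = c(160, 184, 241, 149, 180, 161)
)

base_plot <- ggplot(tickets, aes(x = month, y = received)) +
  geom_line()

# Hand-placed annotation: x, y, and the label are hard-coded, so all three
# need manual attention whenever the data changes.
p_manual <- base_plot +
  annotate("text", x = 3.2, y = 241, hjust = 0,
           label = "Volume peaked in month 3")

# Data-driven label: position and text come from the data itself,
# so the label follows the peak if the numbers are refreshed.
peak <- tickets[which.max(tickets$received), ]
p_mapped <- base_plot +
  geom_text(data = peak, aes(label = paste("Peak:", received)), vjust = -0.8)
```

The second version survives a data refresh untouched, while the first is the kind of element that may be quicker to place by hand in presentation software.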

Consistent practice is the key to mastery

While I have much left to learn, my level of comfort and ability with ggplot2 before and after this project is incomparable. I went from someone who had to Google every single function and parameter to someone who could smoothly generate visualizations and transformations as they came to mind. The key was (almost) daily practice as well as having some external motivation from a collaborator who joined halfway through. Speaking of collaboration…

The data viz and R communities are strong. Whenever possible, collaborate and seek out new perspectives

When I started this project, it was meant as a personal challenge. As an afterthought, I posted on the SWD forums asking if anyone else would want to share in the fun. Thankfully, Wal McConnell took me up on that offer and helped in ways I didn’t realize were needed. It taught me an important lesson about R: no single person can keep track of all its packages and capabilities, so you’ll produce your best work when you can collaborate with others.

A few key tidyverse transformations go a long way

Last, I’ll share some more technical tips related to the data and visual transformations I came back to over and over again. These are the tidyverse functions I once had only a passing familiarity with but now know by heart. If you’re only going to memorize a few functions, these are the ones.

  • Data Transformation

    • pivot_longer / pivot_wider – Your first step in transforming data is making sure it’s tidy. This often involves some sort of pivot.

    • case_when – Creates switch-style statements that manipulate/mutate values based on conditions.

    • forcats::fct_relevel() – ggplot2 takes its cues from factor levels when ordering elements. If you need a different order, you often need to go back to the original factor.

  • ggplot2
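The data transformation functions listed above can be sketched together as follows. The survey_wide table, its column names, and its values are hypothetical, invented only to show the shape of each call.

```r
library(dplyr)
library(tidyr)
library(forcats)

# Hypothetical wide data, invented for illustration: one column per year.
survey_wide <- tibble(
  item   = c("Peers", "Culture", "Career"),
  `2014` = c(0.85, 0.80, 0.49),
  `2015` = c(0.91, 0.96, 0.33)
)

survey_long <- survey_wide %>%
  # pivot_longer: one row per item/year pair, which is the tidy shape ggplot2 expects
  pivot_longer(cols = -item, names_to = "year", values_to = "favorable") %>%
  mutate(
    # case_when: switch-style logic to flag the series to emphasize
    emphasis = case_when(
      item == "Career" ~ "highlight",
      TRUE             ~ "context"
    ),
    # fct_relevel: move "Peers" to the front of the factor levels,
    # which controls the order ggplot2 uses for axes and legends
    item = fct_relevel(factor(item), "Peers")
  )
```

Here survey_long has six rows (three items by two years), and the item factor is ordered Peers, Career, Culture instead of the default alphabetical order.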

Did anyone ask for this project? Absolutely not. Did I gain a ton of experience that I’ll apply to my professional work? Yes! It goes to show that it’s more important to set goals than to fret over which goals to set. I hope this helps inspire others to take on similar challenges.

You can find the GitHub repository with code and images here: https://github.com/adamribaudo/storytelling-with-data-ggplot


You can find more of Adam’s work and thoughts on his site: noisetosignal.io. Head over to the community to chat with Adam directly this month as part of the September Member Spotlight.

There are plenty of noteworthy members in the SWD community, too many to spotlight at once. This program is part of our larger efforts to find new ways to share multiple voices and experiences. As you discover great work and ideas in the community, be sure to give appropriate kudos and spread the word!