UPDATE: Alberto Cairo has offered to give a signed copy of his new book away to three lucky participants—participate in this challenge to get practice with this important concept plus a chance to get a new book!
How do we communicate the appropriate sense of uncertainty in our data? This question came up at a recent workshop, and it’s one that anyone communicating data encounters regularly. I often grapple with it myself.
People like things that are concrete and certain, so when we plot our data as a solid point, line, or bar on a graph, it’s easy for people to take it as “truth.” There are times when that is appropriate, and times when it definitely isn’t.
Data is rarely perfect. The things we do with it are never perfect. We aim to be robust and not to mislead within the constraints of the data we have available. We might summarize from an incomplete sample, make assumptions that may or may not prove accurate, model behavior that works in aggregate but breaks down if you get too granular, or forecast something into the future when conditions aren’t fully known or may otherwise vary from expectations.
In all of these cases (and many more I haven’t called out specifically), there is uncertainty that we ought to convey in order to provide a more complete picture to our audience. It should be noted that there is also a risk of being too circumspect, adding so many caveats to our work that the actual data loses its credibility. How do we balance these requirements and communicate uncertainty appropriately to people who trust our graphs and use them to make decisions?
The general approach I recommend is to design the visual in a way that makes the uncertainty clear. This can be done in a number of different ways, depending on what you visualize and how you show it: dotted lines or patterned bars to differentiate forecast data from actual; shaded areas to highlight confidence intervals or projected variance; multiple lines to illustrate expected outcomes given varying assumptions; or words on the graph to assist with interpretation or add the proper caveats. This is definitely not a comprehensive list—just some nuggets to get you started.
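For those who build their charts in code, here is a minimal sketch of two of those ideas in matplotlib: a dotted line to distinguish forecast from actual, plus a shaded band around the forecast to suggest its projected range. The data and the widening margin are entirely made up for illustration—in real work the band would come from your model’s confidence or prediction intervals.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt

# Illustrative (made-up) monthly values: 8 actuals, then 4 forecast points
actual = np.array([10, 12, 11, 13, 14, 13, 15, 16], dtype=float)
forecast = np.array([16.0, 17.0, 17.5, 18.0])
x_actual = np.arange(len(actual))
# Start the forecast line at the last actual point so the series connects
x_fcst = np.arange(len(actual) - 1, len(actual) + len(forecast))
fcst_line = np.concatenate([[actual[-1]], forecast])

# A band that widens over time, suggesting growing uncertainty
margin = 0.5 * np.arange(len(fcst_line))
lower, upper = fcst_line - margin, fcst_line + margin

fig, ax = plt.subplots()
ax.plot(x_actual, actual, color="steelblue", label="Actual")
ax.plot(x_fcst, fcst_line, color="steelblue", linestyle=":", label="Forecast")
ax.fill_between(x_fcst, lower, upper, color="steelblue", alpha=0.2,
                label="Projected range")
ax.legend()
fig.savefig("uncertainty_sketch.png")
```

The same pattern works with any plotting library that supports line styles and filled regions; the design decision—visually separating the known from the estimated—matters more than the tool.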
Here are some additional perspectives and resources related to this topic:
Alberto Cairo NYT article: This Hurricane Map Doesn’t Mean What You Think It Means
Data Stories podcast: Visualizing Uncertainty with Jessica Hullman & Matthew Kay
Elijah Meeks’ Medium article: Sketchy Data Visualization
Jessica Hullman’s paper: Why Authors Don’t Visualize Uncertainty
Matthew Kay’s Tapestry keynote: A biased tour of the uncertainty visualization zoo
Nathan Yau’s post: Visualizing the Uncertainty in Data
If you’re aware of additional related resources, please leave a comment on this post. Without further ado, here are the details for this month’s challenge.
CHALLENGE: Identify a scenario and data where there is inherent uncertainty and design a visualization that makes that clear. If you’re in search of data, check out this list of publicly available data sources. You're also welcome to use a real work example if you'd like, just please don't share anything confidential.
DEADLINE: Friday the 13th (couldn’t help myself) by midnight PST.
SUBMISSION INSTRUCTIONS: upload your visual and related commentary at storytellingwithdata.com/SWDchallengeSUBMIT. (DO NOT EMAIL: we are no longer monitoring the old alias!) Feel free to also share on social media at any point using #SWDchallenge. For inclusion in the summary recap post, submissions must be officially submitted to us (compiling the recap is still a time-intensive process, and we aren’t able to scrape Twitter and other social media sites).
Stay tuned for the recap post later this month, where we’ll share back with you all of the visuals created and shared as part of this challenge. Also check out the #SWDchallenge page for past challenge details and recaps. I’m certainly excited to see what you come up with this month!