visualizing uncertainty

We often have some measure of uncertainty in our data—a forecast, prediction or range of possible values. A common challenge is how to visualize that uncertainty and help our audience understand the implications. In today’s post, I’ll use a real-world example to illustrate one approach and share tactics for creating in Excel.

The client’s original visual looked similar to the one below. It shows 2017 earnings per share (EPS) and the forecast outlook for the next four years. The client used a CAGR to forecast a range of possible EPS values from 2018 - 2021.  

 
Picture1.png
 

At first glance, it wasn’t obvious that the blue bars represented a forecast (even with the x-axis labeling of “E” for expected). The first yellow bar represents the 2017 actual EPS and next four blue bars are the forecast for 2018 - 2021 where the solid section represents the midpoint and the data labels is the uncertain piece—the range of projected values.  

I made a few design changes to make the graph a little easier to interpret. I first changed the bars to lines and used a dotted line for 2018 - 2021 with unfilled data markers to help visually reinforce the uncertainty.

 
Picture4.png
 

In Excel, there are two potential ways to achieve this formatting. A brute-force approach is to use a single data series and format each individual data point as a dotted line. Another approach is to graph two separate data series, one as a solid point or line and the second as a dashed line or unfilled circle, with a point of overlap to make the lines connect. You can read more detail about these two approaches in this prior post.

We often face the decision of preserving the y-axis vs. labeling data directly. I’ve done the latter in the visual below. One consideration in this decision point is the level of specificity your audience needs: are the actual values important? Or is the overall shape of the data more important? You can read more about these considerations in this prior post.

 
 

Next, let’s revisit how to show the range of forecast values. The original visual is shown again below where the forecast EPS values are represented by the data labels on top of the bars.

 
 

Rather than leave the audience with the highly taxing processing of reading these values, we can aid interpretation by instead depicting the forecast as a shaded range around the point estimate. This keeps the emphasis on the midpoints, while reducing clutter and eliminating the additional work the audience has to do. If the specific forecast values are important to the audience, we’ll deal with that momentarily.

 
 

The brute-force Excel method to adding this grey band requires a little math, graphing a second data series as a stacked bar and then formatting the stacked bar so that the bottom section renders white and the top section grey. You can download the accompanying Excel file to see how I accomplished this.

 
Picture6.png
 

But the visual is not yet complete. We should take the opportunity to add value to this data by telling the intended audience what they should know. Let’s assume this is a positive story where the outlook from the original base year (2016) has been extended to 2018. I might add explanatory text, paired with strategic use of color (I chose green to depict positivity) to focus attention on the relevant points of the data. If specific forecast EPS values are important for a given year, I could include them for context in the text. For a very technical audience, I might include even more detail with the statistics around the forecast. Just a reminder to always design with the audience’s needs in mind!

 
Picture5.png
 

 

Are you aware of other methods to achieve this effect? Have you seen other examples of uncertainty depicted effectively or tips you’d like to share? Leave a comment with your thoughts!


Elizabeth Ricks is a Data Visualization Designer on the Storytelling with Data team. She has a passion for helping her audience understand the ’so-what?’ as concisely as possible. Connect with Elizabeth on LinkedIn or Twitter.

align against a common baseline

There is a recommendation I find myself often voicing to workshop participants: Think about what you want your audience to be able to easily compare. Put those things as physically close together as you can and align them along a common baseline. This post features a makeover applying this recommendation.

Read More

design with audience in mind

In this post, I makeover less-than-ideal visuals from a recent USA Today graphic summarizing diversity stats across a number of Bay Area tech companies and discuss my design thought process when doing so.

Read More

alternatives to pies

My disdain for pie charts is well documented. While opinions on their usefulness run the gamut, I am certainly not alone in my contempt. In my workshops, I sometimes get the question, "In what situation would you recommend a pie chart?" For me, the answer is never. There are a number of alternatives, each with their own benefits. It's these alternatives that I'll focus on in this post.

Read More

the story you want to tell...and the one your data shows

This is a case where the story being told wasn't quite right, or at least wasn't exactly the story I would tell after looking at the data in a couple of different ways. Here, I'll walk you through my thought process and makeover.

Read More

exploratory vs explanatory analysis

I often draw a distinction between exploratory and explanatory data analysis.

Exploratory analysis is what you do to get familiar with the data. You may start out with a hypothesis or question, or you may just really be delving into the data to determine what might be interesting about it. Exploratory analysis is the process of turning over 100 rocks to find perhaps 1 or 2 precious gemstones.

Explanatory analysis is what happens when you have something specific you want to show an audience - probably about those 1 or 2 precious gemstones. In my blogging and writing, I tend to focus mostly on this latter piece, explanatory analysis, when you've already gone through the exploratory analysis and from this have determined something specific you want to communicate to a given audience: in other words, when you want to tell a story with data.

Keeping this distinction in mind, I thought it might be interesting to look at a recent makeover and show how the visual you could use for the exploratory and explanatory steps of the analytical process might differ.

For this (generalized & simplified*) example, imagine that you work for a car manufacturer. You're looking at customer feedback, specifically to better understand how failed or less-than-ideal performance across various dimensions for a given make and model impacts customer satisfaction. The primary output variable you're looking at in this case is an overall question in your customer satisfaction survey, where customers are asked to express their overall satisfaction with their car along a 5-point Likert scale (Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied). Let's assume you're most interested in anyone responding with anything other than Very Satisfied, and want to understand how this varies by customers who have reported specific issue(s) with their car, by the type of issue.

*Please keep in mind that I'm making up the specific scenario here; the makeover is a generalized example from a past workshop where I don't have all of the details and also don't have other data that would possibly be of benefit in the exploratory and explanatory phases. For example, there are likely other things that drive the overall satisfaction with the car, which we're ignoring here. Also, anytime you show percents like this, I'd recommend also showing the N count - in this case, the number of people reporting the given issue - which will be helpful for the interpretation of the data.

Your initial visual might look something like the following:

In the above, I've grouped all of the "less than very satisfied" responses (in orange), with the data arranged in descending order of this metric. With this visual, you can scan through the various issues and see the relevant "less than very satisfied" metric. This might be useful for part of your exploratory analysis.

Once you've identified something or some things to focus on, in some cases it will make sense to create a different visual for the purpose of focusing on that thing or those things. Alternatively, the same visual and be modified for explanatory purposes by drawing attention to the points of interest, while preserving the other data for context:

We can use the same visual and approach for highlighting another potential point of interest:

Or another:

Note how, when we focus on one aspect or story, it's actually harder to see the others. That's one of the reasons it's important to do exploratory analysis before you get to the explanatory phase: so you can have confidence that you're focusing your audience on the right thing(s).

In case it's of interest, the Excel workbook with the above graphs can be downloaded here.