visualizing uncertainty

We often have some measure of uncertainty in our data: a forecast, prediction or range of possible values. A common challenge is how to visualize that uncertainty and help our audience understand the implications. In today’s post, I’ll use a real-world example to illustrate one approach and share tactics for creating it in Excel.

The client’s original visual looked similar to the one below. It shows 2017 earnings per share (EPS) and the forecast outlook for the next four years. The client used a compound annual growth rate (CAGR) to forecast a range of possible EPS values for 2018 - 2021.

 
Picture1.png
 

At first glance, it wasn’t obvious that the blue bars represented a forecast (even with the x-axis labeling of “E” for expected). The first yellow bar represents the 2017 actual EPS and the next four blue bars are the forecast for 2018 - 2021, where the solid section represents the midpoint and the data labels are the uncertain piece: the range of projected values.

I made a few design changes to make the graph a little easier to interpret. I first changed the bars to lines and used a dotted line for 2018 - 2021 with unfilled data markers to help visually reinforce the uncertainty.

 
Picture4.png
 

In Excel, there are two potential ways to achieve this formatting. A brute-force approach is to use a single data series and format each individual data point as a dotted line. Another approach is to graph two separate data series, one as a solid point or line and the second as a dashed line or unfilled circle, with a point of overlap to make the lines connect. You can read more detail about these two approaches in this prior post.
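The second approach comes down to how the data is laid out. As a minimal sketch (written in Python rather than Excel, with made-up EPS values and a hypothetical helper named `split_series`), the single series is split into two columns that share one point, so that the solid and dashed lines connect:

```python
from typing import List, Optional, Tuple

Series = List[Optional[float]]


def split_series(values: List[float], last_actual: int) -> Tuple[Series, Series]:
    """Split one series into an 'actual' and a 'forecast' series.

    Both series keep the value at index `last_actual`, which is the
    point of overlap that makes the two lines connect when graphed.
    Positions that belong only to the other series are left empty (None).
    """
    if not 0 <= last_actual < len(values):
        raise ValueError("last_actual must be a valid index into values")
    actual = [v if i <= last_actual else None for i, v in enumerate(values)]
    forecast = [v if i >= last_actual else None for i, v in enumerate(values)]
    return actual, forecast


# Illustrative values only: one actual year followed by four forecast years.
years = [2017, 2018, 2019, 2020, 2021]
eps = [2.0, 2.2, 2.4, 2.6, 2.8]
actual, forecast = split_series(eps, last_actual=0)
```

Each resulting list would become its own column in the worksheet, with the actual series formatted as a solid marker or line and the forecast series as a dashed line with unfilled markers.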

We often face the decision of preserving the y-axis vs. labeling data directly. I’ve done the latter in the visual below. One consideration in this decision point is the level of specificity your audience needs: are the actual values important? Or is the overall shape of the data more important? You can read more about these considerations in this prior post.

 
 

Next, let’s revisit how to show the range of forecast values. The original visual is shown again below where the forecast EPS values are represented by the data labels on top of the bars.

 
 

Rather than leave the audience with the taxing work of reading these values, we can aid interpretation by instead depicting the forecast as a shaded range around the point estimate. This keeps the emphasis on the midpoints, while reducing clutter and eliminating the additional work the audience has to do. If the specific forecast values are important to the audience, we’ll deal with that momentarily.

 
 

The brute-force Excel method for adding this grey band requires a little math: graph a second data series as a stacked bar, then format the stacked bar so that the bottom section renders white and the top section grey. You can download the accompanying Excel file to see how I accomplished this.
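The "little math" is a subtraction. Here is a rough sketch in Python, using invented low and high forecast values (the real numbers live in the Excel file, which this does not reproduce) and a hypothetical helper named `band_series`: the bottom stacked section is the low end of the range, and the top section is the distance from low to high.

```python
from typing import List, Tuple


def band_series(low: List[float], high: List[float]) -> Tuple[List[float], List[float]]:
    """Return the two stacked sections needed to draw a shaded range.

    base   -> bottom section (formatted white, so it disappears)
    height -> top section (formatted grey, so it shows as the band)
    """
    if len(low) != len(high):
        raise ValueError("low and high must have the same length")
    if any(h < l for l, h in zip(low, high)):
        raise ValueError("each high value must be >= its low value")
    base = list(low)
    height = [h - l for l, h in zip(low, high)]
    return base, height


# Illustrative forecast range for four years (not the client's data).
low = [2.0, 2.25, 2.5, 2.75]
high = [2.5, 3.0, 3.5, 4.0]
base, height = band_series(low, high)
```

Stacking `height` on top of `base` means the grey section starts at the low value and ends at the high value for each year.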

 
Picture6.png
 

But the visual is not yet complete. We should take the opportunity to add value to this data by telling the intended audience what they should know. Let’s assume this is a positive story where the outlook from the original base year (2016) has been extended to 2018. I might add explanatory text, paired with strategic use of color (I chose green to depict positivity) to focus attention on the relevant points of the data. If specific forecast EPS values are important for a given year, I could include them for context in the text. For a very technical audience, I might include even more detail with the statistics around the forecast. Just a reminder to always design with the audience’s needs in mind!

 
Picture5.png
 

 

Are you aware of other methods to achieve this effect? Have you seen other examples of uncertainty depicted effectively, or do you have tips you’d like to share? Leave a comment with your thoughts!


Elizabeth Ricks is a Data Visualization Designer on the Storytelling with Data team. She has a passion for helping her audience understand the ‘so-what?’ as concisely as possible. Connect with Elizabeth on LinkedIn or Twitter.

#SWDchallenge: your choice makeover

UPDATE: DEADLINE EXTENDED—SUBMIT YOUR MAKEOVER BY 7/12 at MIDNIGHT PST.

Less-than-ideal graphs are prominent in the world around us. Your challenge this month should be a fun one: find one of these and work to make it better. Improving upon an existing graph can be a great way to build your skills, and reflecting on the process and sharing it is a great way to benefit others as well.

As part of this month's #SWDchallenge, I have a few requests... First, be kind. A real person made the graph you will be critiquing and likely faced a number of constraints that simply don't exist in this sort of exercise. (For some great insight on this, listen to or read the transcript from Episode 1 of the SWD podcast, which focuses on the art of feedback in data visualization. In particular, check out Alex Selby-Boothroyd's response to the hurricane data viz challenge and the constraints his team at The Economist faced in creating the original graph.) It's easy to forget this and totally rip apart someone else's work. That said, there is great value in constructive critique (both giving and receiving), which can help us all become more effective when communicating visually with data.

Please share both the before and after. It's useful to see the original visual (cite source as appropriate), as well as your revamped version. Note also that these don't have to be major changes: minor improvements are often what is needed to take something from good to great. Full submission details can be found at the bottom of this post. Please do follow them, as it makes the manual process of pulling together the recap post a bit easier.

Finally, I'd love to not only see your makeover, but have you share insight into your thought process, something I suspect other readers will appreciate as well. A sentence or two (or a few) is great; if you find you want to write more, please feel free to post externally (if you don't have a dedicated spot to do so, you can post an article on LinkedIn) and provide the link for us to include in our recap. 

In case seeing some example before-and-afters would be helpful, below are a few historical posts where I've shared my makeovers of less-than-ideal visuals I've encountered in the world around me:

You have until July 8th (by midnight PST) to create and share your makeover. Please do so following the instructions below.

SUBMISSION INSTRUCTIONS:

  • Make it. Identify a graph and improve upon it using the tool of your choice. If you need help finding data, check out this list of publicly available data sources (for this particular challenge, you may find the curated content at viz.wtf inspiring). You're also welcome to use a real work example, just please don't share anything confidential.
  • Share it. Email your entry (please include both before & after) to SWDchallenge@storytellingwithdata.com by the deadline. Attach your image(s) as a .PNG. Put any commentary you’d like included in our follow up post in the body of the email (e.g. what tool you used, any notes on your methods or thought process you’d like to share); if there’s a social media profile or blog/site you’d like mentioned, please embed the links directly in your commentary (e.g. Blog | Twitter). If you’re going to write more than a paragraph or so, I encourage you to post it externally and provide a link or summary for inclusion. Feel free to also share on social media at any point using #SWDchallenge.
  • The fine print. I reserve the right to post and potentially reuse examples shared.

I look forward to seeing what you come up with! Stay tuned for the recap post in the second half of July. Check out the #SWDchallenge page for past challenges and recaps.

/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2018 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.

accessible data viz is better data viz

Accessibility in data visualization goes well beyond considerations for color blindness. This is something I personally understand much better after my interactions with Amy Cesal in preparation for bringing you the following post. Back in April, we posted a short article here with some accessibility considerations in visual design and Amy sent me a private note basically expressing, “but wait…there’s more!” I love when content posted on the blog starts conversations like this—as I continue to learn, it means there’s more great content to share with you. Just be aware that if you reach out like this, I may talk you into doing some writing and sharing: that’s what has happened here when I invited Amy to pen a guest post and she graciously accepted!

I was already familiar with some of Amy’s work (if you haven’t seen it, be sure to check out her #DayDohViz, where she is in the process of creating 100 visualizations made out of PlayDoh, including a couple she’s shared via #SWDchallenge; you can view them all here). Amy first became interested in accessibility for data visualization through her work in the federal government, where everything has to meet Section 508 accessibility standards. At CFPB, she led the creation of the data visualization portion of the design manual, which is the first style guide for the federal government that includes accessibility for data visualization. Please join me in thanking Amy for writing and sharing the following expanded post on accessibility in data visualization!


accessibility matters

Often, when we are creating charts and graphs, we think of ourselves as the ideal user. This is a problem not only because we know more about the data than the target user, but also because other users might have a different set of constraints than we do.

Inclusive design principles and accessibility (often posted about with the tag #a11y) are important to take into consideration when designing data visualization because they help a broader audience understand your graphic. Designing with accessibility in mind can even help make your visualizations easier to understand for people without disabilities.

To help you understand how other people experience the world, empathy prompts provide a variety of suggestions. The Microsoft guide to the principles of inclusive design, excerpted below, suggests thinking about temporary and situational disabilities as well as permanent ones.

Being clear with text, distinctive labeling, and adding multiple ways to identify the point to your visuals will make it easier for people with impairments and those without to interpret your graphs. There are easy ways to add the principles of accessibility into your visual communications. Here are five simple ones.
 

5 easy ways to make your data visualization more accessible
 

1. Add Alt text

Alternative text (referred to as Alt text) is displayed when the image cannot be. Screen readers, the assistive technology used by people who are visually impaired, read alt text out loud for people who cannot see the image. It’s important to have valuable alt text instead of “figure-13.jpg,” which doesn’t help a user understand the content they are missing. Screen readers speak alt text without allowing users to speed up or skip, so make sure the information is descriptive but succinct.

According to the CFPB data visualization guide, which I helped create, good alt text includes one sentence describing what the chart is, including the chart type for users with limited vision who may only see part of it. It should also include a link to a CSV or other machine-readable data format so people with impaired vision can tab through the chart data with a screen reader.

If you frequently use data visualizations in Microsoft products, this guide will help you learn how to add alt text to them.
 

2. Employ a takeaway title

Research suggests that users read the title of the graph first. People also tend to just rephrase the title of the graph when asked to interpret the meaning of the visualization. When the graph title includes the point, the cognitive load of understanding the chart decreases. Recently, when writing about how to effectively use words in graphs, Cole advocated for what she called a takeaway title. People know what to look for in the data when they read the graph takeaway first as part of the title.
 

3. Label data directly

One way to reduce the cognitive burden on users is to directly label your data rather than using legends. This is especially useful for colorblind or visually impaired users who may have difficulty matching colors within the plot to those in the legend. It also decreases the work of scanning back and forth trying to match the legend with the data.

Notice the difference in the following visual example. Compare how quickly you understand them and the relative ease of processing the one that is labeled directly.

An example graph using legend vs. direct labeling

4. Check type and color contrast

In the data visualization community, we’ve started addressing how to design for red/green colorblindness. This is an issue for 8% of men and 0.5% of women with Northern European ancestry. However, we should also consider users with low vision and a variety of other conditions that affect vision. The W3C WCAG guidelines specify necessary contrast and text sizes for readability on screen. There are a number of tools to help you abide by these contrast and size standards. For example, in the following screenshot the top example passes color contrast standards, whereas the version below it is too light and fails.

Example of the color palette contrast evaluation tool for the WCAG standards
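For a sense of what such tools compute, here is a small sketch of the contrast ratio calculation defined in WCAG 2.x, checked against the 4.5:1 (normal text) and 3:1 (large text) level AA thresholds. The function names are my own, and the hex colors are arbitrary examples rather than the palette in the screenshot.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError("expected a color in #RRGGBB form")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG level AA contrast threshold."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


# Two nearly identical greys on white land on either side of the 4.5:1 line.
darker_grey_ok = passes_aa("#767676", "#FFFFFF")
lighter_grey_ok = passes_aa("#777777", "#FFFFFF")
```

The point of the example pair is that small differences in lightness can decide whether text is readable for low-vision users, which is hard to judge by eye and easy to check with a tool.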

Tools to help get you started:

5. Use white space

White space is your friend. When information is too densely packed, the graphic can feel overwhelming and unreadable. It can be helpful to leave a gap between sections of a chart. Judicious use of white space increases the legibility by helping to demarcate and distinguish between different sections without relying only on color. This can also supplement accessible color choices by helping users distinguish the difference between colors that identify separate sections.

In the following example, compare how easily you can distinguish the sections. Now imagine if you had more difficulty distinguishing between the colors. The lines between them help you to visually identify each chunk.

Example graph with no line and with a white divider line between categories

These are just a few things you can do to help everyone easily comprehend the graphs you create. You should strive to make sure that everyone, not just you or your ideal user, understands the point of the visualization. When you consider accessibility, you create a better product for everyone.


putting graph elements in cells directly

Today's short post focuses on an Excel tip that I use frequently and get asked about a ton. While I focus on Excel here, I imagine the same could apply when working in any spreadsheet application. For additional formatting flexibility, put some graph elements in the cells directly.

This is probably easiest to illustrate through an example. Below is a visual from storytelling with data: a data visualization guide for business professionals. It's highlighted in Chapter 6 (pg. 161), which focuses on what I consider to be model examples of data visualization. I regularly get questions about how this graph was created.

Source: Knaflic, Cole. Storytelling With Data: A Data Visualization Guide for Business Professionals, Wiley, © 2015. (Original figure had data labels on the stacked bars; I've omitted those here for simplicity.)

There are two common questions posed about this visual:

  1. How did you format some of the y-axis labels differently from the rest? 
  2. How did you include the TOTAL % in the graph?

The trick in both of these cases is the same: these elements aren't part of the graph directly, rather they are in individual cells in Excel. In this particular graph, the only things directly in the graph element are the stacked bars. Everything else is done in cells. I do this for additional formatting flexibility. In Excel, you can only apply a single format to axis labels on a given axis. You also have to get creative (and it can sometimes be limiting) if you want to add additional numbers or labels to the visual that aren't part of the data you're visualizing directly. Using the cells allows me to break free from both of these constraints.

In case it's easier to see what I'm talking about, below is what the visual looks like if I show the gridlines in Excel (in the preceding view, all of my cells were filled with white).

FIG0605 with gridlines.png

When taking this approach, you do have to take care to ensure that everything is lined up correctly. This means precisely aligning the graph with the cells and making both cell and graph heights line up (otherwise your bars won't correctly align with the words and numbers). Also, be aware that if you shuffle your data around, you'll need to adjust the elements you've put into the cells directly. Changing the order of your data would change the graph in this case, but not the PRIORITY or TOTAL %, which would have to be updated manually (unless you set them up to point to the data you reshuffle). Both of these things lead me to express a word of caution: when I break the pieces apart like this, it's easier to introduce issues, both visually (things not lining up) and in the content (data and labels getting out of sync). Because of this, the tradeoffs may not be worth it if you're trying to automate or it's a repeated process that you don't want to have to adjust each time. But so long as you're careful and pay attention to detail, when you need the additional formatting flexibility, consider putting some elements into cells directly.

If it's of interest, you can download my Excel workbook.

Are there other graphs from storytelling with data that you have questions about or would like a behind-the-scenes peek at? Or other tricks you employ when graphing data that you'd like to share? Leave a comment!


an alternative to treemaps

 
 

We recently worked with a client who was using treemaps in many of their visuals and the client questioned whether there was a more effective way to visualize their data. Today’s post illustrates some pros and cons of using treemaps, plus possible alternatives.

Treemaps are increasingly being included in most data visualization tools (including the latest Excel 2016 version, grouped under “Hierarchy charts”). The treemap was originally designed to visualize a very large amount of data in a hierarchical, tree-structured diagram where the rectangles are sized by value and organized from largest to smallest. Color is used to encode a second dimension. Today, they’re often used generally for categorical data. For further reading on treemaps, check out these posts from Jeffrey Shaffer and Robert Kosara.

appropriate use cases for treemaps

Treemaps can work well if your data falls into this scenario:

  1. You want to visualize a part-to-whole relationship amongst a large number of categories.
  2. Precise comparisons between categories are not important.
  3. The data is hierarchical.

Steve Wexler provides a nice example of why a treemap was more effective than a bar chart when visualizing the Electoral College votes from the 2012 election in this post.

the challenge with treemaps

While treemaps may seem like a sexy choice for visualizing data, they’re often used when another chart type would serve the data better. The limitation is that when we’re encoding data with area and intensity of color, our eyes aren’t great at detecting relatively minor differences in either of these dimensions. If our data is such that our audience needs to make precise comparisons between categories, it’s even more cumbersome when the categories aren’t aligned to a common baseline. We should never make our audience do more work than necessary to understand a graph!

Bottom line: treemaps do have a use case; however, consider reaching for something else when your audience needs to be able to make specific comparisons. Andy Kriebel provides a great litmus test for the effectiveness of a treemap in this post: if it takes longer than 2 seconds to compare categories, go with a simpler design.

a real-world makeover

Back to our client: their treemap looked similar to the one below (I’ve anonymized the data to protect confidentiality). The intent was to compare returns activity (measured by volume of dollars claimed and number of returns) for their customer base: % of returns is encoded by the size of the rectangles, while % of dollars is encoded by color, through differences in both hue and intensity.

 
original treemap.png
 

I can make a few general conclusions from this visual: three customers (A, B, and C) are roughly half of the return activity (visually estimated based on the size of the rectangles) and these same three are also pretty drastically different in terms of dollars claimed (judged by the intensity of color—although without a color legend it’s difficult to discern much more than that).

I’d argue that this visual might work well in an exploratory phase of analysis, where the analyst is sorting through the data to find what might be interesting to stakeholders. However, it doesn’t work so well when communicating what’s interesting in this data.  

What if I asked you which are the top 10 customers by dollars claimed? Or what’s the exact difference between Customer A and B’s % of returns? We can see how easily a treemap breaks down when trying to make these comparisons.

One possible solution is a bar chart. Below, I’ve displayed all customers’ returns and dollars claimed horizontally. This lends itself well to comparing across categories and we can easily see that four customers comprise the majority of return activity.

 
allcustomers.png
 

When communicating this data, we can help our audience understand what’s important by focusing only on the top four customers. This condensed view could lend itself well to a discussion towards action on these top four customers.

 
top4.png
 

To address the remaining customers, I could provide the entire bar chart for additional detail if the audience needs it. I might also choose to add the other customers with a footnote (as shown) or group them into an “All other” category. The decision point here would be determined by the level of detail my audience needs.
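If you go the “All other” route, the grouping itself is simple. Below is a minimal sketch in Python with invented customer percentages (not the client's data) and a hypothetical helper named `top_n_with_other`: rank the categories, keep the top few, and sum the rest into a single bar.

```python
from typing import Dict, List, Tuple


def top_n_with_other(
    values: Dict[str, float], n: int, other_label: str = "All other"
) -> List[Tuple[str, float]]:
    """Keep the n largest categories and combine the remainder into one."""
    if n < 0:
        raise ValueError("n must be non-negative")
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    if rest:
        # Append the combined remainder last so it does not compete
        # with the individually ranked categories.
        top.append((other_label, sum(value for _, value in rest)))
    return top


# Illustrative % of returns by customer.
returns_pct = {"A": 30, "B": 25, "C": 20, "D": 10, "E": 8, "F": 7}
condensed = top_n_with_other(returns_pct, n=4)
```

Keeping the combined category at the end of the list, rather than sorting it in with the others, avoids a large “All other” bar pulling attention away from the customers you want to discuss.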

Taking this a step further, we can introduce color strategically as a visual cue, paired with some explanatory text, to prompt our audience to act on what this data may suggest.

 
final solution.PNG
 

For an additional example, Stephen Few provides a similar bar chart solution in this before & after post.

For a deeper look at the graphs, you can download the accompanying Excel file. What do you think? Are there situations where you’ve found a treemap to be more effective than an alternate solution? Leave a comment with your thoughts!


Elizabeth Ricks is a Data Visualization Designer on the storytelling with data team. She has a passion for helping her audience understand the ‘so-what?’ as concisely as possible. Connect with Elizabeth on LinkedIn or Twitter.