Do you find yourself needing to communicate with qualitative data? This post discusses three best practices when communicating with qualitative data (effectively using color, reducing text, and considering whether your audience needs quantitative context) and illustrates them through example.
When presenting live, you have a ton of opportunity to build a graph or a story piece by piece for your audience. Check out the 90-second video in this post illustrating an example of how we do this at storytelling with data.
We often have some measure of uncertainty in our data: a forecast, prediction or range of possible values. A common challenge is how to visualize that uncertainty and help our audience understand the implications. In today’s post, I’ll use a real-world example to illustrate one approach and share tactics for creating it in Excel.
The client’s original visual looked similar to the one below. It shows 2017 earnings per share (EPS) and the forecast outlook for the next four years. The client used a CAGR to forecast a range of possible EPS values from 2018 - 2021.
At first glance, it wasn’t obvious that the blue bars represented a forecast (even with the x-axis labeling of “E” for expected). The first yellow bar represents the 2017 actual EPS and the next four blue bars are the forecast for 2018 - 2021, where the solid section represents the midpoint and the data labels are the uncertain piece: the range of projected values.
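If you want to reproduce a forecast range like the one described above, here is a minimal Python sketch of a CAGR-based projection. The base EPS and the two growth rates are made-up placeholders, not the client's figures, and the idea that the range comes from a low and a high growth scenario is my assumption about how such a range might be built.

```python
def project_with_cagr(base_value, annual_growth, years):
    """Compound base_value forward at a constant annual growth rate."""
    return [base_value * (1 + annual_growth) ** n for n in range(1, years + 1)]

# Hypothetical inputs for illustration only (not the client's figures).
base_eps = 2.00                       # stand-in for the 2017 actual EPS
low_growth, high_growth = 0.04, 0.08  # assumed low and high CAGR scenarios

low = project_with_cagr(base_eps, low_growth, 4)    # 2018E-2021E, low end
high = project_with_cagr(base_eps, high_growth, 4)  # 2018E-2021E, high end
midpoint = [(lo + hi) / 2 for lo, hi in zip(low, high)]

for year, lo, mid, hi in zip(range(2018, 2022), low, midpoint, high):
    print(f"{year}E: {lo:.2f} to {hi:.2f} (midpoint {mid:.2f})")
```

Notice that the gap between the low and high values widens each year, which is exactly the growing uncertainty the visual needs to convey.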
I made a few design changes to make the graph a little easier to interpret. I first changed the bars to lines and used a dotted line for 2018 - 2021 with unfilled data markers to help visually reinforce the uncertainty.
In Excel, there are two potential ways to achieve this formatting. A brute-force approach is to use a single data series and format each individual data point as a dotted line. Another approach is to graph two separate data series, one as a solid point or line and the second as a dashed line or unfilled circle, with a point of overlap to make the lines connect. You can read more detail about these two approaches in this prior post.
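The two-series approach comes down to how you lay out the data before graphing it. Here is a rough Python sketch of that layout, where None stands in for an empty cell. The function name and the EPS values are hypothetical, purely for illustration.

```python
def split_actual_forecast(values, last_actual_index):
    """Split one series into an actual and a forecast series on the same x-axis.

    The point at last_actual_index appears in BOTH series, so the solid
    line and the dashed line meet instead of leaving a gap.
    """
    actual = [v if i <= last_actual_index else None for i, v in enumerate(values)]
    forecast = [v if i >= last_actual_index else None for i, v in enumerate(values)]
    return actual, forecast

years = ["2017", "2018E", "2019E", "2020E", "2021E"]
eps = [2.00, 2.12, 2.25, 2.38, 2.52]  # hypothetical values

# Index 0 (2017) is the last actual data point.
actual, forecast = split_actual_forecast(eps, last_actual_index=0)

for year, a, f in zip(years, actual, forecast):
    print(year, a, f)
```

Each resulting list becomes its own data series: format the first as solid and the second as dashed with unfilled markers.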
We often face the decision of preserving the y-axis vs. labeling data directly. I’ve done the latter in the visual below. One consideration in this decision point is the level of specificity your audience needs: are the actual values important? Or is the overall shape of the data more important? You can read more about these considerations in this prior post.
Next, let’s revisit how to show the range of forecast values. The original visual is shown again below where the forecast EPS values are represented by the data labels on top of the bars.
Rather than leave the audience with the highly taxing processing of reading these values, we can aid interpretation by instead depicting the forecast as a shaded range around the point estimate. This keeps the emphasis on the midpoints, while reducing clutter and eliminating the additional work the audience has to do. If the specific forecast values are important to the audience, we’ll deal with that momentarily.
The brute-force Excel method for adding this grey band requires a little math, graphing a second data series as a stacked bar and then formatting the stacked bar so that the bottom section renders white and the top section grey. You can download the accompanying Excel file to see how I accomplished this.
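The "little math" behind the band is just a subtraction. This sketch shows one way to compute the two stacked segments, assuming you have the low and high end of the range for each year. The numbers are hypothetical and this may not match the accompanying Excel file step for step.

```python
def band_segments(low, high):
    """Split each (low, high) range into the two stacked-bar segments.

    base: the bottom segment, from zero up to the low value (rendered white)
    band: the top segment, from the low value up to the high value (rendered grey)
    """
    base = list(low)
    band = [h - l for l, h in zip(low, high)]
    return base, band

# Hypothetical forecast range for 2018E-2021E.
low = [2.0, 2.1, 2.2, 2.3]
high = [2.5, 2.9, 3.3, 3.7]

base, band = band_segments(low, high)
for b, g in zip(base, band):
    print(f"white segment: {b:.1f}, grey segment: {g:.1f}")
```

Stacking the grey segment on top of the white one puts the top of the band at the high value, so only the uncertain range is visible.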
But the visual is not yet complete. We should take the opportunity to add value to this data by telling the intended audience what they should know. Let’s assume this is a positive story where the outlook from the original base year (2016) has been extended to 2018. I might add explanatory text, paired with strategic use of color (I chose green to depict positivity) to focus attention on the relevant points of the data. If specific forecast EPS values are important for a given year, I could include them for context in the text. For a very technical audience, I might include even more detail with the statistics around the forecast. Just a reminder to always design with the audience’s needs in mind!
Are you aware of other methods to achieve this effect? Have you seen other examples of uncertainty depicted effectively or tips you’d like to share? Leave a comment with your thoughts!
We recently worked with a client who was using treemaps in many of their visuals and the client questioned whether there was a more effective way to visualize their data. Today’s post illustrates some pros and cons of using treemaps, plus possible alternatives.
Treemaps are increasingly being included in most data visualization tools (including the latest Excel 2016 version, grouped under “Hierarchy charts”). The treemap was originally designed to visualize a very large amount of data in a hierarchical, tree-structured diagram where the rectangles are sized by value and organized from largest to smallest. Color is used to encode a second dimension. Today, they’re often used generally for categorical data. For further reading on treemaps, check out these posts from Jeffrey Shaffer and Robert Kosara.
appropriate use cases for treemaps
Treemaps can work well if your data falls into this scenario:
- You want to visualize a part-to-whole relationship amongst a large number of categories.
- Precise comparisons between categories are not important.
- The data is hierarchical.
Steve Wexler provides a nice example of why a treemap was more effective than a bar chart when visualizing the Electoral College votes from the 2012 election in this post.
the challenge with treemaps
While treemaps may seem like a sexy choice for visualizing data, they’re often used when another chart type would serve the data better. The limitation is that when we’re encoding data with area and intensity of color, our eyes aren’t great at detecting relatively minor differences in either of these dimensions. If our data is such that our audience needs to make precise comparisons between categories, it’s even more cumbersome when the categories aren’t aligned to a common baseline. We should never make our audience do more work than necessary to understand a graph!
Bottom line: treemaps do have a use case; however, consider reaching for something else when your audience needs to be able to make specific comparisons. Andy Kriebel provides a great litmus test for the effectiveness of a treemap in this post: if it takes longer than 2 seconds to compare categories, go with a simpler design.
a real-world makeover
Back to our client: their treemap looked similar to the one below (I’ve anonymized the data to protect confidentiality). The intent was to compare returns activity (measured by volume of dollars claimed and number of returns) for their customer base where % of returns is encoded by the size of the rectangles while % of dollars is encoded by color - both in differences in hue and intensity.
I can make a few general conclusions from this visual: three customers (A, B, and C) are roughly half of the return activity (visually estimated based on the size of the rectangles) and these same three are also pretty drastically different in terms of dollars claimed (judged by the intensity of color—although without a color legend it’s difficult to discern much more than that).
I’d argue that this visual might work well in an exploratory phase of analysis, where the analyst is sorting through the data to find what might be interesting to stakeholders. However, it doesn’t work so well when communicating what’s interesting in this data.
What if I asked you which are the top 10 customers by dollars claimed? Or what’s the exact difference between Customer A and B’s % of returns? We can see how easily a treemap breaks down when trying to make these comparisons.
One possible solution is a bar chart. Below, I’ve displayed all customers’ returns and dollars claimed horizontally. This lends itself well to comparing across categories and we can easily see that four customers comprise the majority of return activity.
When communicating this data, we can help our audience understand what’s important by focusing only on the top four customers. This condensed view could lend itself well to a discussion towards action on these top four customers.
To address the remaining customers, I could provide the entire bar chart for additional detail if the audience needs it. I might also choose to add the other customers with a footnote (as shown) or group them into an “All other” category. The decision point here would be determined by the level of detail my audience needs.
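If you go the “All other” route, the grouping is simple to do before graphing. Here is a small Python sketch, with hypothetical customer names and percentages (not the client's data), that keeps the top customers and rolls the rest into one category.

```python
def top_n_with_other(shares, n, other_label="All other"):
    """Keep the n largest categories and sum the rest into a single row."""
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    if rest:
        top.append((other_label, sum(value for _, value in rest)))
    return top

# Hypothetical % of returns by customer.
returns_share = {"A": 30, "B": 25, "C": 20, "D": 10, "E": 8, "F": 7}

for customer, share in top_n_with_other(returns_share, n=4):
    print(f"{customer}: {share}%")
```

One design note: the “All other” row is appended last rather than sorted by size, so it never outranks an individual customer in the bar chart.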
Taking this a step further, we can introduce color strategically as a visual cue, paired with some explanatory text, to prompt our audience to act on what this data may suggest.
For an additional example, Stephen Few provides a similar bar chart solution in this before & after post.
For a deeper look at the graphs, you can download the accompanying Excel file. What do you think? Are there situations where you’ve seen a treemap more effective than an alternate solution? Leave a comment with your thoughts!
After reading storytelling with data or participating in a workshop, people often ask how they can incorporate the lessons into a recurring (i.e. monthly, quarterly) report. These reports often materialize as a PowerPoint deck, which started sparsely, but over time has taken on a life of its own and now resembles the “slideument”: part presentation, part document but not exactly either at its best.
Consider the slide below, which is based on an actual slide from a recent client workshop. (I’ve anonymized the client’s data to preserve confidentiality.) Today’s post demonstrates how to apply data storytelling lessons to a visual from a monthly deck, illustrating the thought process to improve it.
This slide shows a monthly trend of customer service complaints: in total (top chart) and broken down by category (bottom chart). The commentary section tells us (the audience) what the important points of reference are: what happened this month compared to last month (complaints are up 14%), where it changed (Employees) and their proposed next steps. However, notice how much work it takes to read through all this text and then find evidence of this in the graphs.
Imagine if you were given this slide to determine an action plan. If you were in a live meeting, would you be able to read all of this text and listen to the presenter at the same time? If you weren’t in the meeting and were reading through the deck, how much time would you realistically spend trying to digest the information presented? We can improve on this visual in both scenarios with a few design changes.
In both cases, I used the commentary as a guidepost for the important takeaways and re-designed the visuals accordingly.
First, let’s take a closer look at the top chart. The commentary tells us that complaints were up 14% vs the prior month.
Where did your eyes go first in this graph? Mine went to the red Average line, which I visually estimated to be about 410 per month. In looking for evidence of the 14% increase in December, I had to do a lot of mental math (add the Solicited + Unsolicited for November and compare it to Solicited + Unsolicited for December) which took more time than someone would likely spend doing this.
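To make that mental math concrete, here is the calculation the original chart leaves to the reader, sketched in Python. The monthly counts are hypothetical, chosen only so that the totals work out to a 14% increase. They are not the client's numbers.

```python
def percent_change(previous, current):
    """Percent change from previous to current."""
    return (current - previous) / previous * 100

# Hypothetical complaint counts read off the stacked bars.
november = {"Solicited": 250, "Unsolicited": 100}
december = {"Solicited": 280, "Unsolicited": 119}

nov_total = sum(november.values())  # 350
dec_total = sum(december.values())  # 399
change = percent_change(nov_total, dec_total)

print(f"{nov_total} -> {dec_total}: {change:+.0f}% vs. prior month")
```

That is two sums and a division the audience has to do in their heads, which is a good argument for putting the total and the percent change directly on the graph.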
If that 14% increase is what the audience should know, check out the difference between the original visual and this:
When applying the “where are your eyes drawn?” test, my eye went straight to the data markers & labels at the end of the total line, where I could see both the absolute numbers and annotations telling me it’s a 14% increase. Since we’re visualizing time, I changed the graph type from a bar chart to a line chart, unstacked the data series, and added a series for the total. This was intentional based on the commentary, which only referenced the total trend. I chose to de-emphasize the subcomponent pieces (Unsolicited and Solicited) by using grey.
Side note: what about the Average line? If the monthly deviation from average was really important, one option would be to keep it in the graph for reference with the tradeoff that adding a fourth data series could create clutter. Another option is an entirely different choice of visual, depicting the monthly change (from average), with a visual cue to indicate that December’s data is acceptable. Both are choices the information designer would make knowing the audience and what context is relevant. In this case, I didn’t feel that this additional point added anything to the overarching story, so I chose to eliminate it altogether.
Let’s take another look at the second visual now. The commentary tells us that complaints were up in a specific category: Employees. Not only did they increase, but they increased from 87 to 117. Apply the “where are your eyes drawn?” test again with the original visual.
If I took an informal poll of readers here, some might have gone to the black line, others might have noticed the blue line first and others (like me) went to the red line. Regardless of which line you focused on first, I’d likely bet that you didn’t focus first on the November to December increase in the Employees line (red). In fact, it’s difficult to discern the absolute numbers (87 and 117) here because of the general clutter: overlapping data series, gridlines, color, heavy chart border and legend at the bottom requiring some visual work to figure out which line goes with which complaint category.
When setting out to improve a visual, there’s not necessarily a right or wrong answer in choosing a visual type: it often takes looking at the same data several different ways to find which view is going to create that magical “lightbulb” moment. Let's look at a few different variations of this visual.
First let’s keep the existing line chart, remove some of the clutter and focus attention on the November to December change in Employees.
This view gives the audience the full context of the 12 month trend, while focusing attention strategically on a specific point. However, if the emphasis is really about the November to December change, we could also visualize only those two data points. Let’s look at a few different ways of displaying this.
First, this horizontal bar chart compares this month (December) to last month (November). Horizontal bar charts are useful when your category names are long and therefore can be displayed horizontally from left to right on the y-axis without having to rotate or shorten them.
Another option is a vertical bar chart, if you’re more inclined to preserve the left-to-right construct of displaying time.
As a third option, we could use a slopegraph. Slopegraphs can work well in making change visually apparent across categories. Check out how clear it is that some of these categories changed more drastically than others. In fact, looking at the data this way, we see that there was also a marked increase in service-related complaints, something that didn't stand out as much in the other views of the data. You can read more about slopegraphs, including design considerations, in this previous post.
Any of these three visuals could work for depicting this data, but I chose the slopegraph for the final version to keep the emphasis on the change in the two data points.
Here's what it could look like if all of this needed to be on a single slide:
In the remade version, I’ve moved the text to be closer to the data it describes and used color strategically to create a visual link between the text and where to look in the graphs for evidence. I’ve also made the call to action more visible—remember when communicating with data for explanatory purposes, we should always want our audience to do something with the data we’re showing them!
Check out the difference between the original and the remade version:
This single view works well as a remake of the original, but not as well in a live presentation. There’s still too much text to read and process, while listening to a presenter at the same time. For a live setting we can still use the same visuals, but build piece by piece (using animation), which forces the audience to listen to the presenter describing the data. For example, consider the Complaints over time visual again:
Now imagine if each of these images were its own slide. Sparse slides lead to better presentations because a person is there to narrate what’s happening.
One final note on the choice of red as the emphasis color. Some readers may be surprised to see something different from our usual blue & orange as emphasis colors (and readers who are Michigan fans are probably having heart palpitations!). In this case, red was the client’s brand color so we chose to stay consistent with the rest of their visuals. If that weren’t the case, we might avoid red because it could carry a negative connotation, even though this is a somewhat positive story (complaints declining over time).
In conclusion, we can indeed incorporate visual cues such as strategic use of color and words into a monthly recurring presentation so that our audience clearly knows 1) what’s important and 2) what action to take. You can download the Excel file with accompanying visuals here.
I don't use a lot of area graphs. But I found myself pausing on one that was submitted as part of the recent annotated line graph #SWDchallenge. It was created by Mike M. and the interesting thing to me was that the focus of this particular area graph wasn't on the area so much, but rather on the line that separated the areas.
This apparently stuck with me, because I found myself recommending a similar approach in a recent client makeover.
The original graph looked something like the following (data has been modified to protect confidentiality):
This is collections data from a bank. In case you aren't familiar with how collections work, typically an automated dialer makes calls to overdue accounts. The grey bars above represent total dials made. When someone answers the phone on the other end, the dialer connects them to a collections agent, who talks to the person who hasn't paid their bill and tries to get them to make a payment. The accounts where a person is reached (a collections agent talks to someone) are considered to be "worked," which is what the teal bars above represent. The penetration ratio, depicted by the black line, is...hmm. What is a penetration ratio exactly? This one threw me. I'm familiar with penetration rate, which would be the proportion of accounts that were worked out of the total dialed. So in other words, if penetration rate is 33%, we worked a third of the accounts. The ratio seems less straightforward. I think to describe it, it would be something like "if the penetration ratio is 3, it means we dialed 3x more accounts than we talked to." This seems unnecessarily complicated. Let's see if we can make some changes to how we show this data to make it more straightforward. Oh, and let's use that cool idea that I picked up from Mike M, too.
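To see the difference between the two measures, here is a quick Python sketch. The rate follows the definition above (worked out of total dialed). The ratio is only my best guess at what the original graph meant, and the counts are hypothetical.

```python
def penetration_rate(worked, dialed):
    """Proportion of dialed accounts that were worked (a person was reached)."""
    return worked / dialed

def penetration_ratio(worked, dialed):
    """Assumed definition: how many accounts were dialed per account worked."""
    return dialed / worked

# Hypothetical month: 900 accounts dialed, 300 of them worked.
dialed, worked = 900, 300

print(f"Penetration rate:  {penetration_rate(worked, dialed):.0%}")
print(f"Penetration ratio: {penetration_ratio(worked, dialed):.1f}")
```

Under that assumption the two are simply reciprocals of each other, so nothing is lost by showing the more intuitive rate.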
First, I'm going to remove the secondary y-axis on the right side of the graph and the data (Penetration Ratio) that goes with it. That gets us a simple two-series bar chart:
In the above, we see accounts worked (teal) and total dials made (grey). Dials made is the sum of accounts that were worked and those that weren't reached. So I'm going to change this data slightly—from dials made in grey to those not reached—and stack the bars on top of each other.
We can get the same information out of the view above as the previous one: we can see total dials made (overall height of bars) and within that, the portion that were worked and the portion that were not reached. Notice that because the worked series is on the bottom of the stack, we can easily see how it has varied over time. Total dials made have decreased over time, and so has the number of accounts we've worked. But are we working a lower proportion of total dials now than we have historically? It's hard to tell here. Let's shift to 100% view to answer that question:
With the 100% stacked bar, we lose the context that overall call volume (total dials made) has decreased over time. But that's ok, because we know it now, so we can state it in words: "Call volume decreased 47% over the course of the year." With the 100% view, we can see that the proportion of accounts that we are working has decreased recently. So in spite of reduced call volume, we are reaching a lower proportion of accounts. Interesting. Perhaps we can make that a little easier to see?
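The data reshaping behind the stacked and 100% views can be sketched in a few lines of Python. The monthly figures below are hypothetical (the real data was modified for confidentiality anyway), and the function name is just for illustration.

```python
def to_percent_of_dials(worked, dialed):
    """For each period, return (% worked, % not reached) of total dials made."""
    rows = []
    for w, d in zip(worked, dialed):
        not_reached = d - w  # dials made = worked + not reached
        rows.append((w * 100 / d, not_reached * 100 / d))
    return rows

# Hypothetical monthly totals.
dialed = [1000, 800, 500]
worked = [400, 280, 150]

for pct_worked, pct_not_reached in to_percent_of_dials(worked, dialed):
    print(f"worked: {pct_worked:.0f}%  not reached: {pct_not_reached:.0f}%")
```

In this made-up example both volume and accounts worked fall, but the percent worked also drops from 40% to 30%, which is the pattern the 100% view makes visible.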
Let's remove the space between the bars and turn this into an area graph:
Bingo! With this view, we can see the proportion of accounts that were worked out of the total dialed. The white line separating the teal from the grey now represents the penetration rate. We can make this clear by adding some text and calling out the most recent data point:
I might add a headline that says something like, "Despite decreasing call volume, penetration rate hit a 12-month low in December." And like that, we've used an area graph to highlight a line.
What do you think? Do you like this approach? What might you do differently? Where else could an approach like this work? Leave a comment with your thoughts!
You can download the Excel file with the above visuals.