March dataviz madness: table vs graph

March Madness is here: the three-week period when college basketball fever sweeps the States on the path to crowning the NCAA national champion. We’re pulled into the drama and tension of a single-elimination tournament (who will emerge as the Cinderella team to upset a No. 1 seed?), and the stakes are high for teams: one subpar performance and you’re out.

When it comes to communicating with data, the stakes can also be high. Maybe not quite as ruthless as a single elimination tournament (one ineffective graph usually doesn’t mean our season is over) but a subpar visual might mean a missed opportunity for our audience to make a data-driven decision.

In data visualization, well-designed visuals are buzzer beating 3-pointers: they capture our attention because they get the main point across quickly and effectively. In today’s post, we’ll look at a dataviz match-up: will it be the table or the graph for communicating an underlying message?

Imagine you’ve encountered the following table, either in a live setting (someone has shown this on a PowerPoint slide) or on your own (said PowerPoint slide has been emailed to you).

 
table.png
 

What’s your initial reaction to this much data? If you’re like me, you’d probably groan and move on, totally disregarding all the hard work that was done behind the scenes to produce this table. Ouch.

When deciding whether to use a table or a graph, consider what the audience needs to do with the data: Do they need a certain level of detail? Are there different units of measure that need to be relayed together? Will they need to refer to a specific line of interest or compare things one by one? If yes, then a table may meet those needs. However, if there’s an overarching message or story in the data, think about making it visual for your audience.

Back to our match-up—imagine that the underlying story is that in recent years, packaging costs have increased at a higher rate and are projected to exceed budget at the end of the fiscal year. Refer back to the tabular data—how long does it take you to find the data that supports this?

Contrast that time-consuming process with the visual below, where I’ve visualized the relevant pieces and added explanatory text and focus through sparing color to make the data more accessible:

 
after visual.png
 

So what is the appropriate use case for a table? When your audience needs detail on specific values or when you have multiple units of measure to report simultaneously. In my previous roles, we used tables frequently in monthly status meetings when the main goal was for participants to give updates on their lines of business and participants wanted to be able to go row by row (or column by column) and refer to specific lines of data. Over time we realized many of these tables weren’t being used and we’d push them to the appendix—they remained there for reference but weren’t competing for attention with the main takeaways.

While we won’t know who wins it all in March Madness until the national championship on April 8, in this match-up we can choose a clear winner: the graph!

In fact, the graph will typically win when there’s an overarching message in the data. A well-designed graph simply gets that information across more quickly than a well-designed table. Don’t make your audience do more work than necessary to understand the data!

For more examples of how to consider if a table is more effective than a graph, check out our previous posts:


Elizabeth Ricks is a Data Visualization Designer on the Storytelling with Data team. She has a passion for helping her audience understand the ’so-what?’ as concisely as possible. Connect with Elizabeth on LinkedIn or Twitter.


visualizing uncertainty

We often have some measure of uncertainty in our data—a forecast, prediction or range of possible values. A common challenge is how to visualize that uncertainty and help our audience understand the implications. In today’s post, I’ll use a real-world example to illustrate one approach and share tactics for creating in Excel.

The client’s original visual looked similar to the one below. It shows 2017 earnings per share (EPS) and the forecast outlook for the next four years. The client used a CAGR to forecast a range of possible EPS values from 2018 - 2021.  

 
Picture1.png
 

At first glance, it wasn’t obvious that the blue bars represented a forecast (even with the x-axis labeling of “E” for expected). The first yellow bar represents the 2017 actual EPS and the next four blue bars are the forecast for 2018 - 2021, where the solid section represents the midpoint and the data labels show the uncertain piece: the range of projected values.
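For readers who want to reproduce a CAGR-based range like the one described above, here is a minimal Python sketch. The base EPS, the low and high growth rates, and the choice to take the midpoint as the simple average of the two ends are all assumptions for illustration; the client’s actual inputs and method aren’t given in this post.

```python
def cagr_projection(base_value, annual_rate, years):
    """Project base_value forward, compounding annual_rate once per year."""
    return [round(base_value * (1 + annual_rate) ** n, 2) for n in range(1, years + 1)]


# Hypothetical inputs: a 2017 actual EPS plus low and high growth assumptions.
base_eps = 2.00
low_rate, high_rate = 0.05, 0.10

low = cagr_projection(base_eps, low_rate, years=4)    # 2018 to 2021, low end
high = cagr_projection(base_eps, high_rate, years=4)  # 2018 to 2021, high end

# Assumption: the midpoint is the simple average of the low and high ends.
mid = [round((lo + hi) / 2, 2) for lo, hi in zip(low, high)]

for year, lo, m, hi in zip(range(2018, 2022), low, mid, high):
    print(year, lo, m, hi)
```

The three lists (low, midpoint, high) are the raw material for any of the designs discussed in this post, whether the range ends up as data labels or as a shaded band.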

I made a few design changes to make the graph a little easier to interpret. I first changed the bars to lines and used a dotted line for 2018 - 2021 with unfilled data markers to help visually reinforce the uncertainty.

 
Picture4.png
 

In Excel, there are two potential ways to achieve this formatting. A brute-force approach is to use a single data series and format each individual data point as a dotted line. Another approach is to graph two separate data series, one as a solid point or line and the second as a dashed line or unfilled circle, with a point of overlap to make the lines connect. You can read more detail about these two approaches in this prior post.
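The second approach (two series with a point of overlap) comes down to how the data is laid out before it is graphed. Here is a small Python sketch of that layout; the function name, the years, and the EPS values are hypothetical, and `None` stands in for a blank spreadsheet cell.

```python
def split_actual_forecast(values, last_actual_index):
    """Split one series into an actual series and a forecast series.

    Both lists keep the original length and are padded with None
    (think: blank cells). The point at last_actual_index appears in
    both lists so the solid and dashed lines connect.
    """
    actual = [v if i <= last_actual_index else None for i, v in enumerate(values)]
    forecast = [v if i >= last_actual_index else None for i, v in enumerate(values)]
    return actual, forecast


# Hypothetical EPS values for 2015 to 2021; index 2 (2017) is the last actual.
eps = [1.6, 1.8, 2.0, 2.15, 2.3, 2.5, 2.7]
actual, forecast = split_actual_forecast(eps, last_actual_index=2)

print(actual)    # graph as a solid line with filled markers
print(forecast)  # graph as a dashed line with unfilled markers
```

Because the 2017 value sits in both series, the dashed forecast line starts exactly where the solid line ends.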

We often face the decision of preserving the y-axis vs. labeling data directly. I’ve done the latter in the visual below. One consideration in this decision point is the level of specificity your audience needs: are the actual values important? Or is the overall shape of the data more important? You can read more about these considerations in this prior post.

 
 

Next, let’s revisit how to show the range of forecast values. The original visual is shown again below where the forecast EPS values are represented by the data labels on top of the bars.

 
 

Rather than leave the audience with the highly taxing processing of reading these values, we can aid interpretation by instead depicting the forecast as a shaded range around the point estimate. This keeps the emphasis on the midpoints, while reducing clutter and eliminating the additional work the audience has to do. If the specific forecast values are important to the audience, we’ll deal with that momentarily.

 
 

The brute-force Excel method for adding this grey band requires a little math: graph a second data series as a stacked bar, then format the stacked bar so that the bottom section renders white and the top section grey. You can download the accompanying Excel file to see how I accomplished this.
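One common way to do the “little math” is to stack the low end of the range (the section that renders white) under the difference between the high and low ends (the section that renders grey). The Python sketch below shows that arithmetic with hypothetical numbers; it is an assumption about the general technique and may not match the accompanying Excel file cell for cell.

```python
def band_series(low, high):
    """Return (base, height) lists for a stacked band.

    base   : bottom section of the stack, equal to the low value (format white)
    height : top section of the stack, equal to high minus low (format grey)
    """
    if len(low) != len(high):
        raise ValueError("low and high must have the same length")
    base = list(low)
    height = [round(h - l, 2) for l, h in zip(low, high)]
    return base, height


# Hypothetical forecast range for 2018 to 2021.
low = [2.0, 2.1, 2.2, 2.3]
high = [2.3, 2.5, 2.7, 2.9]

base, height = band_series(low, high)
print(base)
print(height)
```

Stacking `height` on top of `base` makes the grey section span exactly from the low to the high value in each year.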

 
Picture6.png
 

But the visual is not yet complete. We should take the opportunity to add value to this data by telling the intended audience what they should know. Let’s assume this is a positive story where the outlook from the original base year (2016) has been extended to 2018. I might add explanatory text, paired with strategic use of color (I chose green to depict positivity) to focus attention on the relevant points of the data. If specific forecast EPS values are important for a given year, I could include them for context in the text. For a very technical audience, I might include even more detail with the statistics around the forecast. Just a reminder to always design with the audience’s needs in mind!

 
Picture5.png
 

 

Are you aware of other methods to achieve this effect? Have you seen other examples of uncertainty depicted effectively or tips you’d like to share? Leave a comment with your thoughts!



an alternative to treemaps

 
 

We recently worked with a client who was using treemaps in many of their visuals and the client questioned whether there was a more effective way to visualize their data. Today’s post illustrates some pros and cons of using treemaps, plus possible alternatives.

Treemaps are increasingly being included in most data visualization tools (including the latest Excel 2016 version, grouped under “Hierarchy charts”). The treemap was originally designed to visualize a very large amount of data in a hierarchical, tree-structured diagram where the rectangles are organized by size from largest to smallest. Color is used to encode a second dimension. Today, they’re often used more generally for categorical data. For further reading on treemaps, check out these posts from Jeffrey Shaffer and Robert Kosara.

appropriate use cases for treemaps

Treemaps can work well if your data falls into this scenario:

  1. You want to visualize a part-to-whole relationship amongst a large number of categories.
  2. Precise comparisons between categories are not important.
  3. The data is hierarchical.

Steve Wexler provides a nice example of why a treemap was more effective than a bar chart when visualizing the Electoral College votes from the 2012 election in this post.

the challenge with treemaps

While treemaps may seem like a sexy choice for visualizing data, they’re often used when another chart type would serve the data better. The limitation is that when we’re encoding data with area and intensity of color, our eyes aren’t great at detecting relatively minor differences in either of these dimensions. If our audience needs to make precise comparisons between categories, it’s even more cumbersome when the categories aren’t aligned to a common baseline. We should never make our audience do more work than necessary to understand a graph!

Bottom line: treemaps do have a use case; however, consider reaching for something else when your audience needs to be able to make specific comparisons. Andy Kriebel provides a great litmus test for the effectiveness of a treemap in this post: if it takes longer than 2 seconds to compare categories, go with a simpler design.

a real-world makeover

Back to our client: their treemap looked similar to the one below (I’ve anonymized the data to protect confidentiality). The intent was to compare returns activity (measured by volume of dollars claimed and number of returns) for their customer base where % of returns is encoded by the size of the rectangles while % of dollars is encoded by color - both in differences in hue and intensity.

 
original treemap.png
 

I can make a few general conclusions from this visual: three customers (A, B, and C) are roughly half of the return activity (visually estimated based on the size of the rectangles) and these same three are also pretty drastically different in terms of dollars claimed (judged by the intensity of color—although without a color legend it’s difficult to discern much more than that).

I’d argue that this visual might work well in an exploratory phase of analysis, where the analyst is sorting through the data to find what might be interesting to stakeholders. However, it doesn’t work so well when communicating what’s interesting in this data.  

What if I asked you which are the top 10 customers by dollars claimed? Or what’s the exact difference between Customer A and B’s % of returns? We can see how easily a treemap breaks down when trying to make these comparisons.

One possible solution is a bar chart. Below, I’ve displayed all customers’ returns and dollars claimed horizontally. This lends itself well to comparing across categories and we can easily see that four customers comprise the majority of return activity.

 
allcustomers.png
 

When communicating this data, we can help our audience understand what’s important by focusing only on the top four customers. This condensed view could lend itself well to a discussion towards action on these top four customers.

 
top4.png
 

To address the remaining customers, I could provide the entire bar chart for additional detail if the audience needs it. I might also choose to add the other customers with a footnote (as shown) or group them into an “All other” category. The decision point here would be determined by the level of detail my audience needs.
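If you prepare this kind of view often, the “top customers plus All other” grouping is easy to script. Here is a minimal Python sketch; the function name and the customer percentages are hypothetical, since the client’s data isn’t reproduced in this post.

```python
def top_n_with_other(shares, n=4, other_label="All other"):
    """Keep the n largest categories and roll the rest into one bucket."""
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:n]
    rest = ranked[n:]
    if rest:
        top.append((other_label, sum(value for _, value in rest)))
    return top


# Hypothetical % of returns by customer (sums to 100).
returns_pct = {
    "A": 22, "B": 16, "C": 12, "D": 10, "E": 9,
    "F": 8, "G": 7, "H": 6, "I": 5, "J": 5,
}

for customer, pct in top_n_with_other(returns_pct, n=4):
    print(f"{customer}: {pct}%")
```

Whether to show the rolled-up bucket as a bar or mention it in a footnote is still a design decision that depends on how much detail the audience needs.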

Taking this a step further, we can introduce color strategically as a visual cue, paired with some explanatory text, to prompt our audience to act on what this data may suggest.

 
final solution.PNG
 

For an additional example, Stephen Few provides a similar bar chart solution in this before & after post.

For a deeper look at the graphs, you can download the accompanying Excel file. What do you think? Are there situations where you’ve seen a treemap more effective than an alternate solution? Leave a comment with your thoughts!



“how do I incorporate visual design into our monthly deck?”

After reading storytelling with data or participating in a workshop, people often ask how they can incorporate the lessons into a recurring (e.g., monthly or quarterly) report. These reports often materialize as a PowerPoint deck, which started sparsely, but over time has taken on a life of its own and now resembles the “slideument”: part presentation, part document but not exactly either at its best.

Consider the slide below, which is based on an actual slide from a recent client workshop. (I’ve anonymized the client’s data to preserve confidentiality.) Today’s post demonstrates how to apply data storytelling lessons to a visual from a monthly deck, illustrating the thought process to improve it.

Picture1.png

This slide shows a monthly trend of customer service complaints: in total (top chart) and broken down by category (bottom chart). The commentary section tells us (the audience) what the important points of reference are: what happened this month compared to last month (complaints are up 14%), where it changed (Employees) and their proposed next steps. However, notice how much work it takes to read through all this text and then find evidence of it in the graphs.

Imagine if you were given this slide to determine an action plan. If you were in a live meeting, would you be able to read all of this text and listen to the presenter at the same time? If you weren’t in the meeting and were reading through the deck, how much time would you realistically spend trying to digest the information presented? We can improve on this visual in both scenarios with a few design changes.

In both cases, I used the commentary as a guidepost for the important takeaways and re-designed the visuals accordingly.

First, let’s take a closer look at the top chart. The commentary tells us that complaints were up 14% vs. the prior month.

Picture2.png
 

Where did your eyes go first in this graph? Mine went to the red Average line, which I visually estimated to be about 410 per month.  In looking for evidence of the 14% increase in December, I had to do a lot of mental math (add the Solicited + Unsolicited for November and compare it to Solicited + Unsolicited for December) which took more time than someone would likely spend doing this.

If that 14% increase is what the audience should know, check out the difference between the original visual and this:

Picture3.png
 

When applying the “where are your eyes drawn?” test, my eye went straight to the data markers & labels at the end of the total line, where I could see both the absolute numbers and annotations telling me it’s a 14% increase. Since we’re visualizing time, I changed the graph type from a bar chart to a line chart, unstacked the data series, and added a series for the total. This was intentional based on the commentary, which only referenced the total trend. I chose to de-emphasize the subcomponent pieces (Unsolicited and Solicited) by using grey.
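Adding a total series and the month-over-month change means doing the mental math once, up front, so the audience doesn’t have to. Here is a minimal Python sketch of that step. The Solicited and Unsolicited counts are hypothetical, chosen only so the total rises 14% as the commentary states.

```python
def percent_change(previous, current):
    """Percent change from previous to current."""
    return (current - previous) / previous * 100


# Hypothetical monthly counts for the two subcomponents.
solicited = {"Nov": 150, "Dec": 171}
unsolicited = {"Nov": 250, "Dec": 285}

# The added series: total complaints per month.
total = {month: solicited[month] + unsolicited[month] for month in solicited}

change = percent_change(total["Nov"], total["Dec"])
print(total)
print(f"{change:.0f}%")  # prints 14%
```

The resulting percentage is what goes into the annotation next to the final data label on the total line.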

Side note: what about the Average line? If the monthly deviation from average was really important, one option would be to keep it in the graph for reference with the tradeoff that adding a fourth data series could create clutter. Another option is an entirely different choice of visual, depicting the monthly change (from average), with a visual cue to indicate that December’s data is acceptable. Both are choices the information designer would make knowing the audience and what context is relevant. In this case, I didn’t feel that this additional point added anything to the overarching story, so I chose to eliminate it altogether.

Let’s take another look at the second visual now. The commentary tells us that complaints were up in a specific category: Employees. Not only did they increase, but they increased from 87 to 117. Apply the “where are your eyes drawn?” test again with the original visual.

Picture4.png
 

If I took an informal poll of readers here, some might have gone to the black line, others might have noticed the blue line first and others (like me) went to the red line. Regardless of which line you focused on first, I’d bet that you didn’t focus first on the November to December increase in the Employees line (red). In fact, it’s difficult to discern the absolute numbers (87 and 117) here because of the general clutter: overlapping data series, gridlines, color, heavy chart border and legend at the bottom requiring some visual work to figure out which line goes with which complaint category.

When setting out to improve a visual, there’s not necessarily a right or wrong answer in choosing a visual type: it often takes looking at the same data several different ways to find which view is going to create that magical “lightbulb” moment. Let's look at a few different variations of this visual.  

First let’s keep the existing line chart, remove some of the clutter and focus attention on the November to December change in Employees.

Picture5.png
 

This view gives the audience the full context of the 12 month trend, while focusing attention strategically on a specific point. However, if the emphasis is really about the November to December change, we could also visualize only those two data points. Let’s look at a few different ways of displaying this.

First, this horizontal bar chart compares this month (December) to last month (November). Horizontal bar charts are useful when your category names are long and therefore can be displayed horizontally from left to right on the y-axis without having to rotate or shorten them.

Picture6.png
 

Another option is a vertical bar chart, if you’re more inclined to preserve the left-to-right construct of displaying time.

 

As a third option, we could use a slopegraph. Slopegraphs can work well in making change visually apparent across categories. Check out how clear it is that some of these categories changed more drastically than others. In fact, looking at the data this way, we see that there was also a marked increase in service-related complaints, something that didn't stand out as much in the other views of the data. You can read more about slopegraphs, including design considerations, in this previous post.
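A slopegraph needs only two values per category, and ranking the categories by change shows up front which lines will be steepest. Here is a minimal Python sketch. Only the Employees values (87 and 117) come from the commentary; the other category names and numbers are hypothetical.

```python
def rank_changes(previous, current):
    """Return (category, previous, current, change) rows, largest increase first."""
    rows = [
        (name, previous[name], current[name], current[name] - previous[name])
        for name in previous
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Employees comes from the commentary; the other categories are made up.
november = {"Employees": 87, "Service": 60, "Billing": 95, "Other": 40}
december = {"Employees": 117, "Service": 80, "Billing": 93, "Other": 42}

for name, nov, dec, change in rank_changes(november, december):
    print(f"{name}: {nov} -> {dec} ({change:+d})")
```

A ranking like this can also guide which lines get the emphasis color and which ones fade to grey.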

Picture8.png
 

Any of these three visuals could work for depicting this data. I chose the slopegraph for the final version to keep the emphasis on the change in the two data points.

Here's what it could look like if all of this needed to be on a single slide:

Picture10.png

In the remade version, I’ve moved the text to be closer to the data it describes and used color strategically to create a visual link between the text and where to look in the graphs for evidence. I’ve also made the call to action more visible—remember when communicating with data for explanatory purposes, we should always want our audience to do something with the data we’re showing them!

Check out the difference between the original and the remade version:

Picture9.png
Picture10.png

This single view works well as a remake of the original, but not as well in a live presentation. There’s still too much text to read and process, while listening to a presenter at the same time. For a live setting we can still use the same visuals, but build piece by piece (using animation), which forces the audience to listen to the presenter describing the data. For example, consider the Complaints over time visual again:

Picture11.png
 

Now imagine if each of these images were its own slide. Sparse slides lead to better presentations because a person is there to narrate what’s happening.

Picture12.png
 

One final note on the choice of red as the emphasis color. Some readers may be surprised to see something different from our usual blue & orange as emphasis colors (and readers who are Michigan fans are probably having heart palpitations!). In this case, red was the client’s brand color so we chose to stay consistent with the rest of their visuals. If that weren’t the case, we might avoid red because it could carry a negative connotation, even though this is a somewhat positive story (complaints declining over time).

In conclusion, we can indeed incorporate visual cues such as strategic use of color and words into a monthly recurring presentation so that our audience clearly knows 1) what’s important and 2) what action to take.  You can download the Excel file with accompanying visuals here

