declutter! (and question default settings)

Decluttering is having a major moment.

Fans of Netflix’s Tidying Up with Marie Kondo have been inspired by guru Kondo’s Japanese-based method of clearing out the clutter in their homes. The benefits are huge. Devotees report living more peacefully and co-existing better with their partners. The key element? Actively working to identify and eliminate anything that doesn’t “spark joy.”

We can apply this same thought process to our data visualizations.

When it comes to clutter in our visuals, we challenge you to regularly examine what specific elements aren’t adding information. What’s making it harder for our audience to get at the data? When we identify and remove clutter from our visuals, the data stands out more.  

We’ve discussed this topic frequently. In this video, Cole provides five tips for how to avoid clutter in visuals; the SWD book and workshops each have an entire section focused on decluttering. We don’t intend to create cluttered visuals; rather, they often materialize when we don’t take a step back and question our tools’ default settings. Today’s post illustrates one such example and the benefit we can reap from decluttering.

I recently encountered a visualization similar to the following graph. This shows the percentage of babies born within a 24-hour period, broken down by day of the week (having welcomed a baby several months ago, all things maternity still linger in my various news feeds). I recognize this graph: it’s what happens when I put data into Excel and create a stacked bar chart with default settings.

 
Picture2.png
 

This caught my eye not because of the topic but because of how much time it took me to figure out what information it was trying to convey. What should I do with this? There’s a lot competing for my attention in this chart and distracting me from the data.

Spend a moment examining this graph and take note of which specific elements are challenging. Make a list: what might we eliminate or change to reduce cognitive burden?

I came up with eight specific design changes I would make. How does my list compare with yours?

  1. Remove the chart border as it isn’t adding informative value. Often, we use a border to differentiate parts of our slide/visual. In most cases, we can better set them apart with white space.

  2. Delete the gridlines. Will the audience be physically dragging their fingers across the y-axis to identify an exact value? If that level of specificity is important, label the data point(s) directly.

  3. Be sparing in use of data labels. Use them in cases where the exact values are important to the audience. Otherwise, remove them and use the axis instead.

  4. Thicken the bars. While there are no hard and fast rules, the bars should be wider than the white space between them so we can more easily compare. In this case, the superfluous white space can be reduced.

  5. Title the axes appropriately. There are few cases where omitting an axis or chart title is warranted. Rather than making the audience work to figure out what they’re looking at, make a habit of titling so they understand what they’re seeing before they get to the data. Let’s take two related steps here:

    1. Use a more descriptive y-axis title: Instead of the vague %, we can eliminate the guesswork and be more specific: % of total births. While we’re at it, let’s drop the unnecessary trailing zeroes from our y-axis labels.

    2. Clean up x-axis: Diagonally rotated text is slower to read. We can abbreviate the days of the week so they render horizontally. A super-category (such as Weekday or Weekend) could also simplify the process of taking in the information.

  6. Move the legend directly next to the data it describes. This alleviates the work of referring back and forth between the legend and the data.

  7. Use color sparingly. There are so many colors in this graph that our attention is scattered and it’s hard to focus on any one thing. Depending on what we want our audience to take from the graph, we can use color more effectively to focus attention on those pieces only.

  8. Add a takeaway title. Don’t assume that two different people looking at this same graph will walk away with the same conclusion. If there is a conclusion the audience should reach, we should state it in words with an effective takeaway title.
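
The axis clean-up in step 5 (dropping trailing zeroes, abbreviating the days, adding a Weekday/Weekend super-category) can also be scripted if you prepare labels outside of Excel. Here is a minimal Python sketch; the function names and tick values are my own illustration, not something from the original file.

```python
# Hypothetical helpers for cleaning up axis labels before charting.
WEEKEND = {"Saturday", "Sunday"}


def tidy_percent_label(value):
    """Format a percentage without trailing zeroes, e.g. 20.0 becomes '20%'."""
    text = f"{value:.2f}".rstrip("0").rstrip(".")
    return f"{text}%"


def abbreviate_day(day):
    """Shorten a day name to three letters so it can render horizontally."""
    return day[:3]


def super_category(day):
    """Group a day under a Weekday or Weekend super-category."""
    return "Weekend" if day in WEEKEND else "Weekday"


days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# x-axis: (short label, super-category) pairs
x_labels = [(abbreviate_day(d), super_category(d)) for d in days]

# y-axis: illustrative tick values only, not the real data
y_labels = [tidy_percent_label(v) for v in (0.0, 10.0, 20.0, 30.0)]
```

The same idea applies in any tool: decide on the label text deliberately rather than accepting whatever the default number format produces.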

Each step seems relatively minor on its own, but check out the impact when I apply all eight steps simultaneously:

 
 

Now we can more easily see that babies delivered on a weekend are more likely to arrive during the early hours of the day (midnight - 6am), compared to weekday deliveries. Related note: this dataset didn’t include the absolute number of babies born each day. Ideally, we’d want that information for context, but for the purposes of this illustrative example, we’ll assume the numbers are large enough to accurately compare across days of the week.

By reducing clutter, the audience can use their precious brainpower to decide what potential actions might be warranted, rather than trying to figure out how to read the graph. Taking time to modify the default settings means we can focus on the data and the message.

In my case, I might have wanted to get some extra rest on the weekends as my due date approached! As it turned out, baby Henry arrived safe and sound among the 17% of Thursday babies born in the 12am-5:59am window.

UPDATE: You can download the file for a further look at how I tackled this in Excel.

For more on the power of decluttering, check out these prior posts:
Declutter this graph: an example of eliminating unnecessary elements
Minor changes, major impact
How to declutter in Excel (with tactical step-by-steps)


Elizabeth Ricks is a Data Visualization Designer on the Storytelling with Data team. She has a passion for helping her audience understand the ’so-what?’ as concisely as possible. Connect with Elizabeth on LinkedIn or Twitter.

March dataviz madness: table vs graph

March madness is here—this three-week period when college basketball fever sweeps the States on the path to crowning the NCAA national champion. We’re pulled into the drama and tension of a single elimination tournament (who will emerge as the Cinderella team to upset a No. 1 seed?) and the stakes are high for teams: one sub-par performance and you’re out.  

When it comes to communicating with data, the stakes can also be high. Maybe not quite as ruthless as a single elimination tournament (one ineffective graph usually doesn’t mean our season is over) but a subpar visual might mean a missed opportunity for our audience to make a data-driven decision.

In data visualization, well-designed visuals are buzzer beating 3-pointers: they capture our attention because they get the main point across quickly and effectively. In today’s post, we’ll look at a dataviz match-up: will it be the table or the graph for communicating an underlying message?

Imagine you’ve encountered the following table: either in a live setting (someone has shown this on a PowerPoint slide) or on your own (said PowerPoint slide has been emailed to you).

 
table.png
 

What’s your initial reaction to this much data? If you’re like me, you’d probably groan and move on, totally disregarding all the hard work that was done behind the scenes to produce this table. Ouch.

When deciding whether to use a table or a graph, consider what the audience needs to do with the data: Do they need a certain level of detail? Are there different units of measure that need to be relayed together? Will they need to refer to a specific line of interest or compare things one by one? If yes, then a table may meet those needs. However, if there’s an overarching message or story in the data, think about making it visual for your audience.

Back to our match-up—imagine that the underlying story is that in recent years, packaging costs have increased at a higher rate and are projected to exceed budget at the end of the fiscal year. Refer back to the tabular data—how long does it take you to find the data that supports this?

Contrast that time-consuming process with the visual below, where I’ve visualized the relevant pieces and added explanatory text and focus through sparing color to make the data more accessible:

 
after visual.png
 

So what is the appropriate use case for a table? When your audience needs detail on specific values or when you have multiple units of measure to report simultaneously. In my previous roles, we used tables frequently in monthly status meetings when the main goal was for participants to give updates on their lines of business and participants wanted to be able to go row by row (or column by column) and refer to specific lines of data. Over time we realized many of these tables weren’t being used and we’d push them to the appendix—they remained there for reference but weren’t competing for attention with the main takeaways.

While we won’t know who wins it all in March Madness until the national championship on April 8, in this match-up we can choose a clear winner: the graph!

In fact, the graph will typically win when there’s an overarching message in the data. A well-designed graph simply gets that information across more quickly than a well-designed table. Don’t make your audience do more work than necessary to understand the data!

For more examples of how to consider if a table is more effective than a graph, check out our previous posts:



three tips for storytelling with qualitative data

Do you find yourself needing to communicate with qualitative data? This post discusses three best practices (effectively using color, reducing text and considering whether the audience needs quantitative context) and illustrates them through example.


animating data

When presenting live, you have a ton of opportunity to build a graph or a story piece by piece for your audience. Check out the 90-second video in this post illustrating an example of how we do this at storytelling with data.


visualizing uncertainty

We often have some measure of uncertainty in our data—a forecast, prediction or range of possible values. A common challenge is how to visualize that uncertainty and help our audience understand the implications. In today’s post, I’ll use a real-world example to illustrate one approach and share tactics for creating in Excel.

The client’s original visual looked similar to the one below. It shows 2017 earnings per share (EPS) and the forecast outlook for the next four years. The client used a CAGR to forecast a range of possible EPS values from 2018 - 2021.  

 
Picture1.png
 

At first glance, it wasn’t obvious that the blue bars represented a forecast (even with the x-axis labeling of “E” for expected). The first yellow bar represents the 2017 actual EPS and the next four blue bars are the forecast for 2018 - 2021, where the solid section represents the midpoint and the data labels are the uncertain piece: the range of projected values.
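
For readers who want to see how a CAGR can produce a range like this, here is a minimal Python sketch. The base EPS and the low and high growth rates are hypothetical numbers I made up for illustration; I’m also assuming the midpoint is the simple average of the low and high projections, which may not match the client’s method.

```python
def project(base_value, annual_growth_rate, years_ahead):
    """Compound a base value forward: base * (1 + rate) ** years."""
    return base_value * (1 + annual_growth_rate) ** years_ahead


BASE_YEAR = 2017
BASE_EPS = 2.00     # hypothetical 2017 actual EPS
LOW_RATE = 0.04     # hypothetical low-end CAGR
HIGH_RATE = 0.08    # hypothetical high-end CAGR

forecast = {}
for year in range(2018, 2022):
    years_ahead = year - BASE_YEAR
    low = project(BASE_EPS, LOW_RATE, years_ahead)
    high = project(BASE_EPS, HIGH_RATE, years_ahead)
    forecast[year] = {
        "low": round(low, 2),
        "mid": round((low + high) / 2, 2),  # assumed: midpoint = average
        "high": round(high, 2),
    }
```

Because the two rates compound, the gap between low and high widens each year, which is exactly the growing uncertainty we want the visual to convey.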

I made a few design changes to make the graph a little easier to interpret. I first changed the bars to lines and used a dotted line for 2018 - 2021 with unfilled data markers to help visually reinforce the uncertainty.

 
Picture4.png
 

In Excel, there are two potential ways to achieve this formatting. A brute-force approach is to use a single data series and format each individual data point as a dotted line. Another approach is to graph two separate data series, one as a solid point or line and the second as a dashed line or unfilled circle, with a point of overlap to make the lines connect. You can read more detail about these two approaches in this prior post.
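
The two-series approach comes down to how the data is laid out. Here is a minimal Python sketch of that layout, with hypothetical EPS values; `None` stands in for an empty cell, and the function name is my own.

```python
def split_actual_and_forecast(values, last_actual_index):
    """Split one series into an 'actual' and a 'forecast' series.

    Both series keep the value at last_actual_index, so the solid and
    dashed lines share a point and appear connected when plotted.
    """
    actual = [v if i <= last_actual_index else None
              for i, v in enumerate(values)]
    forecast = [v if i >= last_actual_index else None
                for i, v in enumerate(values)]
    return actual, forecast


years = ["2017", "2018E", "2019E", "2020E", "2021E"]
eps = [2.00, 2.12, 2.25, 2.38, 2.53]  # hypothetical values

# Only 2017 (index 0) is an actual; everything after it is forecast.
actual_series, forecast_series = split_actual_and_forecast(eps, 0)
```

The actual series would then be formatted as a solid point or line and the forecast series as a dashed line with unfilled markers.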

We often face the decision of preserving the y-axis vs. labeling data directly. I’ve done the latter in the visual below. One consideration in this decision point is the level of specificity your audience needs: are the actual values important? Or is the overall shape of the data more important? You can read more about these considerations in this prior post.

 
 

Next, let’s revisit how to show the range of forecast values. The original visual is shown again below where the forecast EPS values are represented by the data labels on top of the bars.

 
 

Rather than leave the audience with the highly taxing processing of reading these values, we can aid interpretation by instead depicting the forecast as a shaded range around the point estimate. This keeps the emphasis on the midpoints, while reducing clutter and eliminating the additional work the audience has to do. If the specific forecast values are important to the audience, we’ll deal with that momentarily.

 
 

The brute-force Excel method for adding this grey band requires a little math, graphing a second data series as a stacked bar and then formatting the stacked bar so that the bottom section renders white and the top section grey. You can download the accompanying Excel file to see how I accomplished this.
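
The “little math” can be sketched as follows. This is one plausible reading of the approach rather than a copy of the Excel file: I’m assuming the white bottom section equals the low end of the forecast and the grey top section equals the width of the range. The low and high values are hypothetical.

```python
def band_series(lows, highs):
    """Return the two stacked pieces that draw a shaded range.

    base:   invisible (white) section, from zero up to the low value
    height: visible (grey) section, spanning low to high
    """
    base = list(lows)
    height = [round(high - low, 2) for low, high in zip(lows, highs)]
    return base, height


forecast_lows = [2.08, 2.16, 2.25, 2.34]    # hypothetical
forecast_highs = [2.16, 2.33, 2.52, 2.72]   # hypothetical

white_section, grey_section = band_series(forecast_lows, forecast_highs)
```

Stacking the grey section on top of the white one places the visible band exactly between the low and high forecast values.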

 
Picture6.png
 

But the visual is not yet complete. We should take the opportunity to add value to this data by telling the intended audience what they should know. Let’s assume this is a positive story where the outlook from the original base year (2016) has been extended to 2018. I might add explanatory text, paired with strategic use of color (I chose green to depict positivity) to focus attention on the relevant points of the data. If specific forecast EPS values are important for a given year, I could include them for context in the text. For a very technical audience, I might include even more detail with the statistics around the forecast. Just a reminder to always design with the audience’s needs in mind!

 
Picture5.png
 

 

Are you aware of other methods to achieve this effect? Have you seen other examples of uncertainty depicted effectively or tips you’d like to share? Leave a comment with your thoughts!



an alternative to treemaps

 
 

We recently worked with a client who was using treemaps in many of their visuals and the client questioned whether there was a more effective way to visualize their data. Today’s post illustrates some pros and cons of using treemaps, plus possible alternatives.

Treemaps are increasingly being included in most data visualization tools (including the latest Excel 2016 version, grouped under “Hierarchy charts”). The treemap was originally designed to visualize a very large amount of data in a hierarchical, tree-structured diagram where the rectangles are sized and organized from largest to smallest. Color is used to encode a second dimension. Today, they’re often used generally for categorical data. For further reading on treemaps, check out these posts from Jeffrey Shaffer and Robert Kosara.

appropriate use cases for treemaps

Treemaps can work well if your data falls into this scenario:

  1. You want to visualize a part-to-whole relationship amongst a large number of categories.
  2. Precise comparisons between categories are not important.
  3. The data is hierarchical.

Steve Wexler provides a nice example of why a treemap was more effective than a bar chart when visualizing the Electoral College votes from the 2012 election in this post.

the challenge with treemaps

While treemaps may seem like a sexy choice for visualizing data, they’re often used when another chart type would serve the data better. The limitation is that when we’re encoding data with area and intensity of color, our eyes aren’t great at detecting relatively minor differences in either of these dimensions. If our data is such that our audience needs to make precise comparisons between categories, it’s even more cumbersome when the categories aren’t aligned to a common baseline. We should never make our audience do more work than necessary to understand a graph!

Bottom line: treemaps do have a use case; however, consider reaching for something else when your audience needs to be able to make specific comparisons. Andy Kriebel provides a great litmus test for the effectiveness of a treemap in this post: if it takes longer than 2 seconds to compare categories, go with a simpler design.

a real-world makeover

Back to our client: their treemap looked similar to the one below (I’ve anonymized the data to protect confidentiality). The intent was to compare returns activity (measured by volume of dollars claimed and number of returns) for their customer base. The % of returns is encoded by the size of the rectangles, while the % of dollars is encoded by color, through differences in both hue and intensity.

 
original treemap.png
 

I can make a few general conclusions from this visual: three customers (A, B, and C) are roughly half of the return activity (visually estimated based on the size of the rectangles) and these same three are also pretty drastically different in terms of dollars claimed (judged by the intensity of color—although without a color legend it’s difficult to discern much more than that).

I’d argue that this visual might work well in an exploratory phase of analysis, where the analyst is sorting through the data to find what might be interesting to stakeholders. However, it doesn’t work so well when communicating what’s interesting in this data.  

What if I asked you which are the top 10 customers by dollars claimed? Or what’s the exact difference between Customer A and B’s % of returns? We can see how easily a treemap breaks down when trying to make these comparisons.

One possible solution is a bar chart. Below, I’ve displayed all customers’ returns and dollars claimed horizontally. This lends itself well to comparing across categories and we can easily see that four customers comprise the majority of return activity.

 
allcustomers.png
 

When communicating this data, we can help our audience understand what’s important by focusing only on the top four customers. This condensed view could lend itself well to a discussion towards action on these top four customers.

 
top4.png
 

To address the remaining customers, I could provide the entire bar chart for additional detail if the audience needs it. I might also choose to add the other customers with a footnote (as shown) or group them into an “All other” category. The decision point here would be determined by the level of detail my audience needs.
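
If you’re preparing the data outside of Excel, ranking the customers and rolling the rest into an “All other” category is a short step. Here is a minimal Python sketch; the customer shares are hypothetical (the client data is confidential) and the function name is my own.

```python
def top_n_with_other(shares, n, other_label="All other"):
    """Keep the n largest categories and sum the remainder into one row."""
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:n]
    remainder = sum(value for _, value in ranked[n:])
    if remainder > 0:
        top.append((other_label, remainder))
    return top


# Hypothetical % of returns by customer, deliberately unsorted
pct_of_returns = {
    "E": 9, "A": 22, "H": 6, "C": 13, "J": 4,
    "B": 16, "G": 7, "D": 10, "I": 5, "F": 8,
}

condensed = top_n_with_other(pct_of_returns, 4)
```

Sorting first also gives the ordering for the full bar chart, so the same ranked list can feed both the detailed view and the condensed one.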

Taking this a step further, we can introduce color strategically as a visual cue, paired with some explanatory text, to prompt our audience to act on what this data may suggest.

 
final solution.PNG
 

For an additional example, Stephen Few provides a similar bar chart solution in this before & after post.

For a deeper look at the graphs, you can download the accompanying Excel file. What do you think? Are there situations where you’ve seen a treemap more effective than an alternate solution? Leave a comment with your thoughts!

