how would you show this data?

While riding the subway from Manhattan to Brooklyn this morning (which I mention simply because that's not a sentence I get to say every day), I came across the following graph from the Economist, headlined, "Hurricanes in America have become less frequent."

Economist_hurricane.png

This graph gave me pause for a number of reasons. But rather than recount them here, I thought it would be interesting to turn this graph (and the data) over to you. If you were reporting on this data, how would you show it? What would your headline be?

Send your makeover and headline to makeover@storytellingwithdata.com (if you'd like, include a social media profile you'd like me to link to) by next Friday, 9/22, and I'll follow up with a blog post featuring the remakes I receive. I look forward to seeing what you come up with!

axis vs data labels

A common decision point that arises when creating a graph is whether to show the axis or label the data points directly. Today's post focuses on the things I think about when making this decision and illustrates a few possibilities via example.

Probably the most common approach you'll see is where the axis is shown directly, as in the following graph.

In the above graph, we have monthly spend, in millions of dollars, on the y-axis. The x-axis is time, with Jan-Apr depicting actual data and May onward depicting forecast figures.

Showing both axes directly is a good default approach. This works especially well if you want your audience to focus on the shape of the data without cluttering it up with a bunch of numbers.

That said, there are circumstances that might cause you to vary from this. One consideration is the level of specificity the audience needs to have with the individual data points. If the particular numbers themselves are important, I'd suggest labeling the data points directly. In this case, you can get rid of the y-axis altogether: it's no longer necessary and becomes redundant when the points are individually labeled.

While I wouldn't typically preserve the y-axis and label every single data point (due to the redundancy previously mentioned), there may be cases where it makes sense to keep the axis and label just some of the data directly. For example, if we thought our audience would want to know how much we've actually spent each month but for the forecast we just want people to focus on the general shape of the data that illustrates our expectations, I might do something like the following:

Notice how the data labels themselves can help to draw attention to certain data points. In preattentive attribute language, these are "added marks." They act as visual cues, helping direct our audience's attention within the graph. It's like the graph is saying, "Look audience, these numbers are so important that they are labeled directly so you know what the specific values are."

We could use this strategy with data labels to direct our audience's attention to another point in the graph:

Notice that you can't help but look at the $7.5M label and the data point that goes with it. In the above, I chose a different color to fill in the data marker and also to label the data point, so as not to inadvertently visually tie together the ACTUAL data with the Jun data point (by filling it with the same dark blue as the actual data). In this case, we are drawn quickly to the $7.5M figure in Jun, while none of the other figures are labeled directly. Rather, we see the general shape of the rest of the data as context, and if we want to know rough values, can estimate those using the y-axis at the left.

This is actually an example from a recent workshop (details and numbers changed to preserve confidentiality). My final graph looked similar to the following, where in addition to the data label, I added some explanatory text:

I should note that whether to show the axis or label data directly is a decision point in most (all?) graphs. I've illustrated using a line graph here, but the same thought process can be applied to vertical or horizontal bars, scatterplots, etc.

In general, when you are deciding whether to show the axis, label the data directly, or some combination of these things: consider how you want your audience to read the graph. What level of specificity do they need to have with the individual data points? Where do you want them to pay attention? Let the answers to these questions guide your thoughtful design.
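To make those decisions concrete, here is a minimal sketch in Python of the labeling options discussed in this post. The function name `label_plan`, the mode names, and every number except the $7.5M Jun figure are assumptions made up for illustration; the output is just a plan (show the y-axis or not, plus one label or blank per point) that you could feed to whatever charting tool you use.

```python
# Hypothetical monthly spend in $M; only the Jun value (7.5) comes from the example above.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"]
SPEND = [5.2, 5.6, 6.1, 6.4, 6.9, 7.5, 7.8, 8.0]
IS_ACTUAL = [True, True, True, True, False, False, False, False]


def label_plan(mode, highlight=None):
    """Return (show_y_axis, labels), where labels[i] is text or None."""
    if mode == "axis_only":
        wanted = [False] * len(SPEND)
    elif mode == "label_all":
        wanted = [True] * len(SPEND)
    elif mode == "label_actuals":
        wanted = list(IS_ACTUAL)
    elif mode == "highlight":
        wanted = [month == highlight for month in MONTHS]
    else:
        raise ValueError(f"unknown mode: {mode}")
    labels = [f"${v:.1f}M" if w else None for v, w in zip(SPEND, wanted)]
    # The y-axis is redundant only when every point carries its own label.
    show_y_axis = not all(wanted)
    return show_y_axis, labels


for mode in ("axis_only", "label_all", "label_actuals"):
    print(mode, label_plan(mode))
print("highlight", label_plan("highlight", highlight="Jun"))
```

The one rule encoded here is the redundancy point made earlier: the y-axis goes away only when every point is labeled. In every partial-labeling case it stays, so the unlabeled points can still be estimated.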

If interested, you can download the Excel file with the above graphs.

the bullet graph

The following is a guest post written by Bill Dean. After a recent workshop, Bill shared with me his affinity for bullet graphs. I've never used one before (though I can see the potential), so I invited him to share his views and an example approach here. Bill leads an engineering and data science team at Microsoft that enables groups across the company to analyze and act on customer feedback at scale. He loves (and makes) both BBQ and data visualizations but hasn’t yet mixed the two. For more on Bill or to connect with him, check out LinkedIn or Twitter.

While some chart series stand on their own, it sometimes makes sense to allow them to stand alone-ish. That is, you want the series to stand near some benchmark (e.g., closest competitor, last year’s performance, goals) without giving up too much of the spotlight. Enter “Bullet Graphs.” Bullet graphs were developed by Stephen Few and provide a great way to gain information density without the cognitive load of some other graph options. Bullet graphs have been around a while but are used much less frequently than simple two-series bar charts or even malformed eye-candy gauges.

For example, take a typical water bill (Figure 1, below, roughly replicated from my personal bill), which does a decent job of drawing your eyes to the current year’s data via a two-series bar chart. This is actually a good starting point, as I’ve seen data like this represented as two pie charts. Still, the two-month groupings invite misinterpretation. At first glance, one might think that January is the white bar, February the black bar, and so on in some sort of tick-tock visual joke. Instead, we’re billed for two months at a time (e.g., January-February), and each bar is the combined total for those two months in the given year. I’m going to start with the water bill as-is and adjust it toward a bullet graph along a path that’s familiar…especially for Cole’s readers.

Figure 1: Original Water Bill


While it’s a pretty good start, the white bars representing the “previous year’s consumption” series make my eyes bleed, so I want that series to stand out less by making those bars gray and removing the border. In Figure 2, both series still have sufficient contrast ratios from the background and each other, which is good for everyone…especially “low-vision” users. It will also help to move the legend up towards the top of the chart to inform users ahead of the visual.

Figure 2: Updated Water Bill with Gray


If the exact numbers matter, I could consider adding the data labels on the inside end of the bars. Only do this if you can ensure there’s always enough vertical bar space for contrasting text and sufficient width to accommodate reasonably large numbers.

Figure 3: Updated Water Bill with Inside Labels


Figure 3 feels like a natural improvement over the original chart (even though I want to remove the y-axis, I’ll leave it here for continuity). I could leave this alone, but I’m going to transform it into a simple bullet graph. Next, I’d like to put the previous year’s series behind the current year’s series so the direct comparison is more obvious. I can do this by clicking once into my first data series and making it “Secondary.” Because this will move the two series on top of each other, I’ll want to ensure the context (previous year series) is in the background, lighter in color (it already is), and wider.

Figure 4: Simple Bullet Graph


Now, my chart looks different because the primary and secondary axes are each using a range that makes sense for its own series. I’ll need to intervene and ensure they’re both set to be the same. Do this by right-clicking on each axis and using the “Edit Series” icon to set the scale (in this case it’s 0 to 3500). I can also delete the axis on the right (or both, if I add the value as a data label). Dealer’s choice, really. For additional decluttering, I’ve opted to remove the line at the base of the bars.
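If you'd rather compute the shared scale than eyeball it, the idea can be sketched in a few lines of Python. This is an illustration only: the consumption numbers are invented (they are not my real bill), and the helper name `shared_axis_range` and the rounding step of 500 are assumptions chosen so the result lands on the 0 to 3500 range used here.

```python
import math

# Hypothetical two-month consumption totals (units as billed); not the real bill.
current_year = [2100, 1850, 2400, 3230, 2900, 2000]
previous_year = [2300, 1900, 2250, 3100, 3350, 2150]


def shared_axis_range(*series, step=500):
    """Return one (min, max) that fits every series, rounded up to a multiple of step."""
    highest = max(max(values) for values in series)
    return 0, math.ceil(highest / step) * step


print(shared_axis_range(current_year, previous_year))
```

Whatever range comes out, the point is that both axes get the same one. Otherwise the bars for the two years are drawn on different scales and the overlap comparison is misleading.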

Figure 5: Simple Bullet Graph with Consistent Axes Range


Now, I’ve got a chart that draws my attention to this year’s trend with the context of what my family consumed last year (Figure 5). There’s also a side benefit: the two-month format is no longer confusing, because each label is centered under both bars for its period. It would have been just as easy to make this a horizontal bar chart instead (often better for longer labels and non-time groups). This is how that would look (I’ve also used data labels in lieu of the axis).

Figure 6: Horizontal Simple Bullet Graph


If my water company wanted to add a little more information and a little peer pressure, they might add a more complete bullet graph with zones of guilt and a small marker that represents my local neighborhood, a goal they set for me, or (in this case) last year’s consumption for the same period.

Figure 7: Single Bullet Graph
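To show the three ingredients of a bullet graph (background zones, the featured bar, and the comparison marker) without opening Excel, here is a rough text-only sketch in Python. Everything in it is an assumption for illustration: the function name `bullet_row`, the characters used for shading, and the example values are made up and are not taken from my bill.

```python
def bullet_row(value, marker, zones, width=20):
    """Render one horizontal bullet graph as a line of text.

    value  -- the featured measure, drawn as a solid bar of '#'
    marker -- the comparison point (e.g., last year), drawn as '|'
    zones  -- list of (upper_bound, shade_char), ascending; the last
              upper bound is the full range of the chart
    """
    full_range = zones[-1][0]
    row = []
    for col in range(width):
        # Shade each column by the zone that contains its midpoint.
        midpoint = (col + 0.5) * full_range / width
        row.append(next(ch for bound, ch in zones if midpoint <= bound))
    # Draw the featured bar over the zones, starting from zero.
    bar_len = max(0, min(width, round(value * width / full_range)))
    for col in range(bar_len):
        row[col] = "#"
    # Draw the marker last so it stays visible on top of the bar.
    marker_col = max(0, min(width - 1, int(marker * width / full_range)))
    row[marker_col] = "|"
    return "".join(row)


# Hypothetical zones: Conservative up to 50, High up to 75, Extremely High up to 100.
example_zones = [(50, "."), (75, "-"), (100, "=")]
print(bullet_row(value=62, marker=70, zones=example_zones))
```

The drawing order mirrors the layering in the Excel walkthrough: context goes in the background, the current value sits on top of it, and the small marker is drawn last so it never gets hidden.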


A full series with this information would be more informative.

We can start with the data table and walk through how to do it. We’ll start with some reference ranges that will add up to the full range of the chart you want to see and will create zones in the background for context.

Table 1: Monthly Consumption Data (in percent)


Start by highlighting the entire table in Excel and using the Insert tab to add a 100% stacked column chart. It will be hideous!

Figure 8. Default Stack Chart


Excel draws the table in reverse order (the top cell value is at the bottom of each chart column with each subsequent value piled on top). This is important to know so you start with what you want in the background at the top of the table. 
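The two rules so far (the zone widths must add up to the full range, and the first table row ends up at the bottom of the stack) can be checked with a small Python sketch before you build anything. The zone widths below are hypothetical stand-ins rather than the values from Table 1, and `stack_bounds` is a name invented for this illustration.

```python
# Hypothetical zone widths in percent, listed in the same order as the table rows.
zones = [("Conservative", 50), ("High", 25), ("Extremely High", 25)]


def stack_bounds(rows, full_range=100):
    """Return {name: (bottom, top)}; the first row lands at the bottom of the stack."""
    total = sum(width for _, width in rows)
    if total != full_range:
        raise ValueError(f"zones add up to {total}, expected {full_range}")
    bounds, bottom = {}, 0
    for name, width in rows:
        bounds[name] = (bottom, bottom + width)
        bottom += width
    return bounds


print(stack_bounds(zones))
```

Note that each zone is entered as a width, not as a cumulative upper limit. In a stacked chart, the limits come from piling the widths on top of each other.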

The blue, orange, gray sequence for each period in Figure 8 is the Conservative, High, Extremely High data from my table, respectively. You’ll want to right-click the bottom series and color it dark gray, make the orange series noticeably lighter than that, and make the gray series a few notches lighter still. Set the outline to “None” (or make it white to have the borders pop a little).

Figure 9. Stack Chart Mid-Formatting


From here, you’ll want to right-click one of the series and adjust the gap width to 40% or so to ensure the bars are nice and wide. When you hover over the untouched series, you’ll see that they are the current and previous year. Let’s go ahead and make those “secondary” by right-clicking and choosing “Change Series Chart Type.” Check the secondary axis box for both the Current and Previous years’ data. For the current year, we’ll change the chart type to “stacked column,” while the previous year will be set to “stacked line with markers.”

Figure 10: Stack Chart Changing Chart Type and Secondary Axes


While we’re at it, we can eliminate the chart border and the axis lines. We’re civilized, after all.

Figure 11: Stack Chart with Secondary Axes and Mixed Chart Types


Formatting the Current Year Consumption to Black and removing the ‘Previous Year’ connecting lines gets us REALLY close to the final chart.

Figure 12: Stack Chart Starting to Look Like a Bullet Graph


Here we can play with the Previous Year Consumption marker to ensure it can be seen a bit more clearly. Right-click the dots and select “Format Data Series” (Figure 13). Under Marker, choose Built-in and pick the wide dash. The default size is 5, but it will look much better around 18.

Figure 13: Format Data Series Marker


It’s not quite done, because you’ll need to ensure both axes are set to 0-100%. Do this, then delete one of them (most people delete the right one).

Figure 14: Bullet Graph Core Completed


You’re on the home stretch: now you can adjust the colors, add a title, and size the chart to fit your dashboard, report, etc. There’s also a really helpful feature that lets you format all similar charts EXACTLY the same way down the line. Right-click your chart and click “Save as Template,” name it, and save it.

Next time you have a set of data like this, you can start with whatever chart you want, right-click, and select “Change Series Chart Type” > “Templates” > Chart Template.

Figure 15: Chart Template Dialogue


Figure 16: Bullet Graph with Title, Lighter Ranges, Less Clutter


It’s probably more likely that each month pair has a different idea of what counts as Conservative, High, and Extremely High water usage, so I’ve made each zone differently sized in Figure 17.

Figure 17: Bullet Graph with Variable Ranges


...and with a navy bar.

Figure 18: Bullet Graph with Navy Bar


...and another, even bluer option in Figure 19 (in honor of “Towel Day”).

Figure 19: Bullet Graph with Hooloovoo


The following GIF (Figure 20) demonstrates the process described above and should keep your cats busy, if not teach them how to make bullet graphs. 

Figure 20: GIF


Jon Peltier also has a number of tutorials if you’d like to see other ways to make bullet charts, in both horizontal and vertical orientations.

Huge thanks to Bill for this thorough and informative post! You can download the Excel file that contains his graphs.


© 2010-2017 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.

numbers of different magnitudes

It can be challenging when you have numbers of very different magnitudes that you want to look at together. How do you make the small numbers visible? How do you provide a true sense of scale? I encountered this situation when reworking an example for a workshop recently and approached it in a new way. Here, I'll share with you how I tackled the challenge (note: details and numbers have been modified to preserve confidentiality).

First, let me set up the scenario: imagine you work in the credit risk organization at a bank (coincidentally, this was how my career started!). It's inevitable that some people will take out loans and default, or not pay them back. You need to estimate this amount so that you can reserve money against these expected losses. To do so, for a given portfolio of loans, you have a process for risk-rating each loan. For simplicity's sake, let's assume a given loan can either Pass (negligible risk) or is classified as having some level of risk (Very Low, Low, Moderate, High, or Very High). You want to understand what the pass rate and risk profile for a given portfolio have looked like over time.

The original graph created to meet the above need looked similar to the following:

This is a lot to process. When I worked in banking, we used a ton of graphs with data tables like this. My initial reaction today is to get rid of the data table—my general guidance is that if the specific values are important, we should label them directly in the graph—but that doesn't work here. Many of the segments are very small, so there's no physical space to put them in the graph. We'll need to address this in another way.

In order to determine an effective approach for showing this data, first we need to figure out what we're trying to illustrate with it. When I look at the above graph and attempt to identify specific potential takeaways—and I should caveat that this domain is no longer my area of expertise, so I'm making a number of assumptions for illustration purposes—I come up with the following:

  1. There's been marked growth in the overall portfolio. Total loan volume has increased 81% in the time period shown, from $392M in January 2016 to $709M in January 2017. This strikes me as impressive growth. There's probably some interesting context here.
  2. Those classified as Pass have increased as a proportion of total. In January 2016, 89% of loans in this portfolio were classified as Pass (negligible risk); by January 2017, the proportion classified as Pass increased to 95% of total. This seems like good progress (note there could be a behind-the-scenes story of new loans added to the portfolio not having enough time to "go bad"—we'd want to understand the aging effect, but for the sake of illustration here let's not complicate our story with that).
  3. In spite of portfolio growth, there has been a volume decrease in all risk classifications year over year except Moderate. This is actually really difficult to see in the current visual because the stacked graph doesn't show it clearly and the data table takes a ton of effort to process. This seems like a potential area of concern in an otherwise positive story, so I want to make sure this finding comes across clearly.
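The arithmetic behind the first two takeaways is simple enough to check in a few lines of Python. The dollar totals and Pass percentages are the ones quoted above; the helper name `pct_change` is just an assumption for this sketch.

```python
def pct_change(start, end):
    """Relative change from start to end, as a fraction."""
    return (end - start) / start


# Total loan volume in $M for Jan 2016 and Jan 2017, as quoted above.
growth = pct_change(392, 709)
print(f"Portfolio growth: {growth:.0%}")

# Pass share of the portfolio; the Non-pass share is simply the remainder.
pass_share = {"Jan 2016": 0.89, "Jan 2017": 0.95}
for month, share in pass_share.items():
    print(f"{month}: Pass {share:.0%}, Non-pass {1 - share:.0%}")
```

Because the Pass percentages are rounded, the Non-pass shares that fall out of them (11% and 5%) are rounded too, so treat any dollar amounts derived from them as approximate.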

Now that I know the takeaways I want to highlight, I can figure out how to show this data in a way that helps me make these takeaways clear to my audience. It would be difficult to highlight all of these points in a single graph, so I'm not going to limit myself to a single graph. Rather, I'm going to spread them out across multiple views. This will let me focus on each of the above points more effectively and weave all of the data and takeaways I want to highlight together. Following are the visuals and narrative that I developed for this. 

There's been an 81% increase in the dollar volume of our loan portfolio over the past 13 months, from $392M in January 2016 to $709M in January 2017. We see pretty consistent growth throughout the year. Next, I'm going to take this same data from this line graph and shift to a bar graph—I'm doing this because next I'll show you some component pieces of the overall portfolio. Here's the same data in a bar chart:

We're still going from $392M in January 2016 to $709M in January 2017. As you know, we risk rate all of the loans in our portfolio. A given loan is either classified as Pass—negligible risk—or with some level of risk, ranging from Very Low to Very High. Let's focus first on the Pass portion:

This is a positive story: the proportion of loans classified as Pass has increased from 89% of the portfolio in January 2016 to 95% in January 2017. This means the Non-pass loans have decreased from 11% of total portfolio in January 2016, to just 5% in January 2017:

Next, I'm going to focus on just the Non-pass loans, the orange portion of the following bars:

We classify Non-pass loans into one of five risk categories: Very Low, Low, Moderate, High, or Very High. Next, we'll look at this breakdown, piece-by-piece. There are large differences in the magnitudes of the numbers across the various risk ratings, so I'm going to layer these on and change the scale as needed as we go. Bear with me—this is perhaps a different way than you've seen data like this shown before—but we'll walk through it together step by step. Here's the basic graph:

Note that currently, the y-axis scale goes up to $0.1M, or $100,000. Let's start with the most severe level of risk: Very High. In the following graph, I'll plot the dollar volume of loans classified as Very High risk over time. As of January 2017, $0.06M—or $60K—in loan volume is classified as Very High.

Next, I'm going to do something a little different. I'm going to change the y-axis on the graph so that instead of going up to $0.1M ($100,000), it goes up to $1M. Notice how this visibly compresses the portion of the portfolio classified as Very High risk. That final point in January 2017 still represents $60K:

I've changed the scale of this graph so that I can add on the next layer of risk (one step less in severity than the Very High that we just considered): High risk loans. We've seen some big changes in High risk volume over the past year, with it starting out around $850K, then decreasing, but then increasing again. As of January 2017, loan volume classified as High risk amounts to $810K:

Next, I'm going to change the scale of the graph again so that we can continue to layer on more of the risk-rated portfolio. In this next iteration, my y-axis maximum has been increased from $1M to $10M.

As we saw before, expanding the scale visually compresses the data we've graphed so far. Note that the Very High risk loans are still there, but at only $60K, we can't really even see them now given the new scale of the graph. High risk loans are the dark orange bars. Next, I'll layer on the loan volume classified as Moderate risk. This has increased over the past year, from less than $2M in January 2016 to $6M in January 2017.

Next, I'll expand the scale again, increasing the y-axis maximum from $10M to $20M.

This provides space to layer on the next level of risk (continuing to decrease in risk severity): Low risk. This portion of the portfolio has been relatively flat over time, and totals $10M as of January 2017.

I'm going to change the scale of my graph one final time, increasing the y-axis maximum from $20M to $50M.

With this scale, now I can layer on the final level of risk (this is the lowest severity for those loans classified as Non-Pass). Very Low risk loans have decreased over time and as of January 2017, total $21M.
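For the curious, here is a minimal Python sketch of how the sequence of y-axis maximums could be derived rather than picked by hand. I chose the maximums manually; the "round up to 1, 2, or 5 times a power of ten" rule below is an assumption that happens to reproduce them, and the names `nice_ceiling` and `axis_steps` are invented for illustration. It also uses only the January 2017 figures quoted in this walkthrough, whereas a real axis has to fit the tallest bar across all months.

```python
# Jan 2017 loan volume by risk rating, in $K, most severe first (figures quoted above).
JAN_2017 = [("Very High", 60), ("High", 810), ("Moderate", 6_000),
            ("Low", 10_000), ("Very Low", 21_000)]


def nice_ceiling(value):
    """Smallest 1, 2, or 5 times a power of ten that is >= value (positive integers only)."""
    base = 10 ** (len(str(value)) - 1)
    for multiple in (1, 2, 5, 10):
        if value <= multiple * base:
            return multiple * base


def axis_steps(layers):
    """Yield (rating, stacked_total, y_axis_max) as each layer is added to the stack."""
    running = 0
    for rating, amount in layers:
        running += amount
        yield rating, running, nice_ceiling(running)


for rating, total, y_max in axis_steps(JAN_2017):
    print(f"{rating:<9} stacked ${total:>6,}K -> y-axis max ${y_max:,}K")
```

With these inputs the maximums come out as $100K, $1,000K, $10,000K, $20,000K, and $50,000K, which matches the $0.1M, $1M, $10M, $20M, $50M progression in the graphs.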

When we look at the overall heights of the bars in the preceding graph, we can see that total Non-Pass loans have decreased in volume year over year. However, when we stack data on top of other data like this, it can make it difficult to see the trend for each individual series. So let's look at one final view of this data, where we unstack the above bars and focus on the trend over time for each level of risk in a line graph:

In the line graph, we can see the marked decrease in Very Low risk loans over time as well as the relatively flat volume of Low risk loans. We can see that High and Very High loans are much lower in absolute volume than the other categories. Perhaps most interesting, however, is that Moderate loans have increased in volume over the past 13 months. Is this noteworthy? I'm not sure, but it seems like something we may want to draw attention to, better understand, and keep an eye on.

To overcome the challenge of visualizing numbers of very different magnitudes in a live setting, I might walk my audience through something like the preceding progression. Then, if we're also in need of a static version to share (for those who missed the meeting, as a reminder of what was discussed for those who did attend, or because it really all needs to fit on a single page, though always question that assumption!), I might do something like the following:

This is the first time I've tried this approach of repeatedly changing the y-axis maximum to layer on additional data, so I'm curious about your reaction and feedback. Do you like it? Was it confusing? Do you have other ideas on how to approach and overcome this challenge? Leave a comment with your thoughts!
