the bullet graph

The following is a guest post written by Bill Dean. After a recent workshop, Bill shared with me his affinity for bullet graphs. I've never used one before, though I can see the potential, so I invited him to share his views and an example approach here. Bill leads an engineering and data science team at Microsoft that enables groups across the company to analyze and act on customer feedback at scale. He loves (and makes) both BBQ and data visualizations but hasn’t yet mixed the two. For more on Bill or to connect with him, check out LinkedIn or Twitter.

While some chart series stand on their own, it sometimes makes sense to allow them to stand alone-ish. That is, you want a series to stand near some benchmark (e.g., closest competitor, last year’s performance, goals) without giving up too much of the spotlight. Enter “Bullet Graphs.” Bullet graphs were developed by Stephen Few and provide a great way to gain information density without the cognitive load of some other graph options. They have been around a while but are used much less frequently than simple two-series bar charts or even malformed eye-candy gauges.

For example, take a typical water bill (Figure 1, below, roughly replicated from my personal bill), which does a decent job of drawing your eyes to the current year’s data via a two-series bar chart. This is actually a good starting point, as I’ve seen data like this represented as two pie charts. Still, the two-month groupings make the chart easy to misinterpret. At first glance, one might think that January is the white bar, February the black bar, and so on in some sort of tick-tock visual joke. Instead, we’re billed two months at a time (e.g., January-February), and each bar is the combined total for both months in that year’s data. I’m going to start with the water bill as-is and adjust it towards a bullet graph through a path that’s familiar…especially for Cole’s readers.

Figure 1: Original Water Bill

While it’s a pretty good start, the white bars representing the “previous year’s consumption” series make my eyes bleed, so I want to help that series stand out less by making its bars gray and removing the border. In Figure 2, both series still have sufficient contrast ratios against the background and each other, which is good for everyone…especially low-vision users. It will also help to move the legend up towards the top of the chart so users read it before the visual.
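
One way to sanity-check "sufficient contrast" is the WCAG contrast-ratio formula. The sketch below is only an illustration: the RGB values are hypothetical stand-ins, since the post does not give the actual shades used in the chart, and the 3:1 threshold often cited for graphical elements is a guideline rather than something the author applied.

```python
def relative_luminance(rgb):
    """WCAG relative luminance of an sRGB color given as 0-255 integers."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1 (identical colors) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical colors: the bill's real shades are not stated in the post.
WHITE = (255, 255, 255)     # chart background
BLACK = (0, 0, 0)           # current year bars
MID_GRAY = (128, 128, 128)  # previous year bars

pairs = {
    "current year vs background": (BLACK, WHITE),
    "previous year vs background": (MID_GRAY, WHITE),
    "current year vs previous year": (BLACK, MID_GRAY),
}
for name, (a, b) in pairs.items():
    print(f"{name}: {contrast_ratio(a, b):.2f}:1")
```

With these stand-in colors, all three pairs come out above 3:1, which is the kind of check the paragraph above describes.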

Figure 2: Updated Water Bill with Gray

If the exact numbers matter, I could consider adding the data labels on the inside end of the bars. Only do this if you can ensure there’s always enough vertical bar space for a contrasting text and sufficient width to accommodate reasonably large numbers.

Figure 3: Updated Water Bill with Inside Labels

Figure 3 feels like a natural improvement over the original chart (even though I want to remove the y-axis, I’ll leave it here for continuity). I could leave this alone, but I’m going to transform it into a simple bullet graph. Next, I’d like to put the previous year’s series behind the current year’s series so the direct comparison is more obvious. I can do this by clicking once on my first data series and making it “Secondary.” Because this will put the two series on top of each other, I’ll want to ensure the context (previous year series) is in the background, lighter in color (it already is), and wider.

Figure 4: Simple Bullet Graph

Now, my chart looks different because the primary and secondary axes are each using a range that makes sense for its own series. I’ll need to intervene and ensure they’re both set to the same range. Do this by right-clicking each axis and using the “Edit Series” icon to set the scale (in this case it’s 0 to 3500). I can also delete the axis on the right (or both, if I add the value as a data label). Dealer’s choice, really. For additional decluttering, I’ve opted to remove the line at the base of the bars.
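
The reason for forcing one range is that bar heights are only comparable when both series share a scale. A minimal sketch of picking that shared maximum follows; the consumption figures and the rounding step of 500 are hypothetical, chosen only so the result lands on the 0 to 3500 range mentioned above.

```python
import math


def shared_axis_max(*series, step=500):
    """Round the largest value across all series up to the next multiple of step."""
    largest = max(max(values) for values in series)
    return math.ceil(largest / step) * step


# Hypothetical consumption per two-month billing period (not the real bill).
current_year = [1800, 2100, 2900, 3300, 2400, 1900]
previous_year = [2000, 2300, 3100, 3200, 2600, 2100]

axis_max = shared_axis_max(current_year, previous_year)
print(f"Set both axes to 0-{axis_max}")
```

Whatever value comes out, the point is that it is entered as the maximum on both the primary and the secondary axis.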

Figure 5: Simple Bullet Graph with Consistent Axes Range

Now, I’ve got a chart that draws my attention to this year’s trend with the context of what my family consumed last year (Figure 5). There’s also a side benefit: the two-month format is no longer confusing, as the labels are grouped and centered under both bars. It would have been just as easy to make this a horizontal bar chart instead (often better for longer labels and non-time groups). This is how that would look (I also used data labels in lieu of the axis).

Figure 6: Horizontal Simple Bullet Graph

If my water company wanted to add a little more information and a little peer pressure, they might use a more complete bullet graph with zones of guilt and a small marker that represents my local neighborhood, a goal they set for me, or (in this case) last year’s consumption for the same period.

Figure 7: Single Bullet Graph

A full series with this information would be more informative.

We can start with the data table and walk through how to do it. We’ll start with some reference ranges that will add up to the full range of the chart you want to see and will create zones in the background for context.

Table 1: Monthly Consumption Data (in percent)

Start by highlighting the entire table in Excel, then use the Insert tab to add a 100% stacked column chart. It will be hideous!

Figure 8. Default Stack Chart

Excel draws the table in reverse order (the top cell value is at the bottom of each chart column with each subsequent value piled on top). This is important to know so you start with what you want in the background at the top of the table. 
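
The stacking rule can be sketched in a few lines: each row starts where the previous one ended, so the first row of the table sits at the bottom of the column. The zone sizes below are hypothetical placeholders, not the values from Table 1.

```python
def stack_segments(rows):
    """Return (label, bottom, top) per row, stacking upward from zero in table order."""
    segments = []
    bottom = 0
    for label, height in rows:
        segments.append((label, bottom, bottom + height))
        bottom += height
    return segments


# Hypothetical zone sizes in percent for one billing period.
zones = [("Conservative", 50), ("High", 30), ("Extremely High", 20)]

for label, bottom, top in stack_segments(zones):
    print(f"{label}: {bottom}%-{top}%")
```

Reordering the rows changes which zone ends up at the base of the column, which is why the table order matters before any formatting starts.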

The blue, orange, gray sequence for each period in Figure 8 is the Conservative, High, Extremely High data from my table, respectively. You’ll want to right-click the bottom series and color it dark gray; the orange series should be a noticeable bit lighter than that, and the gray series a few notches lighter again. Set the outline to “None” (or make it white to have the borders pop a little).

Figure 9. Stack Chart Mid-Formatting

From here, you’ll want to right-click one of the series and adjust the gap width to 40% or so to ensure the bars are nice and wide. When you hover over the untouched series, you’ll see that they are the current and previous year. Let’s go ahead and make those “secondary” by right-clicking and choosing “Change Series Chart Type.” Check the Secondary Axis box for both the Current and Previous years’ data. For the current year, we’ll change that to the “stacked column” chart while the previous year will be set to a “stacked line with markers.”

Figure 10: Stack Chart Changing Chart Type and Secondary Axes

While we’re at it, we can eliminate the chart border and the axis lines. We’re civilized, after all.

Figure 11: Stack Chart with Secondary Axes and Mixed Chart Types

Formatting the Current Year Consumption to Black and removing the ‘Previous Year’ connecting lines gets us REALLY close to the final chart.

Figure 12: Stack Chart Starting to Look Like a Bullet Graph

Here we can play with the Previous Year Consumption marker to ensure it can be seen a bit more clearly. Right-click the dots and select “Format Data Series” (Figure 13). Under Marker, choose Built-in and click on the wide dash. The default size is 5, but it will look much better around 18.

Figure 13: Format Data Series Marker

It’s not quite done because you’ll need to ensure both axes are set to 0-100%. Do this, then delete one of them (most people delete the right).

Figure 14: Bullet Graph Core Completed

We’re on the home stretch: now you can adjust the colors, add a title, and size it appropriately to fit your dashboard, report, etc. There’s also a really helpful feature that lets you format all similar charts EXACTLY the same way down the line. Right-click your chart and click “Save as Template,” name it, and save it.

Next time you have a set of data like this, you can start with whatever chart you want, right-click, and select “Change Series Chart Type” > “Templates” > Chart Template.

Figure 15: Chart Template Dialogue

Figure 16: Bullet Graph with Title, Lighter Ranges, Less Clutter

It’s more likely that each month pair has a different threshold for what counts as Conservative, High, and Extremely High water usage, so I’ve made each zone differently sized in Figure 17.
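
With variable zones, the same consumption value can land in a different zone depending on the period. A rough sketch, assuming per-period zone widths that sum to 100% (the period names and widths here are hypothetical, not taken from Figure 17):

```python
def zone_for(value, zones):
    """Return the label of the zone containing value.

    zones is a list of (label, width) pairs in stacking order, bottom first.
    """
    total = sum(width for _, width in zones)
    if total != 100:
        raise ValueError(f"zone widths sum to {total}, expected 100")
    upper = 0
    for label, width in zones:
        upper += width
        if value <= upper:
            return label
    return zones[-1][0]


# Hypothetical per-period zone widths in percent.
zones_by_period = {
    "Jan-Feb": [("Conservative", 40), ("High", 35), ("Extremely High", 25)],
    "Jul-Aug": [("Conservative", 60), ("High", 25), ("Extremely High", 15)],
}

for period, zones in zones_by_period.items():
    print(period, "->", zone_for(55, zones))
```

The sum check mirrors the earlier requirement that the reference ranges add up to the full range of the chart.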

Figure 17: Bullet Graph with Variable Ranges

...and with a navy bar.

Figure 18: Bullet Graph with Navy Bar

...and another, even bluer option in Figure 19 (in honor of “Towel Day”).

Figure 19: Bullet Graph with Hooloovoo

The following GIF (Figure 20) demonstrates the process described above and should keep your cats busy, if not teach them how to make bullet graphs. 

Figure 20: GIF

Jon Peltier also has instructions for several other ways to make bullet charts, in both horizontal and vertical orientations.

Huge thanks to Bill for this thorough and informative post! You can download the Excel file that contains his graphs.

/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2017 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.

numbers of different magnitudes

It can be challenging when you have numbers of very different magnitudes that you want to look at together. How do you make the small numbers visible? How do you provide a true sense of scale? I encountered this situation when reworking an example for a workshop recently and approached it in a new way. Here, I'll share with you how I tackled the challenge (note: details and numbers have been modified to preserve confidentiality).

First, let me set up the scenario: imagine you work in the credit risk organization at a bank (coincidentally, this was how my career started!). It's inevitable that some people will take out loans and default, or not pay them back. You need to estimate this amount so that you can reserve money against these expected losses. To do so, for a given portfolio of loans, you have a process for risk-rating each loan. For simplicity's sake, let's assume a given loan can either Pass (negligible risk) or be classified as having some level of risk (Very Low, Low, Moderate, High, or Very High). You want to understand what the pass rate and risk profile for a given portfolio have looked like over time.

The original graph created to meet the above need looked similar to the following:

This is a lot to process. When I worked in banking, we used a ton of graphs with data tables like this. My initial reaction today is to get rid of the data table—my general guidance is that if the specific values are important, we should label them directly in the graph—but that doesn't work here. Many of the segments are very small, so there's no physical space to put them in the graph. We'll need to address this in another way.

In order to determine an effective approach for showing this data, first we need to figure out what we're trying to illustrate with it. When I look at the above graph and attempt to identify specific potential takeaways—and I should caveat that this domain is no longer my area of expertise, so I'm making a number of assumptions for illustration purposes—I come up with the following:

  1. There's been marked growth in the overall portfolio. Total loan volume has increased 81% in the time period shown, from $392M in January 2016 to $709M in January 2017. This strikes me as impressive growth. There's probably some interesting context here.
  2. Those classified as Pass have increased as a proportion of total. In January 2016, 89% of loans in this portfolio were classified as Pass (negligible risk); by January 2017, the proportion classified as Pass increased to 95% of total. This seems like good progress (note there could be a behind-the-scenes story of new loans added to the portfolio not having enough time to "go bad"—we'd want to understand the aging effect, but for the sake of illustration here let's not complicate our story with that).
  3. In spite of portfolio growth, there has been a volume decrease in all risk classifications year over year except Moderate. This is actually really difficult to see in the current visual because the stacked graph doesn't show it clearly and the data table takes a ton of effort to process. This seems like a potential area of concern in an otherwise positive story, so I want to make sure this finding comes across clearly.
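
The headline figures in the first two takeaways are simple arithmetic, sketched here using only the numbers quoted in the list (the function and variable names are just for illustration):

```python
def percent_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100


# Totals quoted in the text, in millions of dollars.
jan_2016_total = 392
jan_2017_total = 709

growth = percent_change(jan_2016_total, jan_2017_total)
print(f"Portfolio growth: {growth:.0f}%")

# Pass share of the portfolio as quoted; Non-pass is the remainder.
for label, pass_share in (("Jan 2016", 89), ("Jan 2017", 95)):
    print(f"{label}: Pass {pass_share}%, Non-pass {100 - pass_share}%")
```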

Now that I know the takeaways I want to highlight, I can figure out how to show this data in a way that helps me make these takeaways clear to my audience. It would be difficult to highlight all of these points in a single graph, so I'm not going to limit myself to a single graph. Rather, I'm going to spread them out across multiple views. This will let me focus on each of the above points more effectively and weave all of the data and takeaways I want to highlight together. Following are the visuals and narrative that I developed for this. 

There's been an 81% increase in the dollar volume of our loan portfolio over the past 13 months, from $392M in January 2016 to $709M in January 2017. We see pretty consistent growth throughout the year. Next, I'm going to take this same data from this line graph and shift to a bar graph—I'm doing this because next I'll show you some component pieces of the overall portfolio. Here's the same data in a bar chart:

We're still going from $392M in January 2016 to $709M in January 2017. As you know, we risk rate all of the loans in our portfolio. A given loan is either classified as Pass—negligible risk—or with some level of risk, ranging from Very Low to Very High. Let's focus first on the Pass portion:

This is a positive story: the proportion of loans classified as Pass has increased from 89% of the portfolio in January 2016 to 95% in January 2017. This means the Non-pass loans have decreased from 11% of total portfolio in January 2016, to just 5% in January 2017:

Next, I'm going to focus on just the Non-pass loans, the orange portion of the following bars:

We classify Non-pass loans into one of five risk categories: Very Low, Low, Moderate, High, or Very High. Next, we'll look at this breakdown, piece-by-piece. There are large differences in the magnitudes of the numbers across the various risk ratings, so I'm going to layer these on and change the scale as needed as we go. Bear with me—this is perhaps a different way than you've seen data like this shown before—but we'll walk through it together step by step. Here's the basic graph:

Note that currently, the y-axis scale goes up to $0.1M, or $100,000. Let's start with the most severe level of risk: Very High. In the following graph, I'll plot the dollar volume of loans classified as Very High risk over time. As of January 2017, $0.06M—or $60K—in loan volume is classified as Very High.

Next, I'm going to do something a little different. I'm going to change the y-axis on the graph so that instead of going up to $0.1M ($100,000), it goes up to $1M. Notice how this visibly compresses the portion of the portfolio classified as Very High risk. That final point in January 2017 still represents $60K:

I've changed the scale of this graph so that I can add on the next layer of risk (one step less in severity than the Very High that we just considered): High risk loans. We've seen some big changes in High risk volume over the past year, with it starting out around $850K, then decreasing, but then increasing again. As of January 2017, loan volume classified as High risk amounts to $810K:

Next, I'm going to change the scale of the graph again so that we can continue to layer on more of the risk-rated portfolio. In this next iteration, my y-axis maximum has been increased from $1M to $10M.

As we saw before, expanding the scale visually compresses the data we've graphed so far. Note that the Very High risk loans are still there, but at only $60K, we can't really even see them now given the new scale of the graph. High risk loans are the dark orange bars. Next, I'll layer on the loan volume classified as Moderate risk. This has increased over the past year, from less than $2M in January 2016 to $6M in January 2017.

Next, I'll expand the scale again, increasing the y-axis maximum from $10M to $20M.

This provides space to layer on the next level of risk (continuing to decrease in risk severity): Low risk. This portion of the portfolio has been relatively flat over time, and totals $10M as of January 2017.

I'm going to change the scale of my graph one final time, increasing the y-axis maximum from $20M to $50M.

With this scale, now I can layer on the final level of risk (this is the lowest severity for those loans classified as Non-Pass). Very Low risk loans have decreased over time and as of January 2017, total $21M.
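
The axis maximums in this progression were chosen by hand, but they happen to match a simple "round up to 1, 2, 5, or 10 times a power of ten" rule applied to the running stack total. The sketch below uses only the January 2017 volumes quoted in the text; other months may be higher, so treat it as an illustration of the idea rather than the method actually used.

```python
from itertools import accumulate


def nice_axis_max(value):
    """Smallest of 1, 2, 5, or 10 times a power of ten that is >= value.

    Expects a positive integer (here, whole dollars).
    """
    magnitude = 10 ** (len(str(value)) - 1)
    for multiplier in (1, 2, 5, 10):
        if multiplier * magnitude >= value:
            return multiplier * magnitude


# January 2017 volumes quoted in the text, in dollars, most severe first.
jan_2017 = [
    ("Very High", 60_000),
    ("High", 810_000),
    ("Moderate", 6_000_000),
    ("Low", 10_000_000),
    ("Very Low", 21_000_000),
]

running_totals = list(accumulate(volume for _, volume in jan_2017))
for (label, _), total in zip(jan_2017, running_totals):
    print(f"add {label}: stack reaches ${total:,}, axis max ${nice_axis_max(total):,}")
```

Each added layer pushes the stack past the previous maximum, which is what forces the rescaling at every step.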

When we look at the overall heights of the bars in the preceding graph, we can see that total Non-Pass loans have decreased in volume year over year. However, when we stack data on top of other data like this, it can make it difficult to see the trend for each individual series. So let's look at one final view of this data, where we unstack the above bars and focus on the trend over time for each level of risk in a line graph:

In the line graph, we can see the marked decrease in Very Low risk loans over time as well as the relatively flat volume of Low risk loans. We can see that High and Very High loans are much lower in absolute volume than the other categories. Perhaps most interesting, however, is that Moderate loans have increased in volume over the past 13 months. Is this noteworthy? I'm not sure, but it seems like something we may want to draw attention to, better understand, and keep an eye on.

To overcome the challenge of visualizing numbers of very different magnitudes in a live setting, I might walk my audience through something like the preceding progression. Then if we're also in need of a static version to share (for those who missed the meeting, as a reminder of what was discussed for those who did attend, or if it really all needs to fit on a single page; always question that assumption!), I might do something like the following:

This is the first time I've tried this approach of repeatedly changing the y-axis maximum to layer on additional data, so I'm curious about your reaction and feedback. Do you like it? Was it confusing? Do you have other ideas on how to approach and overcome this challenge? Leave a comment with your thoughts!


the book you're holding might be a fake!

It was a leap of faith when I decided to start storytelling with data® back in 2010. While I knew there was an opportunity to help people by sharing what I’ve learned over the years, I could not have predicted the enthusiasm and excitement that would build with my workshops, speaking engagements, and book. I tweet it often: #ilovemyjob and how I’m able to help individuals have greater impact in their jobs and organizations grow their business.

Recently, we discovered that there is an ambitious effort and market around selling fake or bootlegged versions of my book, storytelling with data. After my initial reaction—“That’s insane! There is a black market for my book?!?”—I shifted to the more practical and disappointing flurry of topics to consider including: trademark and copyright infringement, lost revenue, individuals profiting off of other people’s work, and so forth. Most disturbing to me, however, is the lower quality of these knock off books.

I care deeply about quality, attention to detail, and the overall audience experience. It’s why I try to pick cool places and venues for my workshops, chose to partner with my publisher, Wiley, and read through and edit blog posts like this about a zillion times prior to posting. This focus on quality removes distractions from my message. Unfortunately, the fake copies are leading to a poor experience of storytelling with data. I was always confused by negative feedback on the quality of the book—it’s become clear now that this is due to bootlegged copies, which, sadly, have been out there since the book's launch. Here are some images illustrating the issues: 

FRONT COVER: Paper of fake version is yellowish instead of bright white, text storytelling with data is stretched, and there are variances in artwork formatting (pronounced borders in table, different text color in heatmap).

BACK COVER: Different picture of me, different list price, text below picture has larger line breaks and smaller text on the fake book.

INSIDE IMAGES: Blue tip boxes appear washed-out, grey elements in graphs are so light they are barely legible in the fake version of the book and contrast isn't sharp.

BINDING: The fake has sloppy binding with visible glue and is also thinner due to lower paper quality compared to the real book.

With my publisher's help, I’m working to control what’s out there. But I want to do more—I want to be sure that YOU are not sitting with a low-quality knock off of my book.

If you believe you were sold a fake version of my book: please let me fix that personally.

Signs that you may have a fake book include the issues highlighted in the preceding images: yellowish cover with stretched or poorly formatted artwork, back cover with a different picture of me (crazy, right?!?) or US list price other than $39.95, faded text or washed out images in the book, and visible glue or otherwise sloppy binding.

If you believe you’ve bought a fake book, email fakebook@storytellingwithdata.com with a picture of the book’s flaws and proof of purchase. We’ll let you know how to return your book and, in exchange, quickly receive a personally signed copy of the real storytelling with data as a replacement.

Haven’t yet bought your copy of storytelling with data? Follow this link and purchase directly from Amazon to ensure you’ll get the real thing. We’re finding that the pirated books are coming from the “42 New” category in the pic to the right.

Thanks for your continued support, enthusiasm, and love for storytelling with data.
- Cole


declutter this graph!

The importance of decluttering when communicating visually with data is something that I talk about regularly (you can watch a video from me on this topic, and I've also blogged about it many times—here's an early example). It's mostly common sense stuff when we stop and think about it: get rid of the visual elements in your graphs that are unnecessary. Doing so can have a profound impact. Yet too often, we don't take the time to do this.

To illustrate the benefit, let's examine the following visual. I found it on viz.wtf, so we know it's a model for what not to do when visualizing data. There is a ton of clutter and other issues here. Consider for a moment: what clutter would you eliminate?

Declutter GIF 1.png

Scroll down to see the progression I went through. Each of these changes on its own is relatively minor, but sum them up and it's a pretty big difference between the original and the decluttered version. We can take it a step further by identifying the so what? and using words and color to make the point clear. 

Declutter GIF 7.png
Declutter GIF 8.png

The graph still isn't perfect. It bothers me that the time intervals on the x-axis aren't consistent. You may see other things you'd approach differently as well. But check out the improvement we can make by identifying and eliminating clutter and making some other changes to reduce cognitive burden. Don't let unnecessary elements distract from your data or your message. If interested, you can download the Excel file.

What's your favorite type of clutter to remove? Leave a comment!


so what?

"What is the point?" This is a question that comes up often in my workshops when we are looking at graphs and discussing how they can be improved. In this post, I discuss the importance of making your main takeaway clear and apply this idea to a specific example.
