numbers of different magnitudes

It can be challenging when you have numbers of very different magnitudes that you want to look at together. How do you make the small numbers visible? How do you provide a true sense of scale? I encountered this situation when reworking an example for a workshop recently and approached it in a new way. Here, I'll share with you how I tackled the challenge (note: details and numbers have been modified to preserve confidentiality).

First, let me set up the scenario: imagine you work in the credit risk organization at a bank (coincidentally, this was how my career started!). It's inevitable that some people will take out loans and default, or not pay them back. You need to estimate this amount so that you can reserve money against these expected losses. To do so, for a given portfolio of loans, you have a process for risk-rating each loan. For simplicity's sake, let's assume a given loan can either Pass (negligible risk) or is classified as having some level of risk (Very Low, Low, Moderate, High, or Very High). You want to understand what the pass rate and risk profile for a given portfolio have looked like over time.

The original graph created to meet the above need looked similar to the following:

This is a lot to process. When I worked in banking, we used a ton of graphs with data tables like this. My initial reaction today is to get rid of the data table—my general guidance is that if the specific values are important, we should label them directly in the graph—but that doesn't work here. Many of the segments are very small, so there's no physical space to put them in the graph. We'll need to address this in another way.

In order to determine an effective approach for showing this data, first we need to figure out what we're trying to illustrate with it. When I look at the above graph and attempt to identify specific potential takeaways—and I should caveat that this domain is no longer my area of expertise, so I'm making a number of assumptions for illustration purposes—I come up with the following:

  1. There's been marked growth in the overall portfolio. Total loan volume has increased 81% in the time period shown, from $392M in January 2016 to $709M in January 2017. This strikes me as impressive growth. There's probably some interesting context here.
  2. Those classified as Pass have increased as a proportion of total. In January 2016, 89% of loans in this portfolio were classified as Pass (negligible risk); by January 2017, the proportion classified as Pass increased to 95% of total. This seems like good progress (note there could be a behind-the-scenes story of new loans added to the portfolio not having enough time to "go bad"—we'd want to understand the aging effect, but for the sake of illustration here let's not complicate our story with that).
  3. In spite of portfolio growth, there has been a volume decrease in all risk classifications year over year except Moderate. This is actually really difficult to see in the current visual because the stacked graph doesn't show it clearly and the data table takes a ton of effort to process. This seems like a potential area of concern in an otherwise positive story, so I want to make sure this finding comes across clearly.
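
As a quick check of the arithmetic behind these takeaways, here is a minimal Python sketch. It uses only the figures quoted above; the variable and function names are illustrative, and the Non-pass dollar amounts are approximations derived from the rounded Pass percentages rather than values taken from the underlying data.

```python
# Figures quoted in the takeaways above (dollar volumes in $M).
jan_2016_total = 392
jan_2017_total = 709
pass_share_2016 = 0.89  # rounded share quoted in the post
pass_share_2017 = 0.95  # rounded share quoted in the post


def pct_change(start, end):
    """Percent change from start to end."""
    return (end - start) / start * 100


growth = pct_change(jan_2016_total, jan_2017_total)

# Approximate Non-pass dollars implied by the rounded Pass shares.
non_pass_2016 = jan_2016_total * (1 - pass_share_2016)
non_pass_2017 = jan_2017_total * (1 - pass_share_2017)

print(f"Portfolio growth: {growth:.0f}%")
print(f"Non-pass volume: ${non_pass_2016:.0f}M -> ${non_pass_2017:.0f}M")
```

The growth figure rounds to 81%. The implied Non-pass volume falls even though the portfolio grows, which is consistent with the third takeaway, though these dollar estimates are only as precise as the rounded percentages they come from.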

Now that I know the takeaways I want to highlight, I can figure out how to show this data in a way that helps me make these takeaways clear to my audience. It would be difficult to highlight all of these points in a single graph, so I'm not going to limit myself to a single graph. Rather, I'm going to spread them out across multiple views. This will let me focus on each of the above points more effectively and weave all of the data and takeaways I want to highlight together. Following are the visuals and narrative that I developed for this. 

There's been an 81% increase in the dollar volume of our loan portfolio over the past 13 months, from $392M in January 2016 to $709M in January 2017. We see pretty consistent growth throughout the year. Next, I'm going to take this same data from this line graph and shift to a bar graph—I'm doing this because next I'll show you some component pieces of the overall portfolio. Here's the same data in a bar chart:

We're still going from $392M in January 2016 to $709M in January 2017. As you know, we risk rate all of the loans in our portfolio. A given loan is either classified as Pass—negligible risk—or with some level of risk, ranging from Very Low to Very High. Let's focus first on the Pass portion:

This is a positive story: the proportion of loans classified as Pass has increased from 89% of the portfolio in January 2016 to 95% in January 2017. This means the Non-pass loans have decreased from 11% of total portfolio in January 2016, to just 5% in January 2017:

Next, I'm going to focus on just the Non-pass loans, the orange portion of the following bars:

We classify Non-pass loans into one of five risk categories: Very Low, Low, Moderate, High, or Very High. Next, we'll look at this breakdown, piece-by-piece. There are large differences in the magnitudes of the numbers across the various risk ratings, so I'm going to layer these on and change the scale as needed as we go. Bear with me—this is perhaps a different way than you've seen data like this shown before—but we'll walk through it together step by step. Here's the basic graph:

Note that currently, the y-axis scale goes up to $0.1M, or $100,000. Let's start with the most severe level of risk: Very High. In the following graph, I'll plot the dollar volume of loans classified as Very High risk over time. As of January 2017, $0.06M—or $60K—in loan volume is classified as Very High.

Next, I'm going to do something a little different. I'm going to change the y-axis on the graph so that instead of going up to $0.1M ($100,000), it goes up to $1M. Notice how this visibly compresses the portion of the portfolio classified as Very High risk. That final point in January 2017 still represents $60K:

I've changed the scale of this graph so that I can add on the next layer of risk (one step less in severity than the Very High that we just considered): High risk loans. We've seen some big changes in High risk volume over the past year, with it starting out around $850K, then decreasing, but then increasing again. As of January 2017, loan volume classified as High risk amounts to $810K:

Next, I'm going to change the scale of the graph again so that we can continue to layer on more of the risk-rated portfolio. In this next iteration, my y-axis maximum has been increased from $1M to $10M.

As we saw before, expanding the scale visually compresses the data we've graphed so far. Note that the Very High risk loans are still there, but at only $60K, we can't really even see them now given the new scale of the graph. High risk loans are the dark orange bars. Next, I'll layer on the loan volume classified as Moderate risk. This has increased over the past year, from less than $2M in January 2016 to $6M in January 2017.

Next, I'll expand the scale again, increasing the y-axis maximum from $10M to $20M.

This provides space to layer on the next level of risk (continuing to decrease in risk severity): Low risk. This portion of the portfolio has been relatively flat over time, and totals $10M as of January 2017.

I'm going to change the scale of my graph one final time, increasing the y-axis maximum from $20M to $50M.

With this scale, now I can layer on the final level of risk (this is the lowest severity for those loans classified as Non-pass). Very Low risk loans have decreased over time and, as of January 2017, total $21M.

When we look at the overall heights of the bars in the preceding graph, we can see that total Non-pass loans have decreased in volume year over year. However, when we stack data on top of other data like this, it can make it difficult to see the trend for each individual series. So let's look at one final view of this data, where we unstack the above bars and focus on the trend over time for each level of risk in a line graph:

In the line graph, we can see the marked decrease in Very Low risk loans over time as well as the relatively flat volume of Low risk loans. We can see that High and Very High loans are much lower in absolute volume than the other categories. Perhaps most interesting, however, is that Moderate loans have increased in volume over the past 13 months. Is this noteworthy? I'm not sure, but it seems like something we may want to draw attention to, better understand, and keep an eye on.
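
The rescaling progression walked through above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the method used to build the original graphs: it uses only the January 2017 values quoted in the walkthrough, treats the five y-axis maximums from the text as the candidate steps, and the function names are made up for this example. A real version would size the axis against the tallest stack across all months, not a single month.

```python
# January 2017 volumes in $M, as quoted in the walkthrough,
# ordered from most to least severe.
jan_2017 = [
    ("Very High", 0.06),
    ("High", 0.81),
    ("Moderate", 6.0),
    ("Low", 10.0),
    ("Very Low", 21.0),
]

# The y-axis maximums used in the walkthrough ($M).
axis_steps = [0.1, 1, 10, 20, 50]


def axis_max_for(stack_height, steps):
    """Return the smallest candidate axis maximum that fits the stack."""
    for step in steps:
        if stack_height <= step:
            return step
    raise ValueError(f"no axis step fits a stack of {stack_height}")


def layering_plan(series, steps):
    """Pair each newly added layer with the axis maximum it needs."""
    plan = []
    running_total = 0.0
    for name, value in series:
        running_total += value  # height of the stacked bar so far
        plan.append((name, axis_max_for(running_total, steps)))
    return plan


plan = layering_plan(jan_2017, axis_steps)
for name, y_max in plan:
    print(f"add {name:<9} -> y-axis max ${y_max}M")
```

Each added layer lands on the next axis maximum in the sequence, which is why every step in the walkthrough needed its own rescale. In a plotting library, the chosen maximum would be passed to whatever sets the y-axis limits (for example, `Axes.set_ylim` in matplotlib).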

To overcome the challenge of visualizing numbers of very different magnitudes in a live setting, I might walk my audience through something like the preceding progression. Then, if we're also in need of a static version to share (for those who missed the meeting, as a reminder of what was discussed for those who did attend, or if it really all needs to fit on a single page; always question that assumption!), I might do something like the following:

This is the first time I've tried this approach of repeatedly changing the y-axis maximum to layer on additional data, so I'm curious about your reaction and feedback. Do you like it? Was it confusing? Do you have other ideas on how to approach and overcome this challenge? Leave a comment with your thoughts!

/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2017 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.

the book you're holding might be a fake!

It was a leap of faith when I decided to start storytelling with data® back in 2010. While I knew there was an opportunity to help people by sharing what I’ve learned over the years, I could not have predicted the enthusiasm and excitement that would build with my workshops, speaking engagements, and book. I tweet it often: #ilovemyjob and how I’m able to help individuals have greater impact in their jobs and organizations grow their business.

Recently, we discovered that there is an ambitious effort and market around selling fake or bootlegged versions of my book, storytelling with data. After my initial reaction—“That’s insane! There is a black market for my book?!?”—I shifted to the more practical and disappointing flurry of topics to consider including: trademark and copyright infringement, lost revenue, individuals profiting off of other people’s work, and so forth. Most disturbing to me, however, is the lower quality of these knock off books.

I care deeply about quality, attention to detail, and the overall audience experience. It’s why I try to pick cool places and venues for my workshops, chose to partner with my publisher, Wiley, and read through and edit blog posts like this about a zillion times prior to posting. This focus on quality removes distractions from my message. Unfortunately, the fake copies are leading to a poor experience of storytelling with data. I was always confused by negative feedback on the quality of the book—it’s become clear now that this is due to bootlegged copies, which, sadly, have been out there since the book's launch. Here are some images illustrating the issues: 

FRONT COVER: Paper of fake version is yellowish instead of bright white, text storytelling with data is stretched, and there are variances in artwork formatting (pronounced borders in table, different text color in heatmap).

BACK COVER: Different picture of me, different list price, text below picture has larger line breaks and smaller text on the fake book.

INSIDE IMAGES: Blue tip boxes appear washed-out, grey elements in graphs are so light they are barely legible in the fake version of the book and contrast isn't sharp.

BINDING: The fake has sloppy binding with visible glue and is also thinner due to lower paper quality compared to the real book.

With my publisher's help, I’m working to control what’s out there. But I want to do more—I want to be sure that YOU are not sitting with a low-quality knock off of my book.

If you believe you were sold a fake version of my book: please let me fix that personally.

Signs that you may have a fake book include the issues highlighted in the preceding images: yellowish cover with stretched or poorly formatted artwork, back cover with a different picture of me (crazy, right?!?) or US list price other than $39.95, faded text or washed out images in the book, and visible glue or otherwise sloppy binding.

If you believe you’ve bought a fake book, email fakebook@storytellingwithdata.com with a picture of the book’s flaws and proof of purchase. We’ll let you know how to return your book and, in exchange, you'll quickly receive a personally signed copy of the real storytelling with data as a replacement.

Haven’t yet bought your copy of storytelling with data? Follow this link and purchase directly from Amazon to ensure you’ll get the real thing. We’re finding that the pirated books are coming from the “42 New” category in the pic to the right.

Thanks for your continued support, enthusiasm, and love for storytelling with data.
- Cole

declutter this graph!

The importance of decluttering when communicating visually with data is something that I talk about regularly (you can watch a video from me on this topic, and I've also blogged about it many times—here's an early example). It's mostly common sense stuff when we stop and think about it: get rid of the visual elements in your graphs that are unnecessary. Doing so can have a profound impact. Yet too often, we don't take the time to do this.

To illustrate the benefit, let's examine the following visual. I found it on viz.wtf, so we know it's a model for what not to do when visualizing data. There is a ton of clutter and other issues here. Consider for a moment: what clutter would you eliminate?

Scroll down to see the progression I went through. Each of these changes on its own is relatively minor, but sum them up and it's a pretty big difference between the original and the decluttered version. We can take it a step further by identifying the so what? and using words and color to make the point clear. 

The graph still isn't perfect. It bothers me that the time intervals on the x-axis aren't consistent. You may see other things you'd approach differently as well. But check out the improvement we can make by identifying and eliminating clutter and making some other changes to reduce cognitive burden. Don't let unnecessary elements distract from your data or your message. If interested, you can download the Excel file.

What's your favorite type of clutter to remove? Leave a comment!

so what?

"What is the point?" This is a question that comes up often in my workshops when we are looking at graphs and discussing how they can be improved. In this post, I discuss the importance of making your main takeaway clear and apply this idea to a specific example.

an updated post on pies

Today's post is about one of my favorite dessert graphs: the pie chart. Those who follow my work know that I am not a fan. In fact, I've written posts with titles like "pie charts are evil" and given presentations called "death to pie charts." It can be fun to be a little provocative sometimes. Though one might argue this is taking it too far.

A wise person once said to me, "rather than ban pies altogether, teach people how to use them appropriately." This is sage advice. The challenge for me is that the appropriate use cases are few and personally, in pretty much every one of these cases, I'd opt for another approach.

Still, I do not want to spread misinformation. Pie charts are not inherently evil. Like pretty much any tool, they can be used well and they can be used not-so-well. Since my pie-bashing posts and presentations, new research has been conducted (by afore-quoted Robert Kosara and Drew Skau) that debunks some previously held common beliefs about pies. So I thought it prudent to write an updated post on the pie chart. While not exactly glowing, this will likely be at least a little more balanced compared to what you may have seen from me on this topic in the past.

The appropriate use case for the pie chart

Pies do a better job than probably any other visual out there at expressing the part-to-whole relationship. When you see a pie, you immediately have an understanding that it depicts a "whole" and can be sliced into pieces of that whole. It's also very easy for us to pick out a very large slice or a very small slice.

The limitation of the pie is that it's harder to say much more specific than that. When segments are close in size, it can be difficult to determine which is bigger or by how much. When that is an important goal, the pie chart breaks down. Ok, that didn't take so long—I'm already talking about what not to do with a pie chart. Let's shift next to more on that.

What not to do with pies

Pies seem to lend themselves more than other graph types to unnecessary—and often downright harmful—dressing up and embellishment. No other graph type is depicted in 3D or exploded as frequently as the pie.

A sampling of results from Google image search for "pie chart."

But the argument that it isn't fair to ban pies based on a bad example of a pie is a logical one. Due to their frequent misuse and in the spirit of teaching people how to use them correctly, here are some pointers on what not to do with pie charts:

  • Don't use 3D effects or explode your pie. At best these add unnecessary clutter; worse, they can make it difficult or impossible to accurately understand the relative values in the pie. Here's a simple example.
  • If the pie is depicting percents, it must sum to 100%. If it sums to anything other than 100%, something is wrong. If not percents, then the pie must sum to some meaningful whole.
  • Don't have a ton of slices. There isn't a hard and fast rule here, but be reasonable. A pie showing a ton of tiny categories will be impossible to read (and even if it's legible, it's hard to draw anything useful from it, like this). Consider whether it might make sense to combine small slices into an "Other" category.
  • Don't use a pie if the primary goal is to compare the size of the slices. The lack of alignment to a common baseline and area encoding of data makes this difficult. A bar chart will usually be a better option if comparing a quantity across categories is the primary goal.
  • Don't use multiple pies and ask your audience to compare across them. This piece of advice may be controversial. But if the slices are different across the pies (which I'd expect they are if you have something interesting to say with them), the pieces shift around; this plus the spatial separation and lack of alignment to a common baseline make comparing slices across pies difficult. Perhaps you could get away with it if you're emphasizing a single slice across multiple pies, but if you want to do more than that, pies won't be a good approach. 
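
Two of the pointers above (percents must sum to 100%, and small slices can be combined into "Other") lend themselves to a quick pre-flight check before charting. The following is a minimal Python sketch, not a standard recipe: the function name, the 5% cutoff, the rounding tolerance, and the sample data are all hypothetical choices for illustration, and the pointers themselves give no hard and fast threshold.

```python
def prepare_pie_slices(shares, min_share=5.0, tolerance=0.5):
    """Validate percent shares and fold small slices into "Other".

    shares: dict mapping category name to percent of the whole.
    min_share and tolerance are illustrative defaults, not rules.
    """
    total = sum(shares.values())
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"shares sum to {total:.1f}%, not 100%")

    kept = {}
    other = 0.0
    for name, share in shares.items():
        if share < min_share:
            other += share  # too small to read as its own slice
        else:
            kept[name] = share
    if other > 0:
        kept["Other"] = kept.get("Other", 0.0) + other
    return kept


# Hypothetical data, for illustration only.
example = {"A": 48.0, "B": 30.0, "C": 14.0, "D": 4.0, "E": 2.5, "F": 1.5}
slices = prepare_pie_slices(example)
print(slices)
```

Here the three smallest categories collapse into a single "Other" slice, leaving four slices instead of six. Whether that grouping makes sense depends on what the audience needs to see, so the cutoff is something to choose deliberately rather than apply blindly.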

In conclusion

I will continue not to use pies. Does that mean you should follow suit? Not necessarily. What I advise in my workshops is, when you find yourself reaching for a pie, pause and ask yourself why. If you can answer that question, you've probably put enough thought into it to use the pie chart. Though when I step back and think about that advice—really, that's something we should do anytime we make any kind of graph. Think about what you want to enable your audience to do with the data you are graphing and whether the type of graph you choose is allowing for that in a straightforward manner. If you do that, you'll be well positioned to get your point across. And that's sort of the whole point, isn't it?

For more on this topic (and some varied perspectives on pies), check out the following posts:

The above isn't a comprehensive list—if you know of others worth mentioning, please do so in a comment.

I'll also note that one limitation of the pie study is that it included pies depicting only two segments. I hope to see further research expand on this and also look at pies having three or more slices.
