Tuesday, December 27, 2011

don't fall victim to this

I came across this graph recently when catching up on some reading over the holiday. My question to you is simple: can you read it?


The website where this interactive visual resides is called worldshapin, and it implores you to "compare countries through their shape." It visualizes data from the Human Development Report 2011 as a "star plot" along the six dimensions of education, population, health, workplace equality, carbon footprint, and living standards. As shown above, you can look at this data between countries and as it compares to continents and the world (when the world isn't obscured by the countries and continents you've chosen, as it is above).

Before I get to the don't fall victim portion of this blog post, let me first say that I do think this helps make the data in the report more accessible by making it visual. You can get a quick idea of how one part of the world stacks up to another across these dimensions that you wouldn't get with a table of data, for example. This is fine for information discovery, assuming you are making it available to an audience who will have an appetite to "play" with the data.

This visual is not fine, however, if you have a specific story that you want to tell through data.

To convince you of this, I'm going to take one of my own failed data visualizations from my past and remake it into something that works. First, a bit of history:

I used to make charts like this. I called them "spider graphs." In a prior life, I worked in banking, managing home equity fraud. When it comes to fraud, the ways you can impact it can be classified into 8 categories (where each category is a piece of the fraud management lifecycle): deterrence, prevention, detection, mitigation, analysis, policy, investigation, and prosecution (Wes Wilhelm, The Fraud Management Lifecycle Theory). So if we were to look at our efforts in each of these areas and rate the activities along a scale from 0 (we have nothing in place) to, say, 10 (the unattainable utopia of fraud management - we've solved every problem), we could show how well we're doing on a relative basis in each area, with the goal of maximizing our coverage and balancing activity across the different parts of the lifecycle. The spider graph was perfect for this!

I was able to locate an old annual review on the topic of home equity fraud that I put together that highlighted progress to date and introduced forward-looking plans. I'm going to assume it's ok to share an excerpt here, given that the financial institution I did this work for is now defunct (due to much bigger issues than my poor data viz). Here's what it looked like:


The visual starts off with an explanation, shows an example of how to read the graphs on the right, and then presents the real-data graphs across the bottom (the titles across the very bottom are the 5 different types of home equity fraud that we were tracking).

Lesson 1 (foreshadowing): if you have to have a graph to show how to read your graph, your visual may be too complicated.

When it comes to the visual at the bottom ("FML for Home Equity"), let's try to look past the black background and meaningless colors (while annoying, we have bigger fish to fry here) to the actual data. Same question as I led this post with: can you read it?

Before I answer that question with my current data viz lens on, let's back up the better part of a decade to take a look at what I thought of these visuals when I created them. I thought they looked really cool. Sexy, even. I also thought they clearly showed what I wanted to show: mainly, that we had a lot of work to do - we were failing in a lot of places and needed to make some changes.

But people found them really hard to read. I found myself explaining, repeatedly (to the same people even!) how to read them. At the time, I thought this was an issue with my audience.

When I look at the graphs through today's lens, I recognize that the issue was not with my audience, but rather with me. It was a visual design failure. I stubbornly persisted in showing data in a way that wasn't straightforward for my audience to consume (even when it became obvious through their questions that it wasn't clear!). When information isn't straightforward, it's hard to look at. For an audience, this feels uncomfortable. Most people don't want to spend a lot of time with things that make them feel uncomfortable. Even when you try to convince them to. Can you blame them?

Let's talk about some other ways to visualize this same data. The sort of data we have lends itself easily to a matrix structure, with fraud management lifecycle stage across one axis and fraud type across the other. When I see the data organized this way, I think heatmap. But the main drawback to a heatmap in this scenario is that, while it gives us a decent visual comparison of how we're doing across the different buckets (both by fraud management lifecycle stage and by fraud type), we don't get a visual comparison of where we are vs. where we'd like to be, which I think is the most important piece here.
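For anyone who prefers code to a spreadsheet, here is a minimal sketch of that matrix structure in Python. The eight lifecycle stages are the ones named above; the fraud-type labels and every score are made-up placeholders (the real ratings aren't reproduced here), and the helper names are my own invention for illustration. It shows what a heatmap-style view can compare (averages by stage and by fraud type), and also that nothing in it captures where we'd like to be.

```python
# Lifecycle stages are from the post. Fraud-type labels and ALL scores below
# are hypothetical placeholders on the 0-10 scale, not the real data.
STAGES = ["deterrence", "prevention", "detection", "mitigation",
          "analysis", "policy", "investigation", "prosecution"]
FRAUD_TYPES = ["type A", "type B", "type C", "type D", "type E"]

PLACEHOLDER_SCORES = [
    [2, 3, 5, 4, 6, 3, 2, 1],
    [1, 4, 6, 3, 5, 2, 3, 1],
    [3, 3, 4, 5, 6, 4, 2, 2],
    [2, 5, 7, 4, 4, 3, 3, 1],
    [2, 5, 3, 4, 4, 3, 5, 0],
]


def build_matrix(rows):
    """Return {fraud_type: {stage: score}} from one row of scores per fraud type."""
    return {ft: dict(zip(STAGES, row)) for ft, row in zip(FRAUD_TYPES, rows)}


def stage_averages(matrix):
    """Average score for each lifecycle stage across all fraud types."""
    return {s: sum(matrix[ft][s] for ft in matrix) / len(matrix) for s in STAGES}


def type_averages(matrix):
    """Average score for each fraud type across all lifecycle stages."""
    return {ft: sum(by_stage.values()) / len(by_stage)
            for ft, by_stage in matrix.items()}


matrix = build_matrix(PLACEHOLDER_SCORES)

if __name__ == "__main__":
    for stage, avg in stage_averages(matrix).items():
        print(f"{stage:<14}{avg:.1f}")
```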

Instead, I'll leverage one of my best friends: the bar chart. Bar charts are great because people already know how to read them. This means there's no learning curve for your audience to face to get to the information you want to provide. Rather than spending their time deciphering how to read the graph, they can spend it understanding the information it shows. They're also more likely to spend time on a visual that doesn't make them feel uncomfortable. Here's another way to visualize this data:

Note that the actual numbers aren't so important here - they were somewhat subjective to begin with - so I opted not to show a numerical scale at all. What is important is the relative distance from where we consider ourselves to be currently and where we want to be (as close to "we've solved every problem" as possible). I've drawn attention to this gap by showing the opportunity that remains outlined in blue.
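If you'd rather see the idea in code than in Excel, here is a rough sketch of the same "current position plus remaining gap" bar in plain Python. It is not how my chart was built; the function names, the 20-character width, and the sample scores are all assumptions made up for illustration. Filled characters stand for where we are, dots stand for the opportunity that remains, and no numerical scale is printed.

```python
MAX_SCORE = 10  # top of the 0-10 rating scale described in the post


def gap_bar(score, max_score=MAX_SCORE, width=20):
    """Return a fixed-width text bar: '#' for progress, '.' for the remaining gap."""
    if not 0 <= score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}")
    filled = round(width * score / max_score)
    return "#" * filled + "." * (width - filled)


def render(scores):
    """Return one labeled bar per lifecycle stage, with no numbers shown."""
    return "\n".join(f"{stage:<14}|{gap_bar(score)}|"
                     for stage, score in scores.items())


if __name__ == "__main__":
    # Hypothetical placeholder scores for a single fraud type, not real data.
    example = {"deterrence": 2, "prevention": 3, "detection": 5, "mitigation": 4,
               "analysis": 6, "policy": 3, "investigation": 2, "prosecution": 1}
    print(render(example))
```

Every bar has the same total length, so the eye goes straight to the size of the dotted gap rather than to the numbers behind it.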

The overarching lesson is this: don't fall victim to choosing sexy over utility when it comes to data viz for telling a story. When your audience tells you something is hard to read, or you find yourself explaining the visual more than discussing the information it shows, listen and adjust!

If interested, my Excel file is here. Leave a comment to let me know what you think!

6 comments:

  1. Great post Cole. Just a couple of comments I'd like to share. Use of color may not be so bad in this case as color coding by stage would help facilitate comparisons across fraud types. Also, if there were acceptable ranges (i.e., 75-85% of problems solved), this could then be converted to bullet graphs, which would probably do a better job of highlighting where fraud management changes need to occur. I'd appreciate your thoughts.

  2. Nice post ... However, I do think star plots have a real advantage over anything else when the comparison of the shapes (types of fraud in this case) is the main task. Detecting the similarity among the frauds with the bar charts looks harder than with star plots.

    Also, I think the biggest flaw of worldshapin is not the choice of using star plots but that of using animation. Animated plots are really, really hard to compare and there is no clear justification for this choice. A static small multiple would work a whole lot better.

  3. This is also an appropriate place for a parallel coordinates approach. I'm not in a place where I can look up a link, but I've written about this on my blog.

  4. Fully support your criticism regarding radar-charts:
    They are confusing if several criteria are shown in more or less random order. The visual impression depends mainly on the order of the criteria. And it is almost always random.

    The purpose influences the chart type:
    The analyst needs interactive tools to look at the data from different angles and find the underlying story. In the final report the story has to be shown as clearly as possible.

    The story/question determines the final visualisation:
    Depending on what the message/the question is, who the audience is and what type of media is used the way of visualisation might vary. Interactive or not, sorting or not, use of color or not etc.

    Bar chart: often the best friend
    It might not be as sexy as the "Chart-girl/boy" one met at the fancy new "Tool-Bar" next door. But most of the time it is your best friend.

    Variation on your bar chart:
    With a little sorting and highlighting (by formula): http://dl.dropbox.com/u/51461869/FML-Makeover_Markus.xlsx

  5. I added a very simple bar chart comparison of 2 selectable FML stages:
    http://dl.dropbox.com/u/51461869/FML-Makeover_Markus_extended.xlsx

  6. Well said Cole. One of the things I enjoy most about data viz is the continuous learning process. I learn something new each and every day, and it's blogs like yours that help open my eyes.
