our tools don't know the story

A question that frequently arises in our workshops is “What tools do I need to tell stories like you do?” Many are surprised to hear the answer: we’re tool-agnostic. Rather, the concepts we teach are universal. Whether you’re using Excel, Tableau, Power BI, R, SAS, or something else, the tools themselves don't know your data, your organization, or your audience like you do. That’s where an analyst adds value: by bringing the data and its underlying story to life in context.

Today’s post was inspired by a real-world makeover of data originally created in a tool highly regarded for data visualization. The client was visualizing advertising data across multiple countries. Their initial visualization looked similar to the one below. (Note: I’ve anonymized the data to preserve confidentiality).  

scatterplot1.png
 

This chart shows two dimensions of advertising effectiveness: reach (how many users saw an ad) and engagement (how many users clicked on the ad) across several countries (United States, Germany, Great Britain, China, and Brazil). The higher these numbers, the better.

Upon further exploration, we see that the magnitudes of reach and engagement are very different across countries. In China, 52% of users were reached compared to 68% in Brazil. With engagement, the magnitude of the difference is even more pronounced: China’s engagement is 6%, half of Brazil’s 12%.  

Imagine yourself as a decision maker tasked with determining an action plan from these results. If the analyst presented you with the visual above, what conclusions might you draw?  An informal poll of readers might return multiple answers, which demonstrates the danger of letting our tools "tell the story" for us.

Don’t assume two different people looking at the same graph will come to the same conclusion. Add value by highlighting key takeaways for your audience.

An important distinction made in the book, storytelling with data, is the difference between exploratory and explanatory analysis. Exploratory analysis is what we do to find interesting things in our data. For example, the analyst might have asked many questions during the exploratory phase, including (but not limited to):

 

1. How have these metrics changed over time?
2. Are there geographical differences when drilling down by country?
3. What is the revenue impact of this data?
4. Are there noticeable patterns in users’ behavior that can be used for predicting next quarter’s results?

After exploratory analysis, we move to explanatory analysis. Explanatory analysis is where we take the interesting thing we found via exploratory analysis and communicate it to our intended audience. Oftentimes, that requires creating a different visual or using a different tool than we used in the exploratory phase.

Let’s assume that what’s relevant in this data is the varying levels of reach & engagement and therefore, each country needs its own strategy for next year. If that’s the interesting conclusion, how might the analyst communicate this? One option is to use the initial design and visual cues like color and annotations to focus attention appropriately:

scatterplot2.png
 

In this version, I’ve preserved the horizontal bars, sorted by reach in descending order, and decluttered by removing the border and grid lines. While this is a step in the right direction, it still takes a lot of work to read all this text and mentally process the different takeaways:

 

1. Low engagement/high reach
2. High engagement/high reach
3. High engagement/low reach
4. Low engagement/low reach

Perhaps a different visual would make this more visually apparent. Since these takeaways fall into four quadrants, a scatterplot is another alternative:

scatterplot3.png
 

We now have a visual with a well-labeled construct on how to interpret the data. The categories on the axes (Many/Few, Low/High) help the audience understand the range of values and where each country falls on that range. For further reading on the importance of categorization, check out this post.
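
The quadrant construct can also be expressed in a few lines of code. The Python sketch below is one way to assign a country to a quadrant from its reach and engagement rates. Only the China and Brazil figures come from the text above; the cut-off values and the `quadrant` function name are assumptions for illustration, not the thresholds behind the graph.

```python
# Sketch: label a country by quadrant from its reach and engagement rates.
# The cut-offs are assumed values for illustration, not taken from the original graph.
REACH_CUTOFF = 0.60        # assumption: at or above this counts as "high reach"
ENGAGEMENT_CUTOFF = 0.09   # assumption: at or above this counts as "high engagement"


def quadrant(reach, engagement):
    """Return a label such as 'high engagement/low reach'."""
    engagement_level = "high" if engagement >= ENGAGEMENT_CUTOFF else "low"
    reach_level = "high" if reach >= REACH_CUTOFF else "low"
    return f"{engagement_level} engagement/{reach_level} reach"


# Only the two countries whose figures are quoted in the post: (reach, engagement).
countries = {
    "China": (0.52, 0.06),
    "Brazil": (0.68, 0.12),
}

labels = {
    name: quadrant(reach, engagement)
    for name, (reach, engagement) in countries.items()
}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Where the cut-offs sit is itself a judgment call that depends on context the tool does not have, which is the same point this post makes about the visual.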

Finally, I’d add back the color & annotations, while being thoughtful about how the audience will intake the information. For example, in a setting where only the Brazil team is present, I might focus attention only on their data:

scatterplot4.png
 

Or the European countries, where the results are mixed:

scatterplot5.png
 

Both views would be important considerations in knowing the audience for our explanatory analysis.  
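
One way to think about these audience-specific views is as a rule that maps each country to either an emphasis color or a muted gray. The Python sketch below illustrates that idea. The hex colors and the `focus_colors` helper are hypothetical and not taken from the original Excel file.

```python
# Sketch: emphasize only the countries relevant to the audience in the room.
# The hex values and function name are illustrative assumptions.
COUNTRIES = ["United States", "Germany", "Great Britain", "China", "Brazil"]
EMPHASIS = "#1f77b4"  # assumed highlight color
MUTED = "#bfbfbf"     # assumed background gray


def focus_colors(countries, focus):
    """Map each country to the emphasis color if it is in focus, else to gray."""
    focus = set(focus)
    return {
        country: (EMPHASIS if country in focus else MUTED)
        for country in countries
    }


brazil_view = focus_colors(COUNTRIES, ["Brazil"])
europe_view = focus_colors(COUNTRIES, ["Germany", "Great Britain"])
```

The resulting mapping can be handed to whichever tool draws the points. The data stays the same from view to view, and only the emphasis changes.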

Scatterplots are often used with scientific data, but in this case they work well for visualizing categories of differing takeaways. This works because of the added text and categorization, which help the audience process the information. Remember, never make your audience do more work than necessary to understand a graph!

If all the takeaways need to be on one view, I can still leverage the scatterplot while being strategic about the use of color to focus attention appropriately.  

scatterplot6.png
 

In conclusion, there’s a huge difference between simply showing data from the exploratory phase vs. using data to tell a story in explanatory analysis. Check out the difference between the remade view above vs. where we started:

scatterplot7.png
 

We have a wide array of great tools at our disposal for visualizing data, but our tools will never know our data’s story like we do. We can add value to our roles and our organizations by bringing the story to life.

If interested, you can download the Excel file with the above graphs.

Want more on story? Check out Episode 2 of the SWD podcast, where Cole discusses her thoughts on, "What is story?" She makes a distinction between story with a lower case 's' (the takeaway, or the so what—the way "story" was used in this post) and Story with a capital 'S,' which has a shape (plot, twists, ending—a narrative arc). Also stay tuned for the next post here, where Cole will recap and share the 75+ annotated line graphs received in response to the latest #SWDchallenge.

Update from Cole: We have a couple of additional views to share based on reader comments. First, the following view is similar to the final visual above, only with text moved out of the graph itself to the side. 

Scatterplot - words on side.png
 

This next graph, created by Daniel Zvinca, follows what he calls his "obsessive concern for a flexible design."

Scatterplot alternative_DanZvinca.png
 

He notes that he preserved color for potential additional enhancement and outlines the following benefits of this view:

  1. More metrics can be added or just one can be used (works fine for 1, 2, ...5 metrics).
  2. More countries can be added. When the number is higher, gridlines every 5 countries or so would help localize the associated values.
  3. Any metric is clearly encoded/decoded and can be used for sorting.
  4. Comments do not require special care, they never overlap (unless they are too long).
  5. They can be defined for several performance levels (e.g. Likert scale intervals). For purpose of this design, bad=dark background, good=light background.

Nice idea, Dan, and thanks for sharing! Thanks also to everyone who has commented and contributed to the discussion, both here and on other posts.


Elizabeth Hardman Ricks is a Data Visualization Designer on the storytelling with data team. She has a passion for helping her audience understand the ’so-what?’ Connect with Elizabeth on LinkedIn or Twitter.  

how I'm building this

“How did you start storytelling with data?” is a question I receive frequently. In this latest episode of the podcast, I’m interviewed by my husband, Randy, and field questions on how it all came to be, including finding balance between work and family, some book-writing tips, and my outlook on where SWD will go from here.

Listening time: 45:30. Links mentioned during the show:

Subscribe in your favorite podcast platform to be updated when new episodes are available. If you like what you hear, please rate or review the SWD podcast. You can find past episodes on the podcast page, including sessions focusing on feedback in data visualization and discussion of what is story?

/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2018 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.

introducing the 2018 #SWDchallenge

One of the best ways to learn is to do. This is something I strongly believe. We learn and get better through practice and application. It is by practice and practice—and more practice—that I’ve sharpened my own data visualization, presentation, and storytelling skills. I believe that you can do so as well.

While practicing on the job is great, it can also present some challenges. Constraints are often imposed. It can at times feel like a scary (or even risky) place to test out something new. But what if there were a safe space each month where you could take a turn flexing your skills or trying something you haven’t attempted before?

To that end, I’m happy to launch the 2018 monthly #SWDchallenge. Each month will have a different topic—I’m planning to start with some different graph types, but may change this up as we go along. I’ll announce the focus at the beginning of each month and share some related thoughts and examples. Then I’ll turn it over to you. You’ll have a week to find your data, create your visual and write any commentary you’d like to go with it (full specs follow). I’ll circle back with a follow up post later in the month, where I’ll share back what you’ve shared with me.

Think of this as a safe space to try something new: test out a new tool, technique, or approach. Or simply take it as a reason to practice as you continue to hone your data visualization and data storytelling skills. There’s no obligation—participate in one, a couple, or all. You are welcome to remain anonymous (I’ll only share first name and possibly last initial, though I’m also happy to include your social media profile or site if you’d like).

The instructions are simple:

  • Make it. Identify your data and create your visual with the tool of your choice. If you need help finding data, check out this list of publicly available data sources. You are welcome to use a real work scenario, but please don’t share any confidential data.
  • Share it. Email your entry to SWDchallenge@storytellingwithdata.com by the deadline (midnight PST). Attach your image as a .PNG. Put any commentary you’d like included in my follow up post in the body of the email; if there’s a social media profile or blog/site you’d like mentioned, please embed the links directly in your commentary. Note: if you’re going to write more than a paragraph or so, I encourage you to post it externally (I believe LinkedIn allows for posts like this) and provide a link or summary for inclusion here.
  • The fine print. I reserve the right to post and potentially reuse examples shared.

Now that we've covered those details, let's move on to the first challenge...

JANUARY #SWDchallenge: annotated line graph

While not always the case, I find frequently that the line graphs I use depict time. When you have time on the x-axis, you have a natural built-in construct for storytelling: the chronological story. When presenting data live, I’ll often build the time series point by point, talking the audience through the interesting context as I show the relevant data. Then I end with a final version where the most important parts of this context are annotated directly on the graph. This is the version that would be sent around (where you aren’t there to talk through it) and the audience has to process it on their own. Let's look at a few examples of annotated line graphs.

There is an example in Chapter 9 of the book that illustrates a progression similar to what I outline above, which I summarize with the following annotated line graph:

 

One amusing annotated line graph—one of my personal faves—that I sometimes discuss in my workshops was created by David McCandless and Lee Byron:

 

For another business example, here's an annotated line graph I created for a recent workshop based on a client example (details changed to protect confidentiality):

 

This is all prelude for my inaugural challenge for you: to create an annotated line graph. Sure, you’ve likely made a line graph before. But what could you try out that’s a little different this time? Or what data might you look at to learn something new?

Submit your entry, following the instructions above, by Tuesday, January 9th. Stay tuned for a post later in the month where I'll summarize the examples received. I look forward to seeing what you come up with!

what is story?

The year is winding down! While you were attending that holiday party or baking one more batch of gingerbread cookies, here at the Knaflic household, we've been recording another podcast. In this final 2017 episode, I discuss my perspective on what is story? and why those working with data should care. I also tackle listener Q&A on data visualization vs. storytelling, NFL football stats, and how to tell when a visualization is complete.

Listening time: 36:51. Links mentioned during the show:

Subscribe in your favorite podcast platform to be updated when new episodes are available. If you like what you hear, please rate or review the SWD podcast. Also, if you haven't already, listen to Episode 1, which focuses on feedback in data visualization. 

If you have any resources to share related to the topic of story, please do so in the comment box at the bottom of this post.

Next up for me: an early morning flight to the chilly midwest for a couple weeks off with family (hoping for a white Christmas!). Happy holidays!


how we position and what we compare

When visualizing data, one piece of advice I often give is to consider what you want your audience to be able to compare, then align those things to a common baseline and put them as close together as possible. This makes the comparison easy. If we step back and consider this more generally, the way we organize our data has implications for what our audience can more (or less) easily do with the data and what they are able to easily (or not so easily) compare.

I was working with a client recently when this came into play. The task was to visualize funnel data for a number of cohorts. For each cohort, there were a number of funnel stages, or “gates,” where accounts could fall out: targeted, engaged, pitched, and adopted. Each of these stages represents some portion of those accounts that made it through the previous stage. In this case, the client wanted to compare all of this across a handful of cohorts and regions. Here is an anonymized version of the original graph:

 
Cohort Analysis 1.png
 

There are some things I like about this visual. Everything is titled and labeled. So, while it takes a bit of time to orient and figure out what I’m looking at, the words are all there so that I can eventually figure this out, helping to make the data accessible. But when I step back and think about what I can easily do with the current arrangement of the data, there are a number of limitations. Let’s consider the relative levels of work it takes to make various comparisons within this set of graphs.

The easiest comparison for me to make is looking at a given region within a given cohort and focusing on the relative stages of the funnel. For example, if we start at the top left, I can easily compare for the Q1 Cohort in North America the purple vs. blue vs. orange vs. green bar. This is because they are both (1) aligned to a common baseline and (2) close in proximity (directly next to each other).

The next most straightforward comparison I can make is for a given stage in the funnel, I can compare across the various regions for a given cohort. So again, starting at the top left, I can compare within the Q1 Cohort the first purple bar (Targeted in North America) scanning right to the next purple bar (Targeted in EMEA), and so on. They are still aligned to a common baseline, but in this case they aren’t right next to each other (I’m inclined to take my index finger and trace along to help with this comparison). This is a little harder than the first comparison described above, but still possible.

The next comparison I can make—and this one is quite a bit more difficult—is a step in the funnel for a given region across cohorts. Again, starting at the top left, I can take that initial purple bar (Targeted in North America) and now scan downwards to compare to that same point for the Q2 cohort and the Q3 cohort. This is harder, because these bars are not aligned to a common baseline and they are also not next to each other. I can see that the bottom leftmost purple bar is bigger than the ones above it. But if I need to have a sense of how much bigger, that’s hard for me to wrap my head around. The numbers are there via the y-axis to make it possible, but it means I'm having to remember numbers and perhaps do a bit of math as I scan across the bars, which is simply more work.

And if we step back and think about it… comparisons across cohorts… this is actually potentially one of the most important comparisons that we’d like to be able to make! Visualizing and arranging our data differently could make this easier.

Perhaps it’s just me (and this really could be the case), but when I think of cohort analysis, it actually reminds me of my days in banking (a former life) and decay curves, and when I think of “curves,” it makes me think of lines, which makes me want to draw some lines over these bars… Actually, let’s try that. Here’s what it looks like if I draw lines over the bars in the first graph (Q1 cohort):

 
Cohort Analysis 2_short.png
 

While I’m at it, I might as well draw lines across the other graphs, too:

 
Cohort Analysis 3.png
 

And now that we have the lines, we don’t need the bars…

 
Cohort Analysis 4.png
 

The bars would have likely been too much to put into a single graph. But now that I’ve replaced what was previously four bars with a single line—thus remaking my original 16 bars in each graph into 4 lines, or if we multiply that across the three graphs, I’ve turned 48 bars into 12 lines—those, I can potentially all put into a single graph. It would look like this:

 
Cohort Analysis 5.png
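
The counting above can be checked with a small reshaping sketch in Python. Each bar is one (cohort, region, stage) value, and each line collects the four stages for one (cohort, region) pair. The names are placeholders: three regions are named in this post, and "Region 4" stands in for the unnamed fourth. No real values are included.

```python
from itertools import product

cohorts = ["Q1", "Q2", "Q3"]
# Three regions are named in the post; "Region 4" is a placeholder for the unnamed fourth.
regions = ["North America", "EMEA", "APAC", "Region 4"]
stages = ["Targeted", "Engaged", "Pitched", "Adopted"]

# One bar per (cohort, region, stage) combination.
bars = list(product(cohorts, regions, stages))

# One line per (cohort, region), connecting that pair's four funnel stages in order.
lines = {}
for cohort, region, stage in bars:
    lines.setdefault((cohort, region), []).append(stage)

print(len(bars), "bars ->", len(lines), "lines")  # prints: 48 bars -> 12 lines
```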
 

While it’s nice to have everything in a single graph, those lines on their own don’t make much sense. Next, I’ll add the requisite details: axis labels and titles so we know what we’re looking at.

 
Cohort Analysis 6.png
 

Note that I didn’t have space to write out “Targeted,” “Engaged,” “Pitched,” and “Adopted” for every single data point. Instead, I chose to use just the first letter of each of these along the x-axis, and then I have a legend of sorts below the region that lists out what each of these letters means. This may not be a perfect solution, but every decision when we visualize data involves tradeoffs, and I’ve decided I’m ok with the tradeoffs here.

You’ll perhaps notice here that I haven’t labeled the various cohorts yet. With this view, I could focus on one at a time (calling out either via text or my spoken narrative if talking through this live to make it clear what we are focusing on). For example, maybe first I want to set the stage and focus on the Q1 cohort and how it looked across the various funnel stages and regions:

 
Cohort Analysis 7.png
 

I could then do the same for the Q2 cohort (lower across everywhere: Is this expected? What drove this? My voiceover could lend commentary to raise or answer these questions):

 
Cohort Analysis 8.png
 

Then finally, I could do the same for the Q3 cohort (ah, now our metrics have recovered from their lows in the Q2 cohort and are even higher than Q1. Did we do something specific to achieve this? Looks like we targeted a higher proportion of the overall cohort, and it’s interesting to see how that impacted the downstream funnel stages):

 
Cohort Analysis 9.png
 

Note with this view, I could also focus on a given region at a time. For example, it might be interesting to note that these metrics are lower across all cohorts in North America compared to the other regions:

 
Cohort Analysis 10.png
 

Or the spread in APAC across cohorts might be noteworthy, as it’s the largest variance across cohorts compared to the other regions:

 
Cohort Analysis 11.png
 

This piece-by-piece emphasis could work well in a live presentation. But in the case where this is for a report or presentation that will be sent out where we’d likely have a single version of the graph (vs. the multiple iterations that can work well in a live setting so you can focus your audience on what you’re talking about as you discuss the various details), I’d venture to guess that the most recent cohort (Q3) is perhaps the most relevant, so let’s bring our focus back to that:

 
Cohort Analysis 12.png
 

Within the Q3 cohort, we may consider emphasizing one or a couple of data points. Data markers and labels are one way to draw attention and signal importance. If I put them everywhere, we’ll quickly end up with a cluttered mess. But if I’m strategic about which I show, I can help guide my audience towards specific comparisons within the data. For example, if the ultimate success metric is what proportion of accounts have adopted whatever it is we’re tracking (I’ve anonymized that detail away here), I might emphasize just those data points for the most recent cohort:

 
Cohort Analysis 13.png
 

Given the spatial separation between regions, I don’t necessarily have to introduce color here. But if I want to include some text to lend additional context about what’s going on in each region and what’s driving it, I could introduce color into the graph and then use that same color scheme for my annotations, tying those together visually:

 
Cohort Analysis 14.png
 

Let’s take a quick look at the before-and-after:

Cohort Analysis 15.png

Any time you create a visual, take a step back and think about what you want to allow your audience to do with the data. What should they be able to most easily compare? The design choices you make—how you visualize and arrange the data—can make those comparisons easy or difficult. Aim to make it easy.

The Excel file with the above visuals can be downloaded here. I should perhaps mention a hack I used to achieve this overall layout: each cohort is a single line graph in Excel, where I’ve formatted it so there is no connecting line between the Adopted point for one region and the Targeted point in the following region. (It may be brute force, but it works!)
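
For readers working in code rather than Excel, a comparable effect can be sketched by inserting a gap value between regions in a single series. The Python below is an illustration under assumptions: the numbers are made up, the `series_with_gaps` helper is hypothetical, and it relies on the plotting tool leaving a break in the line wherever the value is NaN (matplotlib's line plots behave this way).

```python
import math

STAGES = ["T", "E", "P", "A"]  # Targeted, Engaged, Pitched, Adopted


def series_with_gaps(values_by_region):
    """Flatten per-region funnel values into one series, with a NaN between regions.

    The NaN keeps the Adopted point of one region from being connected
    to the Targeted point of the next.
    """
    labels, values = [], []
    for i, (region, values_for_region) in enumerate(values_by_region.items()):
        if i > 0:
            labels.append("")        # blank tick where the line breaks
            values.append(math.nan)  # gap between regions
        labels.extend(STAGES)
        values.extend(values_for_region)
    return labels, values


# Made-up proportions for one cohort, for illustration only.
q3 = {
    "North America": [0.80, 0.55, 0.40, 0.25],
    "EMEA": [0.90, 0.70, 0.50, 0.35],
}
labels, values = series_with_gaps(q3)
```

Passing `values` to a line-plot call then draws each region as its own segment within a single series, much like the Excel formatting described above.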

introducing the SWD podcast

 
ColePodcast.jpg
 

I'm very excited to officially launch the storytelling with data podcast! This first episode focuses on feedback in data visualization. I discuss the value of both giving and receiving data visualization feedback and potential problem areas to avoid. Hear The Economist's response to the recent hurricane data visualization challenge as well as answers to reader questions on the topics of when to use graphs, considerations with dashboards, and data viz 101 book recommendations.

Big thanks to Timo Elliston, friend and awesome NYC composer/musician, for the amazing original music, and to hubby Randy for encouraging all of this in the first place, equipping our office with recording gear, and for always being my biggest supporter.

I hope your enjoyment of the session is as great as the fun we had making it happen. If you like what you hear, please be sure to rate the SWD podcast on your favorite podcast platform!

Links mentioned during the podcast:

 

Feedback? email feedback@storytellingwithdata.com
Blog post: SWD makeover challenge on The Economist’s hurricane graph
Article: “Design & Redesign in Data Visualization” by Fernanda Viegas & Martin Wattenberg
Blog post: my guiding principles
Article: The subtle art that differentiates good designers from great designers by UX Planet
Blog post: a tale about opportunity
Book: The Big Book of Dashboards by Steve Wexler, Jeff Shaffer & Andy Cotgreave
Book: The WSJ Guide to Information Graphics by Dona Wong
Book: Show Me the Numbers by Stephen Few
Book: The Visual Display of Quantitative Information by Edward Tufte
Questions? email askcole@storytellingwithdata.com
