Monday, April 14, 2014

exploratory vs explanatory analysis

I often draw a distinction between exploratory and explanatory data analysis. Exploratory analysis is what you do to get familiar with the data. You may start out with a hypothesis or question, or you may just really be delving into the data to determine what might be interesting about it. Exploratory analysis is the process of turning over 100 rocks to find perhaps 1 or 2 precious gemstones. Explanatory analysis is what happens when you have something specific you want to show an audience - probably about those 1 or 2 precious gemstones. In my blogging and writing, I tend to focus mostly on this latter piece, explanatory analysis, when you've already gone through the exploratory analysis and from this have determined something specific you want to communicate to a given audience: in other words, when you want to tell a story with data.

Keeping this distinction in mind, I thought it might be interesting to look at a recent makeover and show how the visual you could use for the exploratory and explanatory steps of the analytical process might differ.

For this (generalized & simplified*) example, imagine that you work for a car manufacturer. You're looking at customer feedback, specifically to better understand how failed or less-than-ideal performance across various dimensions for a given make and model impacts customer satisfaction. The primary output variable you're looking at in this case is an overall question in your customer satisfaction survey, where customers are asked to express their overall satisfaction with their car along a 5-point Likert scale (Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied). Let's assume you're most interested in anyone responding with anything other than Very Satisfied, and want to understand how this varies by customers who have reported specific issue(s) with their car, by the type of issue.

*Please keep in mind that I'm making up the specific scenario here; the makeover is a generalized example from a past workshop where I don't have all of the details and also don't have other data that would possibly be of benefit in the exploratory and explanatory phases. For example, there are likely other things that drive the overall satisfaction with the car, which we're ignoring here. Also, anytime you show percents like this, I'd recommend also showing the N count - in this case, the number of people reporting the given issue - which will be helpful for the interpretation of the data.

Your initial visual might look something like the following:


In the above, I've grouped all of the "less than very satisfied" responses (in orange), with the data arranged in descending order of this metric. With this visual, you can scan through the various issues and see the relevant "less than very satisfied" metric. This might be useful for part of your exploratory analysis.
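
For anyone who would rather do this step in code than in Excel, here is a minimal Python sketch of the calculation behind a visual like this: the share of "less than very satisfied" responses per issue, sorted in descending order, with the N count kept alongside. The issue names and responses are made up for illustration and are not the data from the workbook.

```python
from collections import Counter

# Hypothetical survey rows: (issue reported, overall satisfaction response).
# These values are invented for illustration only.
responses = [
    ("Engine noise", "Satisfied"),
    ("Engine noise", "Neutral"),
    ("Engine noise", "Dissatisfied"),
    ("Engine noise", "Very Satisfied"),
    ("Tire wear", "Very Satisfied"),
    ("Tire wear", "Very Satisfied"),
    ("Tire wear", "Satisfied"),
    ("Tire wear", "Very Satisfied"),
    ("Bluetooth pairing", "Very Dissatisfied"),
    ("Bluetooth pairing", "Very Satisfied"),
]


def less_than_very_satisfied(rows):
    """Return (issue, n, pct) tuples, sorted by pct in descending order.

    n is the number of people reporting the issue; pct is the percent of
    them who answered anything other than "Very Satisfied".
    """
    totals = Counter()
    below = Counter()
    for issue, answer in rows:
        totals[issue] += 1
        if answer != "Very Satisfied":
            below[issue] += 1
    summary = [
        (issue, n, 100.0 * below[issue] / n) for issue, n in totals.items()
    ]
    return sorted(summary, key=lambda row: row[2], reverse=True)


summary = less_than_very_satisfied(responses)
for issue, n, pct in summary:
    print(f"{issue:<18} N={n:<3} {pct:5.1f}% less than very satisfied")
```

Keeping N next to each percentage makes it easier to tell whether a high value rests on many respondents or only a handful.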

Once you've identified something or some things to focus on, in some cases it will make sense to create a different visual for the purpose of focusing on that thing or those things. Alternatively, the same visual can be modified for explanatory purposes by drawing attention to the points of interest, while preserving the other data for context:


We can use the same visual and approach for highlighting another potential point of interest:


Or another:


Note how, when we focus on one aspect or story, it's actually harder to see the others. That's one of the reasons it's important to do exploratory analysis before you get to the explanatory phase: so you can have confidence that you're focusing your audience on the right thing(s).
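
The highlighting approach described above (one visual, with color used sparingly to draw attention to the point of interest) can be sketched in a few lines of Python. This is only an illustration under assumptions: the issue names, the hex colors, and the `bar_colors` helper are all hypothetical, and the resulting list would be handed to whatever charting tool you use.

```python
# Hypothetical colors: one attention-grabbing color for the story,
# a muted grey for everything kept as context.
HIGHLIGHT = "#ff7f0e"  # orange
CONTEXT = "#bfbfbf"    # light grey


def bar_colors(issues, focus):
    """Return one color per issue: HIGHLIGHT for focus items, CONTEXT otherwise."""
    focus = set(focus)
    return [HIGHLIGHT if issue in focus else CONTEXT for issue in issues]


# Hypothetical issue names, in the order the bars appear in the chart.
issues = ["Engine noise", "Bluetooth pairing", "Tire wear"]

# Each story gets its own version of the same chart.
story_one = bar_colors(issues, focus=["Engine noise"])
story_two = bar_colors(issues, focus=["Bluetooth pairing", "Tire wear"])
```

Because every bar not in focus falls back to grey, the other data stays visible for context without competing for attention.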

In case it's of interest, the Excel workbook with the above graphs can be downloaded here.

Thursday, April 10, 2014

just because you have numbers doesn't mean you need a graph

I subscribe to updates from the Pew Research Center. They arrive in my inbox with subject lines like "Future of Internet, News Engagement, God and Morality" (yes, this was an actual title from their March 13th update - quite a span of topics!) and probably 90% of the time get moved to my trash without a second thought. But in a fraction of cases, something in that subject line catches my eye and I open the email to read more. Sometimes, this even prompts me to click further to the full article.

The snippet that caught my attention this time was "Stay-at-Home Mothers on the Rise." The link I clicked on within my email brings you here.

A quick scan through and I found that I was hardly able to focus on the article because of the issues plaguing the visuals that accompany it. There are many. But I'll focus on just a single one today and keep this rant very short and sweet:

Just because you have numbers doesn't mean you need a graph!

The following graph prompted this adage:


That's a whole lot of text and space for a grand total of two numbers. The graph does nothing to aid in the interpretation of the numbers here! Even the fact that 20 is less than half of 41 doesn't really come across clearly visually (perhaps because of the way the numbers are placed above the bars?).

Rather, the above can be conveyed in a single sentence:
20% of children had a "traditional" stay-at-home mom in 2012 (compared to 41% in 1970). 

Just because you have numbers doesn't mean you need a graph!

For a less ranting delivery of a similar lesson, check out my post the power of simple text.

Monday, April 7, 2014

a storytelling with data ad

One of my favorite indulgences on a weekend morning is to sit in the sun on our terrace and read the latest copy of Dwell magazine. A number of things in the universe have to align to make this possible: namely, the sun must be shining and the child must be sleeping. The universe aligned in just this way this past Saturday (bliss!).

I find that the design of products and the design of spaces can sometimes influence my thinking, spark an idea, or act as inspiration when it comes to the visual design work on which I focus much of my attention. On this particular read through Dwell, it was the following advertisement that caught my eye:


This ad gave me pause for a few reasons:
  1. The leading stat - 1 in 5 children go to school hungry - is powerful. When it comes to communicating a number or two, tables and graphs don't usually have a place, as the numbers themselves carry a lot more attention-grabbing power.
  2. The use of preattentive attributes to make certain elements of the visual distinct: the numbers at the top are in bold, all caps and underlining draw attention to the second line, and the sort of sea-green in the logo and text at the bottom emphasizes the "un" and "is" (when it comes to this last point, I might have chosen different portions of text to draw attention to, but I think that's one of those things that can be up for debate and probably there was a good reason the designer chose these particular pieces - perhaps the dichotomy between "un" and "is"?).
  3. The story. It's short and sweet, but still a robust example of storytelling with data, which, with the personal anecdote and picture, is made much more human than a simple stat on its own would be.
  4. The picture. Speaking of pictures, one frequent question in my workshops is about the use of pictures when it comes to visual communication. I don't use pictures a lot personally, but as mentioned above, I do definitely think there are ways to use pictures that appeal on a different level than numbers do. Here, I think the pairing of the two is effective.
What do you think of this ad as an example of storytelling with data? Is it effective? Why or why not? Leave a comment with your thoughts!