Wednesday, April 23, 2014

focusing with color

In my previous post, I discussed the distinction between exploratory and explanatory analysis and showed how you can sometimes leverage the same visual when moving from the former stage to the latter, with some minor tweaks. Today, I'd like to consider another example of this and also illustrate how you can use iterations of the same visual to focus your audience with color.

We'll continue with the imagined scenario where you work for a car manufacturer. Today, you're interested in understanding and sharing insight around top design concerns for a particular make and model. Your initial visual might look something like the following:

The above visual could be one of those you create during the exploratory phase, when you're looking at the data to understand what might be interesting or noteworthy to communicate to someone else. It shows us that there are 10 design concerns that have 8+ concerns per 1,000. (The rest of the tail has been chopped off. If you're using this to communicate to others, that would probably be worth a footnote with a little detail on how long the tail is, perhaps how many design concerns there are in total, etc.)

You can leverage the same visual, together with thoughtful use of color and text, to further focus the story:

Continuing to peel back the onion, we can go a level further than this, again using the same visual with modified focus and text to lead our audience from the macro to the micro parts of the story:

Repeated iterations of the same visual, with different pieces emphasized to tell different stories or different aspects of the same story (as above) can be particularly useful in live presentations, because you can orient your audience with your data and visual once and then continue to leverage it in the manner illustrated above.
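For those who build their charts in code rather than Excel, here is a minimal sketch of the same idea in Python. It is an illustration, not how the visuals above were made: the concern names, the hex colors, and the `focus_colors` helper are all hypothetical. The point is that the data and ordering stay fixed from one iteration to the next, and only the color assignment changes.

```python
# Hypothetical colors: one attention-grabbing color, one muted grey for context.
HIGHLIGHT = "#E8762C"
CONTEXT = "#BFBFBF"


def focus_colors(labels, focus, highlight=HIGHLIGHT, context=CONTEXT):
    """Return one color per label: highlight for the focus items, grey for the rest."""
    focus = set(focus)
    return [highlight if label in focus else context for label in labels]


# Made-up design concerns, kept in the same order for every iteration.
concerns = ["Engine noise", "Tire wear", "Seat comfort", "Cup holders"]

# Three iterations of the same visual: no emphasis, a macro story, a micro story.
iterations = {
    "overview": focus_colors(concerns, []),
    "macro": focus_colors(concerns, ["Engine noise", "Tire wear"]),
    "micro": focus_colors(concerns, ["Engine noise"]),
}

for name, colors in iterations.items():
    print(name, colors)
```

Each list has one color per bar, so it could be handed to a charting library that accepts per-bar colors (for example, the `color` argument of matplotlib's `bar`), with the title text updated to match the piece of the story being emphasized.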

If you're interested, you can download the Excel file with the above visuals here.

Monday, April 14, 2014

exploratory vs explanatory analysis

I often draw a distinction between exploratory and explanatory data analysis. Exploratory analysis is what you do to get familiar with the data. You may start out with a hypothesis or question, or you may just really be delving into the data to determine what might be interesting about it. Exploratory analysis is the process of turning over 100 rocks to find perhaps 1 or 2 precious gemstones. Explanatory analysis is what happens when you have something specific you want to show an audience - probably about those 1 or 2 precious gemstones. In my blogging and writing, I tend to focus mostly on this latter piece, explanatory analysis, when you've already gone through the exploratory analysis and from this have determined something specific you want to communicate to a given audience: in other words, when you want to tell a story with data.

Keeping this distinction in mind, I thought it might be interesting to look at a recent makeover and show how the visual you could use for the exploratory and explanatory steps of the analytical process might differ.

For this (generalized & simplified*) example, imagine that you work for a car manufacturer. You're looking at customer feedback, specifically to better understand how failed or less-than-ideal performance across various dimensions for a given make and model impacts customer satisfaction. The primary output variable you're looking at in this case is an overall question in your customer satisfaction survey, where customers are asked to express their overall satisfaction with their car along a 5-point Likert scale (Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied). Let's assume you're most interested in anyone responding with anything other than Very Satisfied, and want to understand how this varies by customers who have reported specific issue(s) with their car, by the type of issue.

*Please keep in mind that I'm making up the specific scenario here; the makeover is a generalized example from a past workshop where I don't have all of the details and also don't have other data that would possibly be of benefit in the exploratory and explanatory phases. For example, there are likely other things that drive the overall satisfaction with the car, which we're ignoring here. Also, anytime you show percents like this, I'd recommend also showing the N count - in this case, the number of people reporting the given issue - which will be helpful for the interpretation of the data.

Your initial visual might look something like the following:

In the above, I've grouped all of the "less than very satisfied" responses (in orange), with the data arranged in descending order of this metric. With this visual, you can scan through the various issues and see the relevant "less than very satisfied" metric. This might be useful for part of your exploratory analysis.
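If your survey data lives somewhere other than Excel, the grouping and sorting described above can be sketched in a few lines of Python. The issue names and counts below are made up purely for illustration, and the function names are my own. The sketch also carries the N count along with each percent, as recommended in the note above.

```python
# Hypothetical response counts per reported issue (made-up numbers).
# Order: Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied
responses = {
    "Issue A": [10, 15, 25, 30, 20],
    "Issue B": [5, 5, 10, 30, 50],
    "Issue C": [30, 20, 20, 20, 10],
}


def less_than_very_satisfied(counts):
    """Share of respondents who answered anything other than Very Satisfied."""
    total = sum(counts)
    return (total - counts[-1]) / total


def rank_issues(responses):
    """Return (issue, share, N) tuples in descending order of the share."""
    rows = [
        (issue, less_than_very_satisfied(counts), sum(counts))
        for issue, counts in responses.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for issue, share, n in rank_issues(responses):
    print(f"{issue}: {share:.0%} less than very satisfied (N={n})")
```

With these made-up counts, Issue C comes first (90%), followed by Issue A (80%) and Issue B (50%).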

Once you've identified something or some things to focus on, in some cases it will make sense to create a different visual for the purpose of focusing on that thing or those things. Alternatively, the same visual can be modified for explanatory purposes by drawing attention to the points of interest, while preserving the other data for context:

We can use the same visual and approach for highlighting another potential point of interest:

Or another:

Note how, when we focus on one aspect or story, it's actually harder to see the others. That's one of the reasons it's important to do exploratory analysis before you get to the explanatory phase: so you can have confidence that you're focusing your audience on the right thing(s).

In case it's of interest, the Excel workbook with the above graphs can be downloaded here.

Thursday, April 10, 2014

just because you have numbers doesn't mean you need a graph

I subscribe to updates from the Pew Research Center. They arrive in my inbox with subject lines like "Future of Internet, News Engagement, God and Morality" (yes, this was an actual title from their March 13th update - quite a span of topics!) and probably 90% of the time get moved to my trash without a second thought. But in a fraction of cases, something in that subject line catches my eye and I open the email to read more. Sometimes, this even prompts me to click further to the full article.

The snippet that caught my attention this time was "Stay-at-Home Mothers on the Rise." The link I clicked on within my email brings you here.

A quick scan through and I found that I was hardly able to focus on the article because of the issues plaguing the visuals that accompany it. There are many. But I'll focus on just a single one today and keep this rant very short and sweet:

Just because you have numbers doesn't mean you need a graph!

The following graph prompted this adage:

That's a whole lot of text and space for a grand total of two numbers. The graph does nothing to aid in the interpretation of the numbers here! Even the fact that 20 is less than half of 41 doesn't really come across clearly here visually (perhaps because of the way the numbers are placed above the bars?).

Rather, the above can be conveyed in a single sentence:
20% of children had a "traditional" stay-at-home mom in 2012 (compared to 41% in 1970). 

Just because you have numbers doesn't mean you need a graph!

For a less ranting delivery of a similar lesson, check out my post the power of simple text.