Monday, April 14, 2014

exploratory vs explanatory analysis

I often draw a distinction between exploratory and explanatory data analysis. Exploratory analysis is what you do to get familiar with the data. You may start out with a hypothesis or question, or you may just really be delving into the data to determine what might be interesting about it. Exploratory analysis is the process of turning over 100 rocks to find perhaps 1 or 2 precious gemstones. Explanatory analysis is what happens when you have something specific you want to show an audience - probably about those 1 or 2 precious gemstones. In my blogging and writing, I tend to focus mostly on this latter piece, explanatory analysis, when you've already gone through the exploratory analysis and from this have determined something specific you want to communicate to a given audience: in other words, when you want to tell a story with data.

Keeping this distinction in mind, I thought it might be interesting to look at a recent makeover and show how the visuals you use for the exploratory and explanatory steps of the analytical process might differ.

For this (generalized & simplified*) example, imagine that you work for a car manufacturer. You're looking at customer feedback, specifically to better understand how failed or less-than-ideal performance across various dimensions for a given make and model impacts customer satisfaction. The primary output variable you're looking at in this case is an overall question in your customer satisfaction survey, where customers are asked to express their overall satisfaction with their car along a 5-point Likert scale (Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied). Let's assume you're most interested in anyone responding with anything other than Very Satisfied, and want to understand how this varies by customers who have reported specific issue(s) with their car, by the type of issue.

*Please keep in mind that I'm making up the specific scenario here; the makeover is a generalized example from a past workshop where I don't have all of the details and also don't have other data that would possibly be of benefit in the exploratory and explanatory phases. For example, there are likely other things that drive the overall satisfaction with the car, which we're ignoring here. Also, anytime you show percents like this, I'd recommend also showing the N count - in this case, the number of people reporting the given issue - which will be helpful for the interpretation of the data.

Your initial visual might look something like the following:


In the above, I've grouped all of the "less than very satisfied" responses (in orange), with the data arranged in descending order of this metric. With this visual, you can scan through the various issues and see the relevant "less than very satisfied" metric. This might be useful for part of your exploratory analysis.
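If you wanted to build that same metric outside of Excel, a minimal sketch in Python might look like the following. The issue names and response counts are made up purely for illustration (they are not the numbers from the workbook), and the function names are my own. It groups everything below Very Satisfied into one share per issue, keeps the N count alongside it, and sorts in descending order.

```python
# Response counts per issue, ordered along the 5-point scale:
# Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied.
# ASSUMPTION: these issues and counts are hypothetical, for illustration only.
responses = {
    "Issue A": [5, 10, 15, 30, 40],
    "Issue B": [1, 2, 2, 10, 35],
    "Issue C": [20, 30, 40, 50, 60],
}


def less_than_very_satisfied(counts):
    """Return (share, n): the share of responses below Very Satisfied and the total N."""
    n = sum(counts)
    if n == 0:
        return 0.0, 0
    # The last position holds the Very Satisfied count; everything else is grouped.
    share = (n - counts[-1]) / n
    return share, n


def summarize(data):
    """Build (issue, share, n) rows sorted by share, largest first."""
    rows = [(issue, *less_than_very_satisfied(counts)) for issue, counts in data.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


summary = summarize(responses)
for issue, share, n in summary:
    # Show the N next to each percentage to help with interpretation.
    print(f"{issue:<10} {share:6.1%}  (N={n})")
```

Keeping the N in each row means the percentage is never read without the number of people behind it.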

Once you've identified one or more things to focus on, it will sometimes make sense to create a different visual built around them. Alternatively, the same visual can be modified for explanatory purposes by drawing attention to the points of interest, while preserving the other data for context:


We can use the same visual and approach for highlighting another potential point of interest:


Or another:


Note how, when we focus on one aspect or story, it's actually harder to see the others. That's one of the reasons it's important to do exploratory analysis before you get to the explanatory phase: so you can have confidence that you're focusing your audience on the right thing(s).
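The highlighting approach in the graphs above, where one point of interest gets emphasis and everything else stays as context, can be sketched in a few lines of Python. This is only an illustration: the color values, issue labels, and the `bar_colors` helper are assumptions of mine, not something taken from the workbook.

```python
# ASSUMPTION: hypothetical color choices, a muted grey for context
# and a single orange for emphasis.
CONTEXT_COLOR = "#BFBFBF"
HIGHLIGHT_COLOR = "#F79646"


def bar_colors(categories, focus):
    """Return one color per category, emphasizing only the categories in `focus`."""
    focus = set(focus)
    return [HIGHLIGHT_COLOR if category in focus else CONTEXT_COLOR
            for category in categories]


# ASSUMPTION: made-up issue labels, already sorted for the chart.
issues = ["Issue C", "Issue A", "Issue B"]

# Same data and order each time; only the focus changes with the story being told.
colors = bar_colors(issues, focus=["Issue A"])
print(colors)
```

A list like this can be handed to a charting library as the per-bar colors (for example, the `color` argument of matplotlib's `bar`), so switching the story means changing only the `focus` list, not the underlying data.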

In case it's of interest, the Excel workbook with the above graphs can be downloaded here.
