Friday, April 29, 2011

beautiful visualization

I have a growing collection of books on the topic of data visualization. I seem to accumulate them more quickly than I can read them. But every once in a while I have an opportunity (an uninterrupted hour or two) to narrow the gap. I encountered one such occasion this afternoon. I had checked all of my to-dos off my to-do list and decided to give my eyes a break from the computer screen to look at actual physical pages for a bit. My eyes appreciated the change in visual medium; my brain appreciated the content.


The book I occupied myself with is called Beautiful Visualization: Looking at Data Through the Eyes of Experts. It's a collection of writings on various topics from experts in the field of infoviz. Some of the names I am familiar with (Jonathan Feinberg has a section on Wordle, Aaron Koblin and Valdean Klump do a deep dive on flight patterns, Martin Wattenberg and Fernanda Viegas write about visualizing Wikipedia), but many are new. For my afternoon reading, I chose two from this latter category.

The first (which, technically, is the last, at least in the order in which the book is written) is Jessica Hagy. Interestingly, finding her name in this book was the second time I encountered her work today. I had lunch with a connection from one of my speaking events and we were chatting about our favorite blogs; one of hers is Indexed, written by Jessica Hagy. The blog consists of a daily posting of an index card, on which Jessica has drawn a clever (and often quite amusing) visual display of data. It might be my new favorite blog. In Beautiful Visualization, Jessica uses this same style to discuss the ever-increasing amount of data in our world and the power of images to make that data accessible much faster than words can. If you want to understand the parallel between an elephant and visualization, you need to read the book (the likeness is uncanny!).

After starting at the end of the book (very unlike me, by the way; I typically do everything in order), I went back to the beginning. The first section, by Noah Iliinsky, is titled On Beauty. By his definition, a beautiful data visualization is one which is novel, informative, efficient, and aesthetic. His guidance on how to achieve this is as follows (descriptions are in my own words except where noted and highly condensed):
  1. Step outside default parameters. My favorite line from this section (actually, from everything I read): "In most situations, well-defined formats have well-defined rational conventions of use: line graphs for continuous data, bar graphs for discrete data, pie graphs for when you are more interested in a pretty picture than conveying knowledge." (Making fun of ineffective pie charts pretty much speaks directly to my heart!) His point here is that to be beautiful, a visual must be novel and create shock and awe. I'm not sure I agree with this sentiment, but more on that in a bit.
  2. Make it informative. I certainly can't argue against this one. Noah says that a clear understanding of message and the needs of the audience are key here (um, yes). If you can't state those concisely, you're nowhere near the point of trying to put together a visual.
  3. Make it efficient. Every bit of visual content will make it take longer to find any particular element of the visualization. In Duarte language, this is the step where you dial up the signal and dial down the noise. Visually emphasize what matters and get rid of the stuff that doesn't or push it to the background so it doesn't distract.
  4. Leverage the aesthetics. Use the basic components of the graph (titles, axes, etc.) to increase the utility of the visualization. Make the visual something that's comfortable for your audience to look at; if it is, they're more likely to look at it.
My one hesitation with the above is the first rule. In some cases shock and awe is good, but I think whether or not that is a necessary condition for a beautiful visualization is highly context-dependent. Perhaps it's my penchant for bold, blue bar graphs, but in business communication, I generally find simple graphics to be the most beautiful (perhaps due to their relative rarity?). Though I suppose in reality what I'm generally aiming for is not beauty, but rather effectiveness. In that case, my personal spin on the above would be to say that the necessary conditions of an effective visual display of data are that it must be informative, efficient, and aesthetic.

It's certainly great food for thought. I'm looking forward to the remaining 18 sections!

Wednesday, April 27, 2011

a good chart takes time

Last month, I spoke at the GMN Conference in Seattle. Leading up to the conference, I asked participants to send me examples of data visualizations on which they would like my feedback. The following was one such submission. The slide was titled "more generational differences".


The title made clear that the graph was intended to show differences. After scanning through the bars, I see by the legend at the bottom that I'm looking at Baby Boomers in red and Millennials in blue. Now that I've deciphered what I'm looking at, my eye scans back up over the bars to look for interesting differences. I see a big gap in the proportion who have created social network profiles. And some other gaps. I've been able to come up with a couple of interesting observations, but it's taking some time and I feel like there might be more nuanced insights that I'm overlooking. I think we can remedy this (and also reduce the burden of work we're putting on the audience) by making a few relatively quick changes.

Here's my makeover:


Here's an overview of the changes I made and why:
  • Title and legend: Every graph needs a title! Think of it as an introduction to your audience (audience, meet my chart) - it ensures they know what they're looking at. This also frees up your slide title to call out something more interesting than the title of your graph. With regard to the legend: in Western cultures, most people read from left to right, top to bottom, so by placing the legend at the upper left, we make sure the audience encounters how to read the chart before they get to the actual data. It's a nice thing to do, and it means the audience doesn't have to search to figure out what they're looking at.
  • Color scheme: the bold text and red and blue series on the original graph are all of similar strength, which means they're all vying equally for the audience's attention. In the remake, I've provided a visual order of priority through the colors. I drew the audience's attention to the Millennials trend in blue and made the Baby Boomers trend grey, so it's still there for comparison but isn't competing for attention.
  • Text: I found the bold text in the original version distracting, so I cleaned that up. I also right-aligned the labels and adjusted the spacing so that each label would fit on one line (trick: you get more flexibility in Excel by having your labels in cells in the worksheet, rather than on the graph directly; just be careful that your labels line up correctly with the data). Finally, I labeled my x-axis (% agreeing with statement) to make it clear to the audience what they are looking at. (Every axis needs a label!)
  • Order: It wasn't clear to me that there was an intentional scheme to the ordering in the initial graph. I played around with this a bit. Since the original slide title called out the difference, I decided to actually show that on the graph and arranged the information in descending order of difference.
It turns out that the original chart is what you get if you put the data into Excel and let it create a chart for you. (I realized this when I first graphed the data: it looked just like the original chart except for the bold text, which was an explicit choice.)

Creating a chart in Excel should be considered the first step of your data visualization process, not the last.

Once you have the initial graph, look at it. Think about the point you want to make and how you can make that point clear to your audience. Draw attention through explicit use of color. Push things that don't need to draw attention to the background. Basically, play around with emphasis and de-emphasis. Label everything. Think about whether there is an intrinsic order that you should leverage. A good chart takes time, but it's time well spent if it allows you to draw better insights or make a point to your audience (and hold their attention long enough to make it!).
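
For anyone working in code rather than Excel, the same steps could be sketched in Python with matplotlib. This is only a rough illustration of the ideas above, not how the makeover was built (that was done in Excel). The numbers, two of the three statement labels, the chart title, and the function names `order_by_gap` and `draw_makeover` are all made up for the example, and "difference" is assumed to mean the absolute gap between the two groups.

```python
# HYPOTHETICAL values for illustration only; NOT the survey data from the slide.
# Each entry maps a statement to (Millennials %, Baby Boomers %).
SURVEY = {
    "Created a social network profile": (75, 30),
    "Placeholder statement B": (20, 25),
    "Placeholder statement C": (62, 40),
}


def order_by_gap(data):
    """Return (label, millennials, boomers) rows, largest absolute gap first."""
    rows = [(label, m, b) for label, (m, b) in data.items()]
    return sorted(rows, key=lambda r: abs(r[1] - r[2]), reverse=True)


def draw_makeover(rows):
    """Draw a horizontal bar chart that emphasizes one series and mutes the other."""
    import matplotlib.pyplot as plt  # imported here so the sorting helper has no dependencies

    positions = list(range(len(rows)))
    bar_height = 0.38
    fig, ax = plt.subplots()

    # Emphasis: the series we want noticed gets a saturated blue.
    ax.barh([p - bar_height / 2 for p in positions], [r[1] for r in rows],
            bar_height, color="tab:blue", label="Millennials")
    # De-emphasis: the comparison series stays visible but recedes in grey.
    ax.barh([p + bar_height / 2 for p in positions], [r[2] for r in rows],
            bar_height, color="lightgrey", label="Baby Boomers")

    # Label everything: category labels, axis label, title, legend.
    ax.set_yticks(positions)
    ax.set_yticklabels([r[0] for r in rows])
    ax.invert_yaxis()  # keep the first (largest-gap) row at the top
    ax.set_xlabel("% agreeing with statement")
    fig.suptitle("Generational differences (placeholder title)", x=0.02, ha="left")
    # Legend above the plot at the left, so it is read before the data.
    ax.legend(loc="lower left", bbox_to_anchor=(0, 1.0), ncol=2, frameon=False)
    return fig
```

Calling `draw_makeover(order_by_gap(SURVEY))` returns a figure you can save or show. Spacing and font sizes will almost certainly need tuning for real labels, which is rather the point: the default output is the first step, not the last.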