Thursday, September 13, 2012

some finer points of data visualization

Last month, I conducted the first storytelling with data Data Viz Challenge. In addition to eternal notoriety, I promised the winner the invitation to write a guest blog post (in case you're interested, a full rundown of the entries and my comments about each can be viewed here). Winner Jeff Shaffer came through with the following post, which I'm excited to share with you here.
_________________________________

I have enjoyed reading Cole's blog at storytellingwithdata.com, so when she invited me to write a guest post I was thrilled with the opportunity. The challenge became focusing in on the exact topic for my post. Cole has done some terrific redesigns over the years, turning some not-so-good charts into good data visualizations. It would have been easy to find another bad chart and post a redesign because let's face it, there are more bad examples out there than good ones. So for this post I decided to cover some of the finer points of design in data visualization.

Before I do a critique of a chart, I wanted to share my view on creating a good data visualization. I teach data visualization at the University of Cincinnati and as part of the course I cover what I call "The Shaffer 4 C's of Data Visualization". They simply serve as a guideline to follow when creating or critiquing a data visualization.

The Shaffer 4 C's of Data Visualization:

  1. Clear - easily seen; sharply defined. Who's the audience? What's the message? Clarity is more important than aesthetics. Ex. good chart title, critical labels, units of measure, avoiding rotated text, good color choice, etc.
  2. Clean - thorough; complete; unadulterated. Ex. not overlabeling axes and data points, avoiding gridlines that are too many or too dark, proper formatting, using the right chart type, avoiding poor color choices, etc.
  3. Concise - brief but comprehensive. Not minimalist but not verbose.
  4. Captivating - to attract and hold by beauty or excellence. Does it capture attention? Is it interesting? Does it tell the story?

It's important to understand that certain elements can affect more than one area. For example, a poor chart type or a 3D graphic could violate both the Clear and Clean principles, and a chart loaded with data labels at every opportunity could easily violate both Clean and Concise. On the other hand, it's quite possible to create a very Clean chart that follows all of the appropriate data visualization rules but whose message is lost (not Clear) or whose story may not be worth telling (not Captivating).

Color is another example that could affect multiple things. For example, using red/green would not be Clear to someone who is colorblind, and using a categorical color scheme instead of a sequential color scheme for a certain data type might be very confusing. Alerting colors might confuse the message by drawing attention to something they shouldn't. However, overuse of color, gradient or shadow could also affect Clean. Even if the message is Clear, it might still be a sloppy looking chart with poor color choices. For example, bright pink mixed with red might cause a visceral reaction to the clash.

One final comment on the 4 C's of Data Visualization. I specifically used Concise to contrast what I believe to be a minimalist approach to data visualization by Edward Tufte and some others in the field. It isn't necessary in my view to save ink as if my printer cartridge were running dry. I also believe it's OK at times to have extra emphasis, even if it's redundant, and I think color can help with Captivating so that the visualization isn't boring. What would the world be like if every chart were black and white, shades of gray, or blue and orange? Don't get me wrong, I have nothing against any of these and the blue/orange colorblind-friendly palette is one of my favorites, but we can't use it for everything.

On the flip side, there is a fine line between adding color for this purpose and that color becoming distracting, alerting or overpowering to the reader. Jeffrey Heer, Associate Professor at the University of Washington and formerly with the Stanford Visualization Group, co-authored a paper with Wesley Willett and Maneesh Agrawala discussing Scented Widgets. "Visual Scent" was used to describe navigation cues embedded in visualizations. It's a great paper and I think the term visual scent will be used more, but I will add to the lexicon by coining my own term, "Visual Order". It's far too easy to create a chart in Excel that looks like Pac-Man eating a Skittles rainbow (yes, this is a real chart that someone produced, with the simple addition of the eye for effect). I won't critique this chart today.
Below is a chart to examine:
I ran across this chart on the University of Cincinnati Health website and the reason I picked this chart is because it's actually a pretty good chart.
  • It's the right chart type for the data. The bar chart allows for easy comparisons visually between the institutions. Bar charts are always a good choice for categorical comparisons.
  • It's ranked in order providing a quick and easy understanding of the verification rankings.
  • Reasonable abbreviations were used to shorten the names that would otherwise be very long.
  • The message is fairly Clear: UCNI is #1, receiving 13 verifications in 14 specialty areas of neurological care (note the benefit of a good title).
  • The chart has good use of color, emphasizing UCNI compared to the other institutions. Blue and red aren't exactly complementary, but red is the University of Cincinnati color so that's a natural choice in this case. This color combination also avoids red/green, which allows someone who is colorblind to make the color distinction and get the same visual message. You can test your own images at http://www.vischeck.com or download the free Adobe Photoshop plug-in.
  • The chart has good detail in the note section which gives the reader more information on how the designations are done and the fact that UCNI is working on the 14th specialty area.
  • From a design standpoint it is always best to use a dark font on white or light color and a light font on dark color. In this case the creator wisely chose a white font color on the color bars and black font on the white and light grey.
  • The gridlines are muted so that they are not distracting or creating a moire effect.
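
The dark-on-light, light-on-dark rule in the list above can also be checked numerically. Below is a minimal Python sketch, assuming the WCAG 2.0 definitions of relative luminance and contrast ratio as the yardstick (that yardstick is my assumption, not something the original chart or this post relies on). The function names and the sample colors are hypothetical and were not sampled from the chart.

```python
# Sketch: pick a black or white label for a given background fill.
# Assumption: WCAG 2.0 relative luminance / contrast ratio is the measure used.

def _linear(channel):
    """Convert one 0-255 sRGB channel to linear light (threshold as given in WCAG 2.0)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) tuple with 0-255 channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """Contrast ratio between two colors, from 1.0 (identical) up to 21.0."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def label_color(background):
    """Return white or black, whichever contrasts more with the background."""
    white, black = (255, 255, 255), (0, 0, 0)
    if contrast_ratio(background, white) >= contrast_ratio(background, black):
        return white
    return black


# Hypothetical fills: a dark blue bar, a dark red bar, a light grey background.
for fill in [(0, 0, 128), (200, 0, 0), (230, 230, 230)]:
    print(fill, "->", label_color(fill))
```

For the dark bar fills this picks white text and for the light grey it picks black, which matches the choice the chart's creator made by eye.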

Compared to many charts out there, including some of the examples Cole has critiqued in her redesigns, this would be considered a pretty good chart. However, this chart can be improved when examining some of the finer design changes that can be made.

  • It's best to avoid rotated text whenever possible (Clear and Clean). In this case the text was only rotated by 45 degrees, so it's not as hard to read as it would have been if it were rotated a complete 90 degrees (which is commonly done on long labels). I try to avoid rotated text as much as possible, even small angle rotation. The text label "Barrow Neuro. Institute" is actually below 4 bars and requires the eye to follow that text to the end to determine the bar it represents. Try to quickly compare Barrow Neuro. Institute to UC San Francisco. The eye has a hard time keeping a placeholder for the comparison. The best solution is to rotate the chart instead of the text. This allows the reader to read the text normally while still using the bars for the visual comparison. It also puts UCNI at the top of the list, which is where they are in the ranking.
  • There is no need for the y-axis label (Clean and Concise). The purpose of axis labels is to give an approximate value for the bars. In this case we have every bar labeled with the value. In cases where there are lots of categories (and this could be one of those cases) then it might be better to remove the individual data points and simply use the axis labels. If using that method then I might highlight UCNI with a single data point for emphasis (still keeping with Concise and Clean).
  • The gridlines are interfering with the paragraph of text (Clean). This is partially due to the increment of the gridlines being set as 1, but it's also the white gridline contrasting with the dark text. There are a number of ways to solve this, for example adding a slightly filled background box to the text or deleting the gridlines completely.

Below is an example redesign:


  • I used a free tool called ColorPic to get the exact colors that were used in the original chart. ColorPic is a utility that will extract the exact color hue, saturation, value and RGB color code from any point hovered over with the mouse.
  • In this case I copied the original color scheme exactly and did not make any adjustments for the gradient of the bars. I recommend avoiding gradient, but the use in this case is so minimal that I simply left it alone for now to preserve the original color scheme. However, notice that even with the tiniest of gradient effects there is still a visual impact on the bars. The left sides of the bars (and the bottom part of the original) are darker and seem to have more weight to them.
  • Axis labels for the values were removed since the bars have data labels.
  • The gridlines are now in increments of 2 instead of 1, but still muted.
  • The paragraph of text is now in the bottom right hand corner of the chart. Notice that I used a gradient effect on the gridlines, muting them to nothing on the bottom right of the chart. They serve no purpose in this region since the bars do not extend to this area. This allows us to keep the gridlines in the area where the bars are without interfering with the text.
  • I changed the font color of the institution names to blue to match the bar.
  • I placed a text box on top of the UCNI text label since Excel doesn't have an option to change the color of a single axis label like it does for a single data label. Now UCNI matches the red bar. 
  • I added the UC Health logo to add to the presentation.
  • Finally, I would usually add the author name and data source as a note at the bottom, but since I don't have the information from the original chart I am unable to do that.
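
The color readout mentioned in the first bullet (hue, saturation, value and RGB code) is easy to reproduce once you have an RGB triple. The sketch below uses Python's standard `colorsys` module; it does not sample the screen the way ColorPic does, and the function name `describe_color` and the sample colors are hypothetical, not the values from the original chart.

```python
import colorsys


def describe_color(r, g, b):
    """Return the hex code and rounded HSV readout for 0-255 RGB channels."""
    # colorsys works on 0.0-1.0 floats, so scale the channels first.
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return {
        "hex": f"#{r:02X}{g:02X}{b:02X}",
        "hue_degrees": round(h * 360),
        "saturation_pct": round(s * 100),
        "value_pct": round(v * 100),
    }


# Placeholder colors for illustration only.
print(describe_color(255, 0, 0))
print(describe_color(0, 0, 255))
```

Having the exact codes written down makes it straightforward to reuse an organization's color scheme in another tool without guessing.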

Taking some liberties with the original color scheme and avoiding the gradient effect yields an even better version that isn't as dark and heavy. Note that in this version I also removed the background fill, which leaves the bars hanging in the air. On this point I agree with Stephen Few, who advocates using an axis line in this case. Although some might consider it "more unnecessary ink", I prefer it to dangling bars because it visually sets them on a baseline.


As I stepped through this same exercise this past week in my data visualization class, one of the students remarked on one additional improvement that I had mentioned in class as a best practice, but had neglected in this chart. They pointed out that having the data labels set at the inside base instead of at the ends of the bars is visually better. It puts the data point immediately next to the text labels and creates a data table that is easy to read vertically. This allows for quick, easy comparisons and doesn't force the eye to jump back and forth from right to left. While I don't think there is anything wrong with the chart above, I do agree with that best practice because it makes it a bit more Clear.
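
For readers who work in code rather than Excel, the redesign steps can be sketched roughly as follows. This is a minimal illustration in Python, assuming matplotlib is installed; it is not how the charts in this post were built. Only UCNI's count of 13 comes from the chart; the other institution names, values, colors and all function names are placeholders.

```python
# Placeholder data: only the UCNI value (13) is taken from the post.
DATA = [
    ("Institution C", 9),
    ("UCNI", 13),
    ("Institution B", 11),
    ("Institution D", 8),
]
HIGHLIGHT = "UCNI"


def build_rows(data, highlight, base="#3B6BA5", accent="#C00000"):
    """Rank bars from highest to lowest and give each one a flat color."""
    ranked = sorted(data, key=lambda item: item[1], reverse=True)
    return [(name, value, accent if name == highlight else base)
            for name, value in ranked]


def gridline_ticks(max_value, step=2):
    """Gridline positions in increments of `step`, covering max_value."""
    return list(range(0, max_value + step, step))


def draw(rows, path="redesign.png"):
    """Render the horizontal bar chart to an image file."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    names = [row[0] for row in rows]
    values = [row[1] for row in rows]
    colors = [row[2] for row in rows]

    fig, ax = plt.subplots(figsize=(7, 4))
    bars = ax.barh(names, values, color=colors)  # rotate the chart, not the text
    ax.invert_yaxis()  # top-ranked institution at the top

    # Data labels at the inside base, right next to the institution names.
    for bar, value in zip(bars, values):
        ax.text(0.2, bar.get_y() + bar.get_height() / 2, str(value),
                va="center", ha="left", color="white")

    # Muted gridlines every 2 units, drawn behind the bars; no value axis labels.
    ax.set_xticks(gridline_ticks(max(values)))
    ax.set_axisbelow(True)
    ax.grid(axis="x", color="#DDDDDD", linewidth=0.8)
    ax.tick_params(axis="x", labelbottom=False, length=0)
    ax.tick_params(axis="y", length=0)

    # Keep only the left spine so the bars sit on a baseline.
    for side in ("top", "right", "bottom"):
        ax.spines[side].set_visible(False)

    # Color each institution name to match its bar.
    for tick_label, color in zip(ax.get_yticklabels(), colors):
        tick_label.set_color(color)

    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

Calling `draw(build_rows(DATA, HIGHLIGHT))` writes the chart to `redesign.png`. One difference from the Excel approach is that each axis label can be colored individually here, so no text box overlay is needed for the highlighted name.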


I hope this example showcases some of the finer points of design for data visualization. We often cover the topics of redesign where the charts are so bad that almost anything would be an improvement. In this particular case it is the careful attention to a few details and applying the 4 C's that help make this chart a better presentation of the data.

I would like to thank Cole again for this opportunity to write a guest post on her wonderful blog. Keep up the great work, Cole!

Jeffrey A. Shaffer

Jeffrey A. Shaffer is the Vice President of Information Technology and Analytics at Unifund. Mr. Shaffer joined Unifund in 1996 and has been instrumental in the creation and development of the complex systems, analytics and business intelligence platform at Unifund. Mr. Shaffer holds BM and MM degrees from the University of Cincinnati and an MBA from Xavier University where he was the winner of the 2006 Graduate Student Scholarly Project in Research. Mr. Shaffer has attended the Harvard School's Executive Education Program, is a Certified Manager of Quality and Organizational Excellence through the American Society for Quality, a Certified Project Management Professional through the Project Management Institute and has completed Six Sigma Green Belt and Black Belt training with the Xavier Consulting Group. Mr. Shaffer is also Adjunct Assistant Professor at the University of Cincinnati in the Carl H. Lindner College of Business teaching Data Visualization in the Graduate Course series for Data Analytics. He is also a regular speaker at business intelligence conferences and symposiums on the topic of data visualization, writes for the data visualization blog at MakingDataMeaningful.com for Lucrum, Inc. and was a finalist in the 2011 Tableau Interactive Visualization Competition.


1 comment:

  1. Good chart design can become more challenging when you do not know in advance what the data and labels are, which is often the case in software applications.

    Rotating the bar chart as you have done here also better prepares the chart for much larger data values. For example, this new design will easily support bar value labels of 1300, 900... while the labels in the original design would have failed.
