Sunday, November 25, 2012

celebrating (almost) 100 posts with 10 tips

As I was looking at the underbelly of my blog the other day (the side only I see that has info on posts, pageviews, etc.), a number caught my eye:

99

This was the number of posts I had published on this blog, which meant that my next post (the one you're reading now) would be my 100th. That seems like a significant number.*
*Upon closer evaluation, 99 actually describes total posts: published plus drafts. Removing my drafts makes this the 91st published post. Since 100 was a somewhat arbitrary number anyway, I decided to go ahead with this post now instead of waiting for the actual 100th post. I guess we can consider this a celebration of 91 posts published!

It's amazing to me how the time since I began writing this blog has gone by...I've been sharing my thoughts on the same topic for a time quite suddenly better measured in years than months, with my interest in learning and teaching and writing about communicating effectively with data continuing to grow.

I thought I'd use this (almost) 100th post as an excuse to look back at storytellingwithdata posts over the past two years and handpick my top 10 tips for telling a visual story with data. Here they are, in rough order of my general approach to the visualization process (click the link for the full relevant post):

cole's 10 tips for effective storytelling with data
  1. Set aside time for the visualization process.
  2. Start with a blank piece of paper.
  3. Keep your audience top of mind.
  4. Generally avoid pie charts.
  5. Always label your axes.
  6. Leverage preattentive attributes.
  7. Declutter your visuals.
  8. Consider cutting gridlines.
  9. Employ visual editing.
  10. Use words to make your visual accessible.

Thank you very much for reading and I hope you'll join me for the next 100+ posts!

Sunday, November 11, 2012

data viz challenge... how teens research

I subscribe to updates from the Pew Research Center. It's a great way to ensure a consistent inflow of data, which is useful as I gather and examine examples for workshops and my blog. Often, the incoming email gets quickly scanned and archived. But last week one of the titles piqued my interest, so I clicked to learn more about How Teens Research in the Digital Age.

When I think back to my own research projects in high school, I have images of trekking to the county library and using large computer terminals to locate old news articles on microfiche. Or making the even longer journey into the city to utilize the massive book collection at the university library. (I grew up in the sticks and when I was younger, these truly did seem like Iliad-like voyages...adventures to the city even involved a ferry boat!)

As I suspected, the means of research for teens today are very different. They don't even have to leave their house (or their room!) if they don't want to, since the internet, and thus the world's information, is at their fingertips.

Alright, that's sufficient prelude. You're probably wondering about this data challenge that I mentioned in the title. On to that. Reading this article, I was surprised by how many stats were included and yet how few visuals. Only one graph, in fact:


My challenge to you is this: read the article, determine what data you find most interesting, and then visualize it. You can remake the above graph, or focus on bringing life to numbers included in the report by making them visual. Submit your entries via the following instructions by Sunday, November 25th (those in the States need something to do over Thanksgiving break, right?).

When complete, you can leave a comment with the story you would focus on and a link to your visual, or email it to me directly (cole.nussbaumer@gmail.com) along with any comments you'd like me to post with it, and I'll put it into Dropbox and create a comment for you with the link (if you don't already have a Dropbox account, this is a good reason to get one!).

I'll invite the creator of my favorite to write a guest blog post. Happy data visualizing!

Friday, November 2, 2012

to stack or not to stack

My husband came across the graphical focus of this post in his Google+ stream last week. The original source is a Wall Street Journal blog post summarizing a recent Forrester report, where the main story can be summed up by one of the Forrester quotes within the post, "The future is one where no single OS or vendor is dominant - Microsoft is extremely late to the market expansion into mobile and has lost its dominant position."

Here is the graph included to illustrate this point:


It's true that this graph provides the evidence to support the claim: once you identify the light blue portion of the bar as Microsoft, you can see clearly that it decreases over time as the orange portion (Google) and the yellow portion (Apple) become increasingly prominent. But I'm of course not satisfied with it. The color palette is strange. Color in general could be used more strategically here. We can eliminate the work of going back and forth between the legend and the data it describes. I'm also not sure how I feel about the stacked bar chart.

Let's look at a couple variations on this data viz. First, here's what it could look like if we preserve the stacked bar and use color a little differently (note: I didn't have the raw data, so the remakes below are based on me eyeballing the figures and likely aren't entirely accurate):

Other minor changes I made above:
  • Added an action title so it's clear what to look for in the graph (this was included as tiny text below the graph in the original post).
  • Positioned the graph title and legend text at the upper left, so the reader encounters how to read the data before they get to it.
  • Added a title to the y-axis. Always include this!
  • Added data labels to the Microsoft series. This both draws more attention to Microsoft and gives a quick numerical view of the decrease over time.
  • Narrowed the bars. In the original, they border on being so thick that our eyes start to compare area rather than height.
When it came to color, I took a look at Microsoft's logo. I nearly always use blue to highlight the areas to which I want to draw attention. In this case, I actually tried venturing out using the red color from the logo and then the green color from the logo. But both just looked a little off (the burnt red looked overly negative to me, the green a little pukey). So I went with the blue from the logo (matching by eye - it's not perfect but close). I chose a grey palette for the remaining series.
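For anyone scripting a remake instead of working in Excel, the color approach described above (one highlight color for the series of interest, greys for everything else) can be sketched in a few lines of Python. This is only an illustration: the `highlight_palette` helper, the hex value standing in for the logo blue, the grey range, and the "Other" series name are all assumptions, not values taken from the actual makeover.

```python
# Assumed stand-in for a logo blue matched by eye; not the value used in the makeover.
HIGHLIGHT = "#00A4EF"


def highlight_palette(series, focus, highlight=HIGHLIGHT, lightest=0.85, darkest=0.45):
    """Map each series name to a hex color: the focus series gets the
    highlight color, every other series gets a shade of grey."""
    others = [name for name in series if name != focus]
    palette = {focus: highlight}
    for i, name in enumerate(others):
        # Spread the greys evenly from light to dark in the order given.
        if len(others) == 1:
            level = lightest
        else:
            level = lightest + (darkest - lightest) * i / (len(others) - 1)
        value = round(level * 255)
        palette[name] = "#{0:02x}{0:02x}{0:02x}".format(value)
    return palette


# "Other" is a hypothetical catch-all series for illustration.
palette = highlight_palette(["Microsoft", "Google", "Apple", "Other"], focus="Microsoft")
for name, color in palette.items():
    print(name, color)
```

The resulting dictionary can be handed to whatever charting tool you use, one color per series, so that only the series you want the audience to look at carries color.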

I still don't love this, for a couple of reasons.

First, I think I just don't like stacked bar charts. This is actually probably a good use case for this graph type, since this lets us emphasize the percent of total and how that's changing over time. But I still don't love it. Because the bars aren't oriented on a consistent baseline, our eyes are forced to compare differing heights starting from different points. That's fine to get a general view, which is probably all we need here. Perhaps I'm just being overly finicky.
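To make the baseline point concrete, here is a minimal Python sketch that computes where each segment of a stacked bar starts. The numbers are invented placeholders (they are not the figures from the graph), and `segment_baselines` is a hypothetical helper written just for this illustration.

```python
# Invented percent-of-total shares for three time periods (placeholders only).
shares = {
    "Vendor A": [70, 50, 30],
    "Vendor B": [20, 30, 40],
    "Vendor C": [10, 20, 30],
}


def segment_baselines(shares):
    """Return, for each series, the bottom edge of its segment in every bar.

    Series are stacked in dictionary order, so each segment starts where
    the running total of the series below it ends.
    """
    n_periods = len(next(iter(shares.values())))
    running = [0] * n_periods
    baselines = {}
    for name, values in shares.items():
        baselines[name] = list(running)
        running = [r + v for r, v in zip(running, values)]
    return baselines


baselines = segment_baselines(shares)
for name, bottoms in baselines.items():
    print(name, bottoms)
```

Only the bottom series starts at zero in every bar. Each series stacked above it begins at a different height from one bar to the next, which is why comparing those segments across time takes more effort.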

Second, it's still a bit of work to go back and forth between the legend and the data. If you don't recognize that the legend (left to right) is in the same order as the right-hand bar (top to bottom), it could prove difficult to see quickly which series is which.

I thought I would like a line graph of this data better - it would allow me to organize the series on a consistent baseline as well as label each directly. But then I graphed it and reconsidered:


Personally, my issues with this line graph version are greater than my issues with the stacked bar. We lose clear visibility/confidence that the lines sum to the total market, 100%. I also worry that in this case, the lines make it appear at first glance that we have more data than we do. Perhaps three points per series is too few for a line graph? If not in general, then I think that is at least true in this case. The overlapping nature of the lines creates a sort of spaghetti graph (as if I had a handful of uncooked spaghetti noodles and threw them on the ground). I tried to make this better by emphasizing the main three series (Google, Microsoft, and Apple) and de-emphasizing the others. But it still isn't great.

Given this, I'm back to the stacked bar chart. I think the information comes across most quickly with that structure.

What do you think - is a stacked bar the best choice here? Or are there additional options worth considering?

In case you're interested, here is the Excel file with the makeovers.