Each year around this time, the US financial institution PNC produces the "Christmas Price Index," in which they calculate the cost of Christmas based on the items in the 12 Days of Christmas carol. I guess it's a sort of merrier (at least in theme) version of the Consumer Price Index and is meant to provide some economic insight into how the price of goods changes from year to year.
This year, they've added an interactive layer of glitz: the Christmas Price Index Express. Fast Company describes it as "A game-enhanced site with a handmade feel, the Index Express appears as a magical train that carries visitors through an alpine world to collect each of the 12 gifts. But it's essentially an elaborate interactive infographic, where the data points come to life with animation and sound." (Fast Company article) Whatever it is, it takes forever to load, and I wasn't patient enough to spend time on the Index Express (where there are literally bells and whistles). Rather, I clicked through the site just long enough to find what I really wanted to get my hands on: the underlying data.
PNC certainly didn't make the data easy to extract. After painstakingly copying and pasting data from each of the 13 pages (total cost of Christmas plus one page for each day of Christmas) and reformatting to get a dataset I could do something with, I had myself an Excel spreadsheet with 28 years of cost data for the 12 days of Christmas. Next challenge: visualize it and see what gems of wisdom we can acquire.
Often, there is much to be learned by looking at how not to visualize data. So before we get to how I'd visualize the cost of Christmas, let's look at a few less-than-optimal visualizations of this data and discuss their limitations.
First, the stacked bar chart. I often see data like this (multiple series over time) displayed this way. Unfortunately, this usually isn't a great approach. Stacked bar charts are tricky, because once you get past the first series, there is no longer a consistent baseline against which to compare the other series. Here's what it looks like with this data:
In the above, we can see how the total price of Christmas has changed over time and also see what the major contributors to the total price are. But if I want to understand how the different components have changed over time, that's tough with this visual. Are all goods changing in the same way, or are some getting more expensive while others have become cheaper? It's really difficult to tell with this graph.
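To make the baseline problem concrete, here's a minimal Python sketch that computes where each segment of a stacked bar starts. The item names and numbers are made up for illustration (they are not the PNC figures), and stacked_baselines is a hypothetical helper of my own naming, not part of any charting library.

```python
# Illustrative values only: three made-up series over three periods.
# These are NOT the PNC Christmas Price Index figures.
costs = {
    "item A": [10, 12, 15],
    "item B": [30, 25, 28],
    "item C": [5, 9, 4],
}


def stacked_baselines(series):
    """Return, for each series, the height at which its segment starts in each period."""
    n_periods = len(next(iter(series.values())))
    running = [0] * n_periods  # height of the stack so far, per period
    baselines = {}
    for name, values in series.items():
        baselines[name] = list(running)  # this segment sits on top of the stack so far
        running = [r + v for r, v in zip(running, values)]
    return baselines


baselines = stacked_baselines(costs)
for name, starts in baselines.items():
    print(name, "starts at", starts)
```

Only the first series starts at zero in every period. Every other segment floats on a baseline that moves with the series beneath it, so comparing its height from one period to the next means mentally subtracting a different starting point each time.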
So what if we unstack the bars so that we do have a consistent baseline for each series? Here's what we get:
This clearly doesn't work here - there's way too much going on. But even with fewer series (picture just the first 5, for example), this format is hard to read. It puts a lot of onus on the audience to spend time staring at it and looking for interesting things to pull out. That's too much work, when we can make the interesting things more obvious so our audience doesn't have to search for them.
Let's see what this data looks like in a line chart:
This is getting better, but still may not be optimal. There are a lot of overlapping lines, especially at the bottom where a number of series have similar values. But the biggest drawback is that we don't get a good sense of how the total cost of Christmas has changed over time with this graph, which is kind of the meta point of the data and is probably interesting.
While we're on the topic of non-ideal graphs for this data, I can also picture some sort of horrible visualization with pie charts: one for each year showing the breakdown of Christmas items, perhaps even with the size of the pie scaled by the total cost of Christmas. This would take some time to build, so I'm not going to go through the effort, particularly given that pie charts are my enemy. Rather, I'll simply say: don't do this!
Why? Check out this blog post for some background.
We've looked at some less-than-stellar graphical representations of this data; now let's turn our attention to something that I think might work a little better.
In any visualization exercise, one of the first things to do is determine what question(s) you want to answer. This will drive how you show the data: the goal is to show it in a way that makes it clear what questions you set out to answer and answers them in a straightforward manner. The problem is that this step is often skipped, resulting in graphs like the ones above. When you don't isolate what question(s) you want to answer and try to create a visual that will answer any question, you run the risk of not answering any single question very well.
With this data, I'm going to choose to answer a couple of questions: how has the price of Christmas changed over time (both in aggregate and for the various items), and what proportion does each day contribute to the total cost of Christmas? The trick I'll employ to do this in a way that isn't overwhelming is to create a visual with multiple graphs (and words!) so we can answer these questions one at a time. Said another way, I'm going to use my visual to tell a story with this data. Here's what it looks like:
The top left graph shows how the cost of Christmas has changed over time. The top right graph shows the 2011 cost breakdown per item so we can understand the contributors to the total cost. Finally, the mini-graphs at the bottom help us understand the drivers behind the total changes we see in the top left graph. I've put on my analyst hat and added some words to describe what I believe are the main takeaways that my audience shouldn't miss.
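I built my version in Excel, but the three views boil down to three small calculations. Here's a rough Python sketch of them, assuming the data is held as a mapping from item to a list of yearly costs. The items, years, and numbers below are placeholders for illustration only, not the PNC data, and the variable names are my own.

```python
# Hypothetical placeholder data: item -> cost in each year (not the PNC figures).
years = [2009, 2010, 2011]
costs = {
    "item A": [100, 110, 120],
    "item B": [50, 50, 40],
    "item C": [25, 30, 40],
}

# View 1 (total over time): sum across items for each year.
totals = [sum(series[i] for series in costs.values()) for i in range(len(years))]

# View 2 (breakdown for the latest year): each item's share of that year's total.
latest_total = totals[-1]
shares = {item: series[-1] / latest_total for item, series in costs.items()}

# View 3 (drivers of change): percent change per item from the first year to the last.
changes = {
    item: (series[-1] - series[0]) * 100 / series[0]
    for item, series in costs.items()
}

for year, total in zip(years, totals):
    print(year, total)
for item in costs:
    print(f"{item}: {shares[item]:.0%} of {years[-1]} total, {changes[item]:+.0f}% since {years[0]}")
```

Keeping the three calculations separate mirrors the idea of answering one question per graph: each result feeds exactly one panel, so no single chart has to carry every series at once.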
The bottom line: Christmas is getting more expensive. If you have a tight budget for your holiday party, you can save a considerable amount of money by replacing your leaping lords and dancing ladies with milkmaids for entertainment and swapping your swans for hens for decor!
In case you're interested, my full Excel spreadsheet with data and graphs can be found here.