In a data visualization battle of table against graph, which will win?
The short answer (which may be less than satisfying) is: it depends. Mostly, it depends on who the audience is and how the data will be used. One important thing to know is that people interact very differently with these two types of visuals. Let's take a quick look at how, along with some use cases for each, and then we'll look at a specific example from a recent WSJ article.
Tables, with their rows and columns of data, interact primarily with our verbal system. We read tables. When I have a table in front of me, I typically have my two index fingers out: I scan across rows, down columns, and I compare values. Tables are great when you have an audience who wants to do just that. They also work well for a diverse audience in which each person wants to look at their own piece of the data. Tables are also handy when you have many different units of measure, which can be difficult to pull off in an easy-to-read manner in a graph.
Graphs, on the other hand, interact with our visual system. It's a high-bandwidth information flow from what our eyes see to the comprehension in our brain, which can be extremely powerful when done well. Graphs can present an immense amount of data quickly and in an easy-to-consume fashion; they are particularly useful when there is a point to be made in the shape of the data, or for showing how different things (variables) relate to each other.
Let's look at an example. There was an article posted recently in the Wall Street Journal online titled "Young Workers Like Facebook, Apple, and Google." The article came with an "Interactive Graphic": a table listing the 150 companies included in the survey, their relative rank, and the percentage of young worker respondents that voted for each. (Slight tangent: while I suppose the interactive label fits, I was a little surprised to find that the only way I could interact with the data was to sort each column in either ascending or descending order. I guess this would be useful if I were looking for a particular company, so I could alphabetize the list, but utility beyond that is limited.) Here's what the top of the table looked like:
Question: was it right of WSJ to include a table rather than a graph?
In this case, I think the answer is yes. The article spends time discussing Google in the top spot (making the article title seem somewhat incongruous to me...also interesting that they mention Google last out of the three companies called out in the title while it ranked first), but then also points out some other nuances, for example the decrease in financial sector rankings (though the year-over-year data is not provided to the user). My assumption is that they wanted to include all of the data so that users could look up specific companies of interest, or look at the top or bottom of the list. This hits one of the table criteria that we described above: a diverse audience, each wanting to look up their own piece.
If, however, the primary goal is to make the point that Google is well ahead of the pack (which is the focus of the majority of the article), a graph would help us to visually tell the story more quickly and arguably more effectively than can be done with the table.
Question: what should we graph? Graphing all 150 companies is out of the question: there are too many, and the tail will take up more space than the value that seeing it will add. So we know we need to graph fewer than all of them, but the question remains: where should we make the cutoff?
We can pick a clean number (this is likely the rationale behind the top 3 that WSJ mentions in the title): top 5, top 10, top 20. But in doing so, we run the risk of including and excluding companies with very similar values (for example, if we were to graph the top 10, we'd include the CIA at 5.04% but exclude Nike, which is only 3 basis points lower, at 5.01%). This isn't to say the approach is unacceptable, but to point out that it should be an explicit decision: you should understand the pros and cons of this approach and be accepting of the cons (vs. not recognizing that they exist).
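To make the size of that near-tie concrete, here is a minimal Python sketch of the arithmetic. The function name `gap_in_basis_points` is a hypothetical helper introduced for illustration; the two inputs are the CIA and Nike percentages quoted above.

```python
def gap_in_basis_points(included_pct, excluded_pct):
    """Gap between two percentages, in basis points.

    One basis point is 0.01 of a percentage point, so the difference in
    percentage points is multiplied by 100. The result is rounded to
    hide floating-point noise.
    """
    return round((included_pct - excluded_pct) * 100, 2)


# Values quoted in the text: CIA (last company inside a top-10 cutoff)
# and Nike (first company outside it).
cia_pct = 5.04
nike_pct = 5.01

gap = gap_in_basis_points(cia_pct, nike_pct)
print(f"Gap at the top-10 cutoff: {gap} basis points")
```

Running the same check at each candidate clean cutoff is a quick way to see which ones split companies that are practically tied.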
Another option is to graph the data and then look for the natural breaks that occur and have our cutoff reflect this nuance in the data. Here's what it looks like if we graph the top 25 (quick & dirty):
Here, the y-axis is the % of respondents and the x-axis is company rank. I found it hard to see the difference in the length of the bars when plotting in this direction, so I also tried a horizontal bar chart:
I find it much easier to see the relative differences in this second iteration of the chart (somewhat due to the compression of the bars; it also just seems easier to scan down vs. across to spot differences in bar length). Based on this, it looks like there are clear differences between 7th and 8th place, between 8th and 9th, between 11th and 12th, between 15th and 16th, and so on. We could make arguments for a number of different cutoffs. In this case, I'm going to take the top 15, both because it's a clean number (I've always liked multiples of 5, not sure why) and because we see a drop between the 15th and 16th positions (it's also the point where we break the 4% mark: 4.04% of respondents vs. 3.80%, which I can note in a footnote). You could make an argument for placing the cutoff elsewhere, but this is what I'm going to go with for the reasons that I've outlined.
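I found the breaks by eye, but the same idea can be sketched in a few lines of Python: compute the drop between each pair of neighboring ranks and list the largest ones. The function `largest_drops` is a hypothetical helper, and the sample percentages below are made up purely for illustration; they are not the survey data.

```python
def largest_drops(values, top_n=3):
    """Find the biggest drops between consecutive ranked values.

    `values` must already be sorted from highest to lowest. Returns up
    to `top_n` tuples of (rank, drop), where `rank` is the 1-based
    position after which the drop occurs, ordered from largest drop to
    smallest.
    """
    drops = [
        (i + 1, round(values[i] - values[i + 1], 2))
        for i in range(len(values) - 1)
    ]
    return sorted(drops, key=lambda d: d[1], reverse=True)[:top_n]


# Made-up percentages for illustration only (NOT the WSJ survey values).
sample = [9.0, 6.5, 6.4, 6.3, 4.0, 3.9]

for rank, drop in largest_drops(sample, top_n=2):
    print(f"Candidate cutoff after rank {rank}: drop of {drop} points")
```

A list like this only suggests candidates. Choosing among them is still a judgment call, which is why a clean number like 15 can reasonably break the tie.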
So if I want a visual to highlight the point in the article that Google is ahead of the pack, here is what it could look like:
Main takeaway: when debating table vs. graph, ask yourself how the data will be used and consider your audience. Let the utility of the visual that is needed drive your decision.