Friday, August 10, 2012

evaluating word clouds

Word clouds created a bit of buzz when they first became popular a couple of years ago (or at least that's when I encountered them for the first time). Like the infographic, they have a bit of sex appeal that draws you in. As in the case of infographics, however, I often find that upon further evaluation they tend to be a letdown - full of fluff without much informative value.

While facilitating a workshop recently, I heard a horror story about someone who had tried to create a word cloud by hand (perhaps the scariest part of the story involved scaling text boxes one at a time). Lesson: in data viz (and in life), if you find yourself doing something tedious and repetitive like that, stop to reevaluate. At minimum, do a Google search. Even better if you can find a blog post or related article on the topic from someone who has encountered the same challenge before and identified an elegant solution.

In the case of word clouds, there are a number of applications you can use to generate them. Wordle is a popular free product, created by Jonathan Feinberg of IBM, that allows for quite a bit of customization of color, size, font, etc. (Note that if you upload your Wordle to the gallery, the data goes with it, though you can also opt for local-only word cloud generation.) Google Docs has a word cloud gadget within spreadsheets. There are a number of others, easily located via a Google search.

But before you start thinking about generating word clouds, let's continue our discussion on their efficacy. Their sexiness can draw you in. But is there value beyond that? I think it comes down to the use case. I've got one example for the negative and one for the affirmative.

Poor use of word clouds
First, let's take a look at an example from a Community Health Center. My understanding is that they employed a consultant to analyze some survey data from their clients. The consultant put together a report filled with pretty word clouds like this one:

Good service is... minutes? Part of the challenge in this case is that the context has been completely stripped away from the nouns, removing the sentiment behind the comments, which is kind of the important part of the comments, in my opinion. But in reading the report, buried near the end of it, I found the following:

The consultants took the time to content-code the comments. These categories and their descriptions are much more useful for understanding what people value than the word cloud. With this info, we can direct action: we get an understanding of what's going well that we want to maintain, as well as potential areas for improvement. We could take this a step further by making the data visual, like this:

In this case, I think the simple bar chart is much more useful (in terms of both understanding the information and determining how to act on it) than the word cloud. Now let's look at a better use of word clouds.

Thoughtful use of word clouds
Caveat: this example came to me by way of the telephone game (I heard it from someone who heard it from someone), which means it's guaranteed that I don't have the details totally right. But I think this still serves well as an example of a good use of word clouds. The story goes: Apple stores obviously really value customer service. They use surveys to collect info about each store. Each day, they create a word cloud for each store based on customer comments. What they are looking for are 5 (I'm making that number up, I don't know what the real number is) specific words - things that are considered must-haves when it comes to customer service in their stores. It's when these [5] words don't show up prominently on the word cloud for a given store that a red flag is raised and some sort of action is taken.

This is what I would consider a thoughtful and actionable use of word clouds. If a required word doesn't appear prominently, some sort of intervention happens.
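For the curious, the check at the heart of this story (do the must-have words show up prominently?) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the comments, the must-have words, the stopword list, and the "top 20" cutoff are all made up, and the function names are hypothetical. I have no idea how Apple actually does it.

```python
import re
from collections import Counter

# Hypothetical list of common words to ignore; a real list would be longer.
STOPWORDS = {"the", "a", "an", "and", "was", "were", "is", "to", "of", "i", "my", "very", "in", "it"}

def top_words(comments, n=20):
    """Return the n most frequent non-stopword words across all comments."""
    counts = Counter()
    for comment in comments:
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]

def missing_must_haves(comments, must_haves, n=20):
    """Return the must-have words that are NOT among the top n words."""
    prominent = set(top_words(comments, n))
    return sorted(w for w in must_haves if w.lower() not in prominent)

# Made-up comments and must-have words, purely for illustration.
comments = [
    "The staff were friendly and helpful",
    "Friendly staff, fast service",
    "Helpful and knowledgeable staff",
]
must_haves = ["friendly", "helpful", "clean"]

# Any word returned here would be the "red flag" for that store.
flagged = missing_must_haves(comments, must_haves)
print(flagged)
```

Frequency is standing in for "prominent" here, since word size in a typical word cloud is driven by how often a word appears.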

We can generalize this to the following: when you're considering using a word cloud, think about what you want your audience to know and what you want your audience to do. Then ask yourself if a word cloud will enable them to know and do those things.

And for goodness' sake, if you do use a word cloud - leverage some of the tools that exist - don't try to create it by hand!


  1. In your 'good' example, wouldn't it make more sense just to rank the words, or create a function/macro that told you if the 5 words were present (in the top 20 or whatever your criterion is)? Word clouds look interesting, but I've never seen a case where they're more useful than just a ranked list. And the 3D clouds are a major pain to navigate.

  2. Great post! Thank you for sharing.

    The word cloud and the bar chart certainly convey different information. I find this is due to a limitation of the word cloud software. It separates two or more words that are often needed in context together. Project Management is a perfect example. Separately the words have a very different meaning than they do together. To overcome this limitation, I recode the data and eliminate the space between project and management. In this way the word cloud more accurately represents the sentiments of the respondents. Taken one step further, the entire list of recoded comments can be input into the word cloud rather than the raw comments themselves. This provides a word cloud that is more closely aligned with the bar chart of coded data.

    In terms of preferences...I like to use both the word cloud and the bar charts--art and science.

    John Mattox

  3. I agree with Shawn from a practical standpoint, but I don't think the "fun" value of word clouds should be understated. With their color and structured chaos they draw people in. This can be particularly powerful when it comes to advertising. While this isn't something I would use in a more formal report, I would consider using it as a way to generate excitement or curiosity around a certain topic. I think that there are also ways to leverage their difficulty to navigate, such as a word cloud word search or maybe making certain words into hyperlinks for hidden content.

  4. I find the clouds a useful discovery tool when I need to explore a vast amount of narrative input. The trick is not to project my biases onto highlighted words that may have a different meaning within context, as John points out.


  5. I had a client that liked using word clouds sort of in an opposite way. Their policy was such that if 3 particular words didn't appear in the word cloud, this was cause for an alert.

    In Allegiance we provide both options shown above. Regular word cloud that goes by counts, but we also have a text analytics engine that finds topics or concepts using natural language. We present those topics in a bar chart form so you can get a quick view of the % of the comments that mention the topic.

  6. Thanks for that, great article! Absolutely agree that word clouds are useful when used in the right context. Here's a FB app that gets a person's friends to describe them and then creates a word cloud out of the results. I think this is a great way to use word clouds! For those interested, check out: