people analytics

Up until relatively recently, my day job was in People Analytics at Google. My career has been (and continues to be) focused on helping people make sense of, understand, and act based on numbers and analytics. Applying these skills in the people space over the past six years was a fascinating adventure.

People Analytics is an analytics team that is embedded in Google's Human Resources organization, where the goal is to help ensure that people decisions made at Google - decisions about employees or future employees - are data-driven. Personally, I credit this role and my managers and team for allowing me to use people analytics to hone my data viz and storytelling with data skills and gain a better understanding of the science behind data visualization, and for giving me the opportunity and autonomy to build and teach a course on data viz there, which ultimately paved the path to where I am today.

But I stray off track. Let's get back to the topic of people analytics. Because of the time I spent in this area (and Google's reputation in this space as a thought leader), I periodically get calls from the press asking for details. Recently, a reporter from the Wall Street Journal reached out to discuss "big data and how it's used in human resources". It turns out that they mostly wanted me to talk about some proprietary Google projects that I declined to comment on (unfortunately, I can't share some of the really interesting ground-breaking work), but I did sketch out some notes while I was thinking about the topic, which I thought I'd share here for those who may be interested.

Cole's [somewhat random] thoughts on People Analytics
Employees are a precious resource at any organization. Data can help you to make better decisions when it comes to these precious resources. Broadly, I think about People Analytics in terms of the different stages of the employee lifecycle:

  • Hiring: getting the right people in the door.
  • While they're there: making them as effective as possible and creating an environment and opportunities that optimize efficiency and impact (performance management, career development, rewards, employee sentiment).
  • Attrition: getting ahead of it so you can retain those you want to keep and push out those you don't (as appropriate).

You can put data to each of the above spaces to make smarter decisions. In the early stages of people analytics, much of it is descriptive: understanding what things look like currently and identifying gaps between that and where you want to be. As you move up the value chain, you can get into some really interesting predictive spaces to try to understand how things will look in the future and what levers you can pull to impact that.

There are a number of challenges when it comes to leveraging the people analytics space. I'll outline my view on two of the big ones:

  • Marrying what is often many disparate data sources into a single holistic view of the employee that can be aggregated up and looked at through different lenses so that the info is available to the right person at the right time to take action. This becomes even more challenging when you start to think about external data sources (e.g. Twitter, LinkedIn) that could be integrated for improved insight. In the early stages of people analytics, the first goal is to understand where you're at currently, which often takes the form of reports. Over time, these may be replaced by dashboards that "push" data out to internal stakeholders. Once the current state is known, analysts' time is freed up to focus on the more interesting questions and custom analysis to be able to drive data-informed decisions. 
  • Finding the right balance (and organizational appetite) between being data-driven and considering the context. Many struggle to make the data make sense - that is, to take the organizational and business context into account when interpreting and using the data. Most companies have a wealth of qualitative data - things that HR business partners or managers know that will never be adequately captured in hard numbers. There's also a wealth of information in text data that's typically largely untapped (resumes, interview notes, employee surveys, performance reviews, exit interviews). Being able to marry all of this together will provide the most robust view, but is easier said than done (and may sometimes be more than is needed, anyway). It's about figuring out when to lean a little more in one direction vs. the other to create buy-in and build the best solution for a given situation.
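To make the first challenge a bit more concrete, here is a minimal sketch in Python of what "marrying" two sources into a single employee view, and then aggregating it up through one lens, might look like. Everything in it is hypothetical and for illustration only: the field names (`employee_id`, `team`, `satisfaction`), the two made-up sources, and the helper functions are my assumptions, not a description of how any real system does this.

```python
from collections import defaultdict


def build_employee_view(*sources):
    """Merge records from several sources into one record per employee.

    Each source is a list of dicts sharing a (hypothetical) 'employee_id' key.
    Later sources overwrite earlier ones if they repeat a field name.
    """
    view = defaultdict(dict)
    for source in sources:
        for record in source:
            emp_id = record["employee_id"]
            for key, value in record.items():
                if key != "employee_id":
                    view[emp_id][key] = value
    return dict(view)


def average_by(view, group_field, value_field):
    """Aggregate the merged view: average of value_field per group_field."""
    groups = defaultdict(list)
    for rec in view.values():
        # Skip employees missing either field rather than guessing a value.
        if group_field in rec and value_field in rec:
            groups[rec[group_field]].append(rec[value_field])
    return {group: sum(vals) / len(vals) for group, vals in groups.items()}


# Two made-up sources: an HR system extract and a survey extract.
hr_system = [
    {"employee_id": 1, "team": "Sales", "tenure_years": 2},
    {"employee_id": 2, "team": "Sales", "tenure_years": 6},
    {"employee_id": 3, "team": "Engineering", "tenure_years": 4},
]
survey = [
    {"employee_id": 1, "satisfaction": 4},
    {"employee_id": 2, "satisfaction": 2},
    {"employee_id": 3, "satisfaction": 5},
]

view = build_employee_view(hr_system, survey)
by_team = average_by(view, "team", "satisfaction")
print(by_team)
```

In practice the hard part is rarely the merge itself; it's agreeing on a shared key and deciding what to do when sources disagree, which this sketch sidesteps by letting the later source win.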

If this space sounds fascinating (it is), you can check out Google's open roles here. Increasingly, other companies are devoting brains to this area as well; search openings by querying People Analytics or HR Analytics.

telling multiple stories (part 1)

I often emphasize the importance of identifying the single most important story you want to tell and crafting a visual to support this. But how should you approach the visualization challenge when there are multiple stories you want to tell with the same data? Check out this post for a strategy.

a Google example: preattentive attributes

The topic of my short preso at the meetup last week in Mountain View was preattentive attributes. I started by discussing exactly what preattentive attributes are (those aspects of a visual that our iconic memory picks up, like color, size, orientation, and placement on page) and how they can be used strategically in data visualization (for more on this, check out my last blog post). Next, I talked through a Google before-and-after example applying the lesson, which I'll now share with you here.

First, a little background: In 2010, my colleague Neal Patel undertook research on managers at Google. He set out to understand two primary things: 1) the impact that managers have on work-life and 2) what makes a good manager. To read more about this study and the findings, check out the New York Times article from earlier this year.

When Neal's research was complete and it was time to begin to socialize the study and findings, he and I locked ourselves in a room filled with whiteboards and began to brainstorm. One of the visualization challenges was the first part of the study: as one might expect, managers have varying degrees of influence over the different aspects of work-life, ranging from aspects that they are able to influence heavily to aspects that they influence little or not at all. Our aim was to show this in a way that was easy to understand.

One of the early iterations looked like the following (note that I've generalized the visuals significantly to be able to show them here).


Given that I've generalized most of the labeling, I'll walk you quickly through what you're looking at. At the top of the page, there are three categories: those work-life aspects that are 1) highly influenced by managers, 2) somewhat influenced by managers, and 3) not influenced by managers. The categories within these are the different work-life themes, for example feeling supported in career development or having the ability to innovate, and then each has more detail on what aspects of the given theme are influenced at the given level by managers.

Next comes the graph. The y-axis is a quantitative measure of manager influence. The x-axis shows the different aspects of work-life, grouped by color into the same thematic categories as referenced in the table above the graph. The height of the bars indicates what influence category each work-life aspect falls into (matching the table above it): highly influenced by managers, somewhat influenced, or not influenced.

This is a nice looking visual. But we can use preattentive attributes more effectively to make the point come across more quickly and enable the audience to more easily take in the information.

In fact, it is exactly those two things from my perspective that preattentive attributes can facilitate in a really powerful way when employed effectively: 1) to draw the audience's eye to the most important part of the visual and 2) to provide a visual hierarchy of information that will help make it clear to the audience how they should interact with the information that is being provided. You can think of preattentive attributes as your tools to help your audience get into your (the designer's) head.

Let's inspect the above visual with these two things in mind. One of the first questions I ask myself when I'm looking at a visual is: where is my eye drawn? You can do this easily with your own visuals: look away for a moment, then back at the visual and take note of where your eye first focuses (it's generally the preattentive attributes that dictate this). When I do this with the above visual, my eye first sees the title, "Findings," and then is pulled to the color in the graph at the bottom. The color differentiates the various work-life themes, which is probably not the most important thing on the page, and yet the strong draw of the color gives a signal that it should be.

Now, let's look at the visual from a hierarchy-of-information standpoint. Besides the title and the color in the graph, the font is all of similar size and weight. What this means is that the audience must read through everything in order to be able to conclude for themselves what is important and where they should devote their attention. To be frank, most audiences won't take the time to do this. It's also not really fair of us to ask them to, when a few minor changes will make it clear.

The following mock-up is similar to where we ended up with the visual after our brainstorming session. Note that very little change has been made to the content: we already had the right information, it was just a matter of playing with the preattentive attributes to make it more accessible to our audience.

Some work-life aspects are more influenced by managers

The only content changes were to the titles. One of my rules is to never waste the title line on a descriptor like "findings". Titles are typically at the top of the page, which means they are the first thing people encounter, and they are often big and bold (and perhaps even blue!), which makes them even more attention grabbing. Use them to communicate the most important thing about the visual. Maybe it's the main finding. Or perhaps the call to action that the data informs. It's prime real estate, so make it count.

Let's take a look at how preattentive attributes are working for us in this updated visual. First, from the where-is-your-eye-drawn standpoint, for me it goes like this:

  • I can't help but read the main title because of its placement at the top of the page and because it's big and bold and blue.
  • Next, my eye catches the graph title (font is bigger than that which is around it, also the bold is a signal that it's important) and scans it so I know what I'm looking at.
  • Within the graph, my eyes are drawn to the dark blue bars, which are those work-life aspects that are most heavily influenced by managers, arguably the most important thing on the page, since these are the areas that can be most impacted by change.
  • As my eyes continue to move down the page, they are drawn to the dark blue in the table (color coordinating with the same influence category as in the graph so there is a visual tie connecting them that doesn't require reading).

From a visual-hierarchy standpoint, what I've outlined above is highlighted clearly as the highest priority information on the page. Everything else is secondary. It's there to add clarity and additional information, but note how much more scan-able the second version of the visual is compared to the first.

The lesson is this: use preattentive attributes like color, size, and placement on page with intention. Specifically, use them to 1) highlight the most important part(s) of the visual and 2) create a visual hierarchy of information. Your audience will appreciate that you are providing visual cues to help them interact with your data visualization and will be more generous in giving their time to it than they would be to a visual that feels like work to consume.
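If you build your charts in code, one small way to apply the color part of this lesson is to decide the emphasis first and derive the colors from it. Below is a minimal Python sketch under my own assumptions: the hex values, the category labels ("high", "some", "none"), and the `emphasis_colors` helper are all hypothetical stand-ins, not the palette or data from the Google visual.

```python
# Hypothetical palette: one saturated color for the story, grey for the rest.
HIGHLIGHT = "#1f3a93"  # dark blue, reserved for the bars we want noticed first
MUTED = "#bfbfbf"      # light grey, for supporting context


def emphasis_colors(categories, focus):
    """Return one color per bar: HIGHLIGHT where the category matches focus."""
    return [HIGHLIGHT if category == focus else MUTED for category in categories]


# Made-up influence categories, one per work-life aspect (bar).
influence = ["high", "high", "some", "some", "none"]
colors = emphasis_colors(influence, focus="high")
print(colors)
```

The resulting list can be handed to whatever charting tool you use as its per-bar color setting. The design choice is the point: color encodes "look here first" rather than differentiating every category equally.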

nGram viewer

Earlier this week, Google launched an exciting new tool: the Books nGram Viewer for visualizing how the occurrences of phrases in books have waxed and waned over the years. The viewer sits on top of a dataset of 500 billion words from 5.2 million books in Chinese, English, French, German, Russian, and Spanish, with phrases up to five words and a count of how many times the phrase appears each year (a subset of Google's 15 million digitized books worldwide).
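To get a feel for what "phrases up to five words and a count of how many times the phrase appears each year" means, here is a toy Python sketch. It is only an illustration of the idea and makes no claim about how Google actually built the dataset: the `ngrams` and `count_by_year` helpers, the naive lowercase-and-split tokenizing, and the three tiny "books" are all my own made-up assumptions.

```python
from collections import Counter, defaultdict


def ngrams(text, max_n=5):
    """Yield every phrase of 1 to max_n consecutive words in the text."""
    words = text.lower().split()  # naive tokenizer, for illustration only
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def count_by_year(books, max_n=5):
    """Map each year to a Counter of phrase occurrences across its books."""
    counts = defaultdict(Counter)
    for year, text in books:
        counts[year].update(ngrams(text, max_n))
    return counts


# Made-up (year, text) pairs standing in for digitized books.
books = [
    (1950, "the ferris wheel turned slowly"),
    (1950, "we rode the ferris wheel twice"),
    (1990, "the rollercoaster was faster than the ferris wheel"),
]

counts = count_by_year(books)
for year in sorted(counts):
    print(year, counts[year]["ferris wheel"], counts[year]["rollercoaster"])
```

Plotting one phrase's count across years gives a single line on a chart like the ones below; comparing years fairly would also require accounting for how much text each year contains, which this sketch leaves out.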

Play around with it, and you'll see firsthand how a clean and simple visual can allow you to understand a massive amount of data in seconds and use that data to start to create and tell stories.

Say, for instance, that I want to understand the varying popularity of my personal favorite amusement park ride (the Ferris Wheel) in English literature over the years. For a point of comparison, I'll also plot my least favorite amusement park ride (the rollercoaster). Here is the visual:

Ferris Wheel vs. Rollercoaster popularity over time


We see both rides beginning to be captured with the written word in the 1930s. The Ferris Wheel has had several relative rises and falls in popularity since then, with (sadly) a continued decline since the mid-1990s. The popularity of rollercoasters, on the other hand, was initially slow to build, but then overtook the Ferris Wheel around 1985 and has skyrocketed in comparison since that time. Based on this visual, my affection for Ferris Wheels puts me in a dwindling minority, while rollercoasters are rapidly gaining in popularity.

As a reminder of the importance of context, let's add another series. If you enjoy Ferris Wheels like I do, you may know that the first one was built for the World's Columbian Exposition in Chicago in 1893 and that it was intended to rival the Eiffel Tower, which had been built for the Paris Exposition four years earlier. Let's check out what happens if we plot mentions of the Eiffel Tower in English literature on our chart:

Eiffel Tower mentions dwarf Ferris Wheel and Rollercoaster throughout history


As I called out in the chart title, mentions of the Eiffel Tower dwarf our initial two series. Also note that, whereas we see mentions of the Eiffel Tower pop up immediately following its unveiling, the Ferris Wheel took a little longer to make its way from the World Expo to the written word.

Just starting to play around with this sparks more interesting questions: what led to the bumps in the Ferris Wheel's popularity? What genre of novels most mention the Eiffel Tower - romance? history? Google gives us the ability to dig to our heart's content by making the full datasets freely downloadable as well.

What stories might you use nGram to tell?