#SWDchallenge: education, color, and words

Throughout my life, I’ve known February to be Black History Month. Interestingly, though, that’s simply on account of my age, as this year marks only the 43rd year we celebrate and recognize African American achievements in the US and Canada, achievements that took place against a backdrop of political, economic, and social inequalities and often injustices. I believe that one of the most important pillars to ensuring access and opportunity for all (as well as ending ignorance and racism) is education.

To raise awareness and celebrate Black History Month, storytelling with data is collaborating with data.world, Tableau Public, #MakeoverMonday, Viz for Social Good, and Data for Democracy to ignite the imaginations and talents of our respective community members around the datasets and data stories connected to Black History. Each week’s focus is on a different sub-topic. I’ve decided to center this month’s #SWDchallenge on education, specifically its access, benefits, opportunities, and ignorance-curbing power. Create a visual with this in mind and let’s use data to recognize the importance of education in our society, today perhaps more than ever before.

Your work doesn’t stop there. Last month, the challenge was to create an annotated line graph (nearly 90 people shared their creations!). I felt that singling out a graph type here would be too limiting, however (we’ll come back to that in future challenges). Rather than dictate a type of visual, this month we will put into practice a specific tip I find myself giving often when it comes to creating effective visual stories: be thoughtful in your use of color and words.

This may sound like simple advice. It is, I suppose, but there are nuances and the impact of these two straightforward elements executed well can be huge—and can even help overcome other design issues. Let’s talk a bit more about each of these.

Color, used sparingly, is one of your most strategic tools when it comes to the visual design of your data stories. Consider not using color to make a graph colorful, but rather as a visual cue to help direct your audience’s attention, signaling what is most important and indicating where to look. Note that for this to be effective, the use of color must be sparing. If we use too many colors, we lose the ability to create sufficient contrast to direct attention.
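As a minimal sketch of using color sparingly, the Python snippet below assigns a single accent color to the series you want your audience to look at and grey to everything else. The hex values, the function name, and the series labels are all illustrative assumptions, not a recommended palette.

```python
# Hypothetical palette: one accent color for the focus, grey for the rest.
ACCENT = "#1f77b4"
MUTED = "#bfbfbf"


def highlight_colors(labels, focus):
    """Return one color per label: accent if the label is in `focus`, grey otherwise."""
    focus = set(focus)
    return [ACCENT if label in focus else MUTED for label in labels]


# Placeholder series names, used only to show the call.
series = ["Group A", "Group B", "Group C", "Group D"]
colors = highlight_colors(series, focus=["Group C"])
print(colors)
```

The resulting list can be handed to whatever tool you graph with. Keeping the focus list short is what preserves the contrast.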

Words used well will both make your visual accessible and indicate to your audience what you want them to understand in the data. There are some words that must be there: every graph needs a title and every axis needs a title (exceptions will be rare!). Don’t make your audience work or make assumptions to try to decipher what they are looking at. Beyond that, think about how you can use words to make the “so what?” of your visual clear. I advocate use of a “takeaway title,” meaning that if there is something important that you want your audience to know (there should be), put it in the title so they don’t miss it. Also, when your audience reads the takeaway in the title, they are primed to know what to look for in the data. When I’m putting a graph on a slide, I’ll use the slide title for the takeaway (and put a descriptive title on the graph). When the graph is on its own, I’ll often title with both, typically “descriptive title: takeaway.”
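The titling convention above can be written down as a small Python helper. This is only a sketch: the function name, the dictionary keys, and the example strings are hypothetical, chosen to mirror the two cases described (graph on a slide vs. graph on its own).

```python
def titles_for(descriptive, takeaway, on_slide):
    """Decide where the descriptive title and the takeaway go.

    On a slide: the takeaway becomes the slide title and the graph keeps
    its descriptive title. On its own: the graph title carries both,
    in the form "descriptive title: takeaway".
    """
    if on_slide:
        return {"slide_title": takeaway, "graph_title": descriptive}
    return {"graph_title": f"{descriptive}: {takeaway}"}


# Placeholder strings, not taken from any real graph.
print(titles_for("Descriptive title", "the takeaway", on_slide=True))
print(titles_for("Descriptive title", "the takeaway", on_slide=False))
```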

As illustration, below is an example. Here, I’ve shown the progression (no need to do this for your challenge, you can simply share the final product) from base graph, then added color, and finally words. Notice how we immediately know what to look for and where to look in the final graph.

Education color words.png

To recap the #SWDchallenge: find some data of interest related to education (you have free range within this: academia, higher education, black scholars, access, how education has helped ensure progress and opportunity, etc.). Data.world has curated a short list of datasets, or you can find even more in this list of publicly available data. Analyze the data to determine the specific story you’d like to tell. Harness the power of color and words to create your visual story. DEADLINE: Wednesday, 2/14 by noon PST. Specific submission details follow.

SUBMISSION INSTRUCTIONS:

  • Make it. Identify your data and create your visual with the tool of your choice. If you need help finding data, check out this list of publicly available data sources.
  • Share it. Email your entry to SWDchallenge@storytellingwithdata.com by the deadline. Attach your image as a .PNG. Put any commentary you’d like included in my follow-up post in the body of the email (e.g. what tool you used, any notes on your methods or thought process you’d like to share); if there’s a social media profile or blog/site you’d like mentioned, please embed the links directly in your commentary (e.g. Blog | Twitter). If you’re going to write more than a paragraph or so, I encourage you to post it externally and provide a link or summary for inclusion here. Feel free to also share on social media using #SWDchallenge and #VisualizeDiversity and/or upload to the data.world page.
  • The fine print. I reserve the right to post and potentially reuse examples shared.

I look forward to seeing what you come up with. Thank you for helping to celebrate Black History Month and the importance of education in our society. Stay tuned for the recap post!

SEARCH STORYTELLING WITH DATA: © 2010-2018 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.

it depends...

“It depends.” These two simple words can answer a number of questions raised when it comes to visualizing data and communicating with it effectively. In this session, Cole discusses 10 common data visualization questions where the answer is “it depends” and discusses what it depends on and the critical thought process required for success. Cole also answers reader questions on considerations between lower and upper case in data visualization related text and Excel resources.

    Listening time: 45:00.

    Subscribe in your favorite podcast platform to be updated when new episodes are available. If you like what you hear, please rate or review the SWD podcast. You can find past episodes on the podcast page, including sessions focusing on how I've built storytelling with data, feedback in data visualization and discussion of what is story?

    area graph to highlight a line

    I don't use a lot of area graphs. But I found myself pausing on one that was submitted as part of the recent annotated line graph #SWDchallenge. It was created by Mike M. and the interesting thing to me was that the focus of this particular area graph wasn't on the area so much, but rather on the line that separated the areas.

    This apparently stuck with me, because I found myself recommending a similar approach in a recent client makeover. 

    The original graph looked something like the following (data has been modified to protect confidentiality):

    Area to highlight line_1.png
     

    This is collections data from a bank. In case you aren't familiar with how collections work, typically an automated dialer makes calls to overdue accounts. The grey bars above represent total dials made. When someone answers the phone on the other end, the dialer connects them to a collections agent, who talks to the person who hasn't paid their bill and tries to get them to make a payment. The accounts where a person is reached (a collections agent talks to someone) are considered to be "worked," which is what the teal bars above represent. The penetration ratio, depicted by the black line, is...hmm. What is a penetration ratio exactly? This one threw me. I'm familiar with penetration rate, which would be the proportion of accounts that were worked out of the total dialed. So in other words, if penetration rate is 33%, we worked a third of the accounts. The ratio seems less straightforward. I think to describe it, it would be something like "if the penetration ratio is 3, it means we dialed 3x more accounts than we talked to." This seems unnecessarily complicated. Let's see if we can make some changes to how we show this data to make it more straightforward. Oh, and let's use that cool idea that I picked up from Mike M, too.
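To make the difference between the two measures concrete, here is a small Python sketch. It follows my reading of the ratio above (accounts dialed divided by accounts worked), which is an interpretation rather than a confirmed definition. The function names and the counts are made up for illustration.

```python
def penetration_rate(worked, dialed):
    """Proportion of dialed accounts that were worked (0 to 1)."""
    if dialed <= 0:
        raise ValueError("dialed must be positive")
    return worked / dialed


def penetration_ratio(worked, dialed):
    """Accounts dialed per account worked (assumed reading of the ratio)."""
    if worked <= 0:
        raise ValueError("worked must be positive")
    return dialed / worked


# Hypothetical counts, not the client's data.
dialed, worked = 900, 300

rate = penetration_rate(worked, dialed)
ratio = penetration_ratio(worked, dialed)
print(f"rate: {rate:.0%}, ratio: {ratio:.1f}")
```

Under this reading the two numbers are reciprocals of each other, so they carry the same information. The rate is simply the easier one to put into words.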

    First, I'm going to remove the secondary y-axis on the right side of the graph and the data (Penetration Ratio) that goes with it. That gets us a simple two-series bar chart:

    Area to highlight line_2.png
     

    In the above, we see accounts worked (teal) and total dials made (grey). Dials made is the sum of accounts that were worked and those that weren't reached. So I'm going to change this data slightly—from dials made in grey to those not reached—and stack the bars on top of each other.

    Area to highlight line_3.png
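The switch from total dials to accounts not reached is a simple subtraction, sketched below in Python. The monthly counts are invented for illustration (they are not the data behind the graphs), and the function name is hypothetical.

```python
def not_reached(dials, worked):
    """Accounts dialed but not worked, period by period."""
    if len(dials) != len(worked):
        raise ValueError("dials and worked must have the same length")
    return [d - w for d, w in zip(dials, worked)]


# Hypothetical monthly counts.
dials = [1000, 900, 800]
worked = [400, 330, 250]

unreached = not_reached(dials, worked)
print(unreached)  # [600, 570, 550]

# Stacking worked (bottom) and not reached (top) restores total dials,
# so the overall bar height still shows call volume.
stack_heights = [w + u for w, u in zip(worked, unreached)]
```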
     

    We can get the same information out of the view above as the previous one: we can see total dials made (overall height of bars) and within that, the portion that were worked and the portion that were not reached. Notice that because the worked series is on the bottom of the stack, we can easily see how it has varied over time. Total dials made have decreased over time, and so has the number of accounts we've worked. But are we working a lower proportion of total dials now than we have historically? It's hard to tell here. Let's shift to 100% view to answer that question:

    Area to highlight line_4.png
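Behind a 100% view, each period's values are divided by that period's total. A rough Python sketch of that step follows, again with invented counts and a hypothetical function name.

```python
def to_shares(worked, unreached):
    """Convert per-period counts into (worked share, not-reached share) pairs."""
    shares = []
    for w, u in zip(worked, unreached):
        total = w + u
        if total == 0:
            raise ValueError("a period with zero dials has no defined shares")
        shares.append((w / total, u / total))
    return shares


# Hypothetical monthly counts, not the data behind the graphs.
worked = [400, 330, 250]
unreached = [600, 570, 550]

shares = to_shares(worked, unreached)
for worked_share, unreached_share in shares:
    print(f"worked {worked_share:.1%}, not reached {unreached_share:.1%}")
```

The worked share in each pair is the penetration rate for that period, which is the value the boundary between the two colors traces over time.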
     

    With the 100% stacked bar, we lose the context that overall call volume (total dials made) has decreased over time. But that's ok, because we know it now, so we can state it in words: "Call volume decreased 47% over the course of the year." With the 100% view, we can see that the proportion of accounts that we are working has decreased recently. So in spite of reduced call volume, we are reaching a lower proportion of accounts. Interesting. Perhaps we can make that a little easier to see?

    Let's remove the space between the bars and turn this into an area graph:

    Area to highlight line_5.png
     

    Bingo! With this view, we can see the proportion of accounts that were worked out of the total dialed. The white line separating the teal from the grey now represents the penetration rate. We can make this clear by adding some text and calling out the most recent data point:

    Area to highlight line_6.png
     

    I might add a headline that says something like, "Despite decreasing call volume, penetration rate hit a 12-month low in December." And like that, we've used an area graph to highlight a line.

    What do you think? Do you like this approach? What might you do differently? Where else could an approach like this work? Leave a comment with your thoughts!

    You can download the Excel file with the above visuals.

    our tools don't know the story

    A question that frequently arises in our workshops is “What tools do I need to tell stories like you do?” Many are surprised to hear the answer: we’re tool-agnostic. Rather, the concepts we teach are universal. No matter if you’re using Excel, Tableau, PowerBI, R, SAS, or something else, the tools themselves don't know your data, your organization, or your audience like you do. That’s where an analyst adds value by bringing the data & its underlying story contextually to life.

    Today’s post was inspired by a real-world makeover of data originally created in a tool highly regarded for data visualization. The client was visualizing advertising data across multiple countries. Their initial visualization looked similar to the one below. (Note: I’ve anonymized the data to preserve confidentiality).  

    scatterplot1.png
     

    This chart shows two dimensions of advertising effectiveness: reach (how many users saw an ad) and engagement (how many users clicked on the ad) across several countries (United States, Germany, Great Britain, China, and Brazil). The higher these numbers, the better.

    Upon further exploration, we see that the magnitudes of reach and engagement are very different across countries. In China, 52% of users were reached compared to 68% in Brazil. With engagement, the magnitude of the difference is even more pronounced: China’s engagement is 6%, half of Brazil’s 12%.  

    Imagine yourself as a decision maker tasked with determining an action plan from these results. If the analyst presented you with the visual above, what conclusions might you draw?  An informal poll of readers might return multiple answers, which demonstrates the danger of letting our tools "tell the story" for us.

    Don’t assume two different people looking at the same graph will come to the same conclusion. Add value by highlighting key takeaways for your audience.

    An important distinction made in the book, storytelling with data, is the difference between exploratory and explanatory analysis. Exploratory analysis is what we do to find interesting things in our data. For example, the analyst might have asked many questions during the exploratory phase, including (but not limited to):

     

    1. How have these metrics changed over time?
    2. Are there geographical differences when drilling down by country?
    3. What is the revenue impact of this data?
    4. Are there noticeable patterns in users’ behavior that can be used for predicting next quarter’s results?

    After exploratory analysis, we move to explanatory analysis. Explanatory analysis is where we take the interesting thing we found via exploratory analysis and communicate it to our intended audience. This often requires creating a different visual or using a different tool than we used in the exploratory phase.

    Let’s assume that what’s relevant in this data is the varying levels of reach & engagement and therefore, each country needs its own strategy for next year. If that’s the interesting conclusion, how might the analyst communicate this? One option is to use the initial design and visual cues like color and annotations to focus attention appropriately:

    scatterplot2.png
     

    In this version, I’ve preserved the horizontal bars, sorted by reach in descending order, and decluttered by removing the border and grid lines. While this is a step in the right direction, it still takes a lot of work to read all this text and mentally process the different takeaways:

     

    1. Low engagement/high reach
    2. High engagement/high reach
    3. High engagement/low reach
    4. Low engagement/low reach

    Perhaps a different visual would make this more visually apparent. Since these takeaways fall into four quadrants, a scatterplot is another alternative:

    scatterplot3.png
     

    We now have a visual with a well-labeled construct on how to interpret the data. The categories on the axes (Many/Few, Low/High) help the audience understand the range of values and where each country falls on that range. For further reading on the importance of categorization, check out this post.
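The categorization behind a quadrant view can be sketched in a few lines of Python. The reach and engagement values for China and Brazil come from the figures quoted earlier in this post. The cutoffs that split "low" from "high" are my own illustrative assumptions, so the resulting labels may not match how the countries are annotated in the graphs.

```python
def quadrant(reach, engagement, reach_cutoff, engagement_cutoff):
    """Label a point by which side of each cutoff it falls on."""
    engagement_label = "High" if engagement >= engagement_cutoff else "Low"
    reach_label = "high" if reach >= reach_cutoff else "low"
    return f"{engagement_label} engagement/{reach_label} reach"


# (reach, engagement) as fractions; values quoted earlier in the post.
countries = {
    "China": (0.52, 0.06),
    "Brazil": (0.68, 0.12),
}

# Assumed cutoffs, chosen only for illustration.
REACH_CUTOFF = 0.60
ENGAGEMENT_CUTOFF = 0.09

labels = {
    name: quadrant(reach, engagement, REACH_CUTOFF, ENGAGEMENT_CUTOFF)
    for name, (reach, engagement) in countries.items()
}
print(labels)
```

Where the cutoffs sit is a judgment call that depends on the business context, which is exactly the kind of knowledge the tool doesn't have.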

    Finally, I’d add back the color & annotations, while being thoughtful about how the audience will take in the information. For example, in a setting where only the Brazil team is present, I might focus attention only on their data:

    scatterplot4.png
     

    Or the European countries, where the results are mixed:

    scatterplot5.png
     

    Both views would be important considerations in knowing the audience for our explanatory analysis.  

    Scatterplots are often used with scientific data, but in this case work well for visualizing categories of differing takeaways. This works because of the additions of text and categorization, which help the audience process the information. Remember, never make your audience do more work than necessary to understand a graph!

    If all the takeaways need to be on one view, I can still leverage the scatterplot while being strategic about the use of color to focus attention appropriately.  

    scatterplot6.png
     

    In conclusion, there’s a huge difference between simply showing data from the exploratory phase vs. using data to tell a story in explanatory analysis. Check out the difference between the remade view above vs. where we started:

    scatterplot7.png
     

    We have a wide array of great tools at our disposal for visualizing data, but our tools will never know our data’s story like we do. We can add value to our roles and our organizations by bringing the story to life.

    If interested, you can download the Excel file with the above graphs.

    Want more on story? Check out Episode 2 of the SWD podcast, where Cole discusses her thoughts on, "What is story?" She makes a distinction between story with a lower case 's' (the takeaway, or the so what—the way "story" was used in this post) and Story with a capital 'S,' which has a shape (plot, twists, ending—a narrative arc). Also stay tuned for the next post here, where Cole will recap and share the 75+ annotated line graphs received in response to the latest #SWDchallenge.

    Update from Cole: We have a couple of additional views to share based on reader comments. First, the following view is similar to the final visual above, only with text moved out of the graph itself to the side. 

    Scatterplot - words on side.png
     

    This next graph, created by Daniel Zvinca, follows what he calls his "obsessive concern for a flexible design."

    Scatterplot alternative_DanZvinca.png
     

    He notes that he preserved color for potential additional enhancement and outlines the following benefits of this view:

    1. More metrics can be added or just one can be used (works fine for 1, 2, ...5 metrics).
    2. More countries can be added. When the number is higher, gridlines every 5 countries or so would help localize the associated values.
    3. Any metric is clearly encoded/decoded and can be used for sorting.
    4. Comments do not require special care, they never overlap (unless they are too long).
    5. They can be defined for several performance levels (e.g. Likert scale intervals). For purpose of this design, bad=dark background, good=light background.

    Nice idea, Dan, and thanks for sharing! Thanks also to everyone who has commented and contributed to the discussion, both here and on other posts.


    Elizabeth Hardman Ricks is a Data Visualization Designer on the storytelling with data team. She has a passion for helping her audience understand the ’so-what?’ Connect with Elizabeth on LinkedIn or Twitter.