Here at storytelling with data, we have been known to say things like, “The only thing worse than a pie chart…is two pie charts.” And yet, believe it or not, we’ve found a data visualization that we think succeeds in using not one, not two, but forty-five pie charts. How could this possibly be?
A friend of mine, Mark Bradbourne, recently posted a picture to Twitter showing a bar chart that his local utility company included in his most recent bill. He entitled the picture “Let’s spot the issue!”
So as to protect the utility company in question, I’ve recreated the chart below, as faithfully as possible. (There are, of course, many changes I would make in order to render this a storytelling with data-esque visualization, but for the purposes of this discussion it’s important that you see the chart as close to its original, “true” form as possible.)
The internet immediately latched onto the seemingly absurd collection of months portrayed in this chart. The bill, dating from June of 2019, included 13 prior months of usage from as early as August of 2016, as recently as March of 2019, and in a random order.
Soon, our non-U.S.-based friends pointed out that the dates made even less sense to them, as (of course) their convention is not to show dates in MM/YY format, but in YY/MM format.
And with this, the truth of the matter became obvious: the dates were in neither MM/YY format nor YY/MM format; they were in MM/DD format, and excluded labeling the year entirely.
Whenever we run across these kinds of so-called “chart fails,” it helps to keep in mind that whoever created the chart wasn’t setting out to be confusing or deceptive. The utility company clearly wanted its customers to be aware of their recent usage, and went so far as to show that usage in a visual format so that it would be more accessible.
The danger, though, is in the assumptions we make when we are the ones creating the chart. Specifically, in this case, there were likely assumptions made about how much information needed to be made explicit versus how much could be assumed.
The energy company likely thought:
The chart says that it’s showing monthly usage; and, since it shows 13 bars, the homeowner will know, or at least assume, that the bars represent the last 13 months in chronological order.
And in general, yes: that is what our first assumptions would be, if there had been no labels whatsoever.
In this case, the company chose to label the bars with a MM/DD convention, excluding the year—probably to denote what specific day the meter was last read, or on what specific day the last water bill was issued. But we very rarely see dates in MM/DD format when they cut across two different years. We’re trained to see date formats in the style of XX/YY being representative of months and years, not months and days. To interpret the chart correctly, we would have had to ignore and resist our personal experience with this convention.
So on the one hand, logic told us that the chart showed the last 13 months; on the other hand, our experience and the direct labels told us that it was mistakenly showing us 13 random months. What other elements of the chart, or other design choices, could have nudged us towards one of these interpretations over the other?
Perhaps if the chart had been a line chart rather than a bar chart, we would have been nudged into thinking that the data was being shown over a continuous period of time; this could have been enough to make the chart more easily interpreted.
Or, if the labels had used abbreviations for the months, rather than numbers, we almost certainly would have seen the orderly progression of months more clearly.
Another solution, one which would have almost certainly eliminated all confusion, would have been to include the actual year in the labels, or as super-categories below the existing labels.
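To make the last two suggestions concrete, here is a minimal Python sketch of labels that use month abbreviations and show the year only when it changes. The function name `month_labels` and the reading dates are hypothetical and chosen for illustration; they are not the values from the original bill.

```python
from datetime import date

# Hard-coded so the labels do not depend on the system locale.
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_labels(reading_dates):
    """Label each reading with its month abbreviation.

    The year is appended to the first label and again whenever the
    year changes, so a series that crosses a year boundary stays
    unambiguous without repeating the year on every bar.
    """
    labels = []
    previous_year = None
    for d in reading_dates:
        label = MONTH_ABBR[d.month - 1]
        if d.year != previous_year:
            label += f" {d.year}"
            previous_year = d.year
        labels.append(label)
    return labels


# Hypothetical meter-reading dates (assumed for this example).
readings = [date(2018, 11, 16), date(2018, 12, 18),
            date(2019, 1, 17), date(2019, 2, 15)]
print(month_labels(readings))
```

Dropping the day of the month entirely is a design choice here: the label answers "which month?" and leaves the exact reading date out of the axis.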
We could also ask the question: Do we need to be so precise with our X axis labels that the specific day of the month is shown at all?
It doesn’t seem like it; especially considering that the data on the Y axis has most likely been rounded off, and is presented to the audience at a very general level.
Look at the level of granularity on the Y axis; although it ranges from 0.1 to 0.7 (in 1000s of units), every bar is shown at an exact increment of 0.1. It’s unlikely that a homeowner’s actual monthly utility usage is always an exact multiple of 100.
In this case, the labeling of the specific date on the X axis implies a specificity of data that the Y axis does not support.
The bottom line, though, is that the creator of the chart made assumptions about what they needed to show versus what they could exclude; and in making those assumptions, they inadvertently misled their audience in a manner that was very confusing.
It is important to focus your audience’s attention on your data in your visualizations, and to remove extraneous clutter and distracting elements—including redundant information in labels. This case, however, highlights the danger of taking your assumptions too far, and inadvertently adding confusion rather than clarity.
Sometimes we get so familiar with our own work, and our own data, that we lose track of what is, or isn’t, obvious to other people. During your design process, it can be valuable to get input from people who aren’t as close to your work. This helps to identify, and avoid, situations like this one, where familiarity with the data led to design choices that were confusing, rather than clarifying.
Putting yourself in the mind of your audience, and soliciting feedback from other people who aren’t as close to your subject, will help you to avoid these kinds of misunderstandings in your own work.
What is one thing you’ll do differently after learning the storytelling with data lessons?
At the end of our workshops, participants are often prompted to reflect on this question. The resulting discussion usually evolves into things that can be easily integrated into the day-to-day work already being done. One piece of advice we frequently give may surprise you—there are two easy actions that don’t require complicated technical skills! First, adopt the habit of stating your takeaway in words. Second, develop the practice of using color sparingly. Today’s post is a quick illustrative example that puts these tips to use.
At a recent client workshop, we discussed a visual similar to the one below. It is a snapshot of an organization’s current accounts payable (AP) by vendor at a point in time. At a basic level, the graph is fine. It’s cleanly designed with a left-aligned chart title, data labels incorporated into the bars, and no clutter of gridlines or chart border. The bar chart is easy for me to read—I can quickly see that AP is highest for Microsoft and how incrementally larger it is compared to the other vendors because of the consistent baseline (the y-axis).
What I can’t easily see is what I should take away from this chart. At client workshops, we often don’t have this important context—because of this, we often show multiple approaches for highlighting different potential takeaways. Below you’ll see several strategies for employing color and words in this visual. In each of these, notice how the words set up your expectations for what’s emphasized in the graph and color used sparingly indicates where to look in the visual.
If the audience is interested in the highest spend, I could emphasize the largest vendor:
Perhaps the audience will be more curious where AP is concentrated. I could instead focus attention on the top vendors:
What if the conversation is about expectations—is this spend surprising or unsurprising? I might add additional context with super-categories—useful if the audience is unfamiliar with these vendors’ services—grouping and employing similarity of color and position to visually tie the text to the data it describes.
Practice pairing color and words in your visuals to be more effective when communicating for explanatory purposes with data. Bonus: you don’t need fancy tools to do either of these things!
Download the file for a peek at how I created these visuals in Excel.
Do you find yourself needing to communicate with qualitative data? This post discusses three best practices for doing so (using color effectively, reducing text, and considering whether your audience needs quantitative context) and illustrates them through an example.
Today's post is a step-by-step Excel “how-to” inspired by a reader question we received following a recent post on using dotted lines in data visualizations.
“Do you know of a trick for drawing vertical lines to delineate years (or actuals/historical vs forecast/future segments of the chart)? I currently have to draw them with the line drawing tool, which gets messy when moving the chart on a PPT slide. If there were a way to embed it in the data or somehow format the chart, that'd be awesome.”
The following chart illustrates what Dave describes. The data is units of output over time where the first nine months of the series are actual data and the remaining four months of the year are a forecast. The dotted line serves as a visual cue to differentiate actual from forecast. Created in Excel, the line was physically drawn on the graph with the Shape Illustrator. While this approach might suffice as a quick method for achieving the desired effect, it isn’t ideal for recurring use of the graph, particularly if the line’s position on the x-axis might change in future iterations.
After some research and playing around in Excel, I’ve devised one method for achieving this effect, which I’ll outline in this post (I’m sure there are others!). Don't be dismayed by the number of steps: it's a one-time setup, after which the chart can be easily refreshed in future iterations by changing where you want the reference line. I’m using Excel 2016 and you can download the accompanying file.
In my spreadsheet, the data for the Output over time chart looks like this:
1. Go to a blank cell range and enter these values as shown in my screenshot below. I’m choosing to add these new values directly underneath my data range in cells F19:G21. This will eventually become the coordinates for a secondary scatterplot that we’ll add in a later step.
2. Choose where you want the vertical reference line to cross the x-axis and enter those values below “X”. In this example, I want the line located on the September data point, the ninth point in my data series. In cells G20:G21, I entered “9” in each, as shown below. (Note: for a more automated approach in a larger dataset, a MATCH formula could also calculate where September falls in the range: =MATCH("Sep",$F$7:$F$18,0).)
3. Add a new data series by right-clicking the graph and choosing Select Data:
4. In Select Data Source dialog, click the Add button.
5. In the Edit Series dialog, enter a name for your data series (I chose “reference”) and select the X values you entered from Step 2. I selected the 9’s in G20:G21. Click OK to exit the dialog boxes.
The resulting visual looks like this:
6. Right click the new line and choose Change Series Chart Type.
7. In the Change Chart Type dialog box, select Combo section under All Charts tab. Then select Scatter with Straight Lines and check the option for Secondary Axis. Click OK to exit.
The resulting visual looks like this:
8. Go to the chart, right click the red reference line and choose Select Data again. In the Select Data Source dialog, highlight reference and click Edit.
9. In Edit Series dialog, update the X values to be the original values you selected in Step 5. Set the Y values to be 0,1. Click OK to exit.
The resulting visual looks like this:
The remaining steps are visual cleanup: first, I forced the red line to align with the top of the primary y-axis and second, I hid the secondary axis line and text labels.
10. Right-click on the secondary y-axis and select Format Axis:
11. In the Axis options section, type 1 into the textbox beside the Maximum option.
12. In the Text Options section, under Text Fill, choose No fill. This will remove the text labels on the secondary y-axis.
13. In the Axis Options section, under Line, choose No line. This will remove the secondary y-axis line.
Voila! The resulting visual has an embedded vertical line, which is plotted on a hidden secondary y-axis.
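For readers who prepare their chart data outside Excel, the logic behind the embedded line can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the Excel workflow: the function name `reference_line_points` and the month list are hypothetical. It mirrors the MATCH lookup from Step 2 and the two-point scatter series (same X, with Y running from 0 to 1) from Step 9.

```python
def reference_line_points(categories, target, y_min=0, y_max=1):
    """Return the two (x, y) points that draw a vertical reference line.

    The x value is the 1-based position of `target` in `categories`,
    which is what Excel's MATCH(target, range, 0) returns. The y values
    span the secondary axis, assumed here to run from 0 to 1.
    Raises ValueError if `target` is not in `categories`.
    """
    position = categories.index(target) + 1  # list.index is 0-based
    return [(position, y_min), (position, y_max)]


# Hypothetical category labels for a January-to-December series.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

print(reference_line_points(months, "Sep"))  # [(9, 0), (9, 1)]
```

Changing the target label moves the line, which is the same refresh step as editing the two X cells in the spreadsheet.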
Recall that the goal of this specific scenario was a dotted line which visually differentiated the actual and forecast sections. My last step was to change the formatting of the line to appear as a thin, grey dashed line.
(Note: To achieve your preferred formatting, right-click the line and select Format Data Series in the context menu where you’ll find formatting selections in the resulting dialog pane.)
This method does come with some trade-offs to consider.
One downside is that you lose some control over the exact placement of the line where it crosses the x-axis. Below you’ll see a comparison between the manual vs embedded approach. With the manual approach, the line can be drawn exactly on the tick mark between the Aug & Sep data points, providing a clean alignment with the x-axis. With the embedded approach, the line is centered above the Sep label, resulting in a slightly less seamless effect.
On the cosmetic side, another downside is losing the flexibility to manipulate the length of the line for labeling purposes. I’ll illustrate this with a horizontal bar chart (which I also created using this method). With the manual approach, I can physically draw the line to extend above the x-axis line, aligning it closely to the “Target” text label. With the embedded approach, the line stays below the x-axis line, creating a gap between the line and the label that describes it.
You can download the Excel file to see the behind-the-scenes of these graphs. Are there other methods you’re aware of for achieving this effect? Or other considerations with embedding the reference line directly? Leave a comment with your thoughts!
Accessibility in data visualization goes well beyond considerations for color blindness. This is something I personally understand much better after my interactions with Amy Cesal in preparation for bringing you the following post. Back in April, we posted a short article here with some accessibility considerations in visual design and Amy sent me a private note basically expressing, “but wait…there’s more!” I love when content posted on the blog starts conversations like this—as I continue to learn, it means there’s more great content to share with you. Just be aware that if you reach out like this, I may talk you into doing some writing and sharing: that’s what has happened here when I invited Amy to pen a guest post and she graciously accepted!
I was already familiar with some of Amy’s work (if you haven’t seen it, be sure to check out her #DayDohViz, where she is in the process of creating 100 visualizations made out of PlayDoh, including a couple she’s shared via #SWDchallenge—you can view them all here). Amy first started being interested in accessibility for data visualization through her work in the federal government, where everything has to be up to 508 accessibility standards. At CFPB, she led the creation of the data visualization portion of the design manual, which is the first style guide for the federal government that includes accessibility for data visualization. Please join me in thanking Amy for writing and sharing the following expanded post on accessibility in data visualization!
Often, when we are creating charts and graphs, we think of ourselves as the ideal user. This is not only a problem because we know more about the data than the target user, but because other users might have a different set of constraints than we do.
Inclusive design principles and accessibility (often posted about with the tag #a11y) are important to take into consideration when designing data visualization because they help a broader audience understand your graphic. Designing with accessibility in mind can even help make your visualizations easier to understand for people without disabilities.
To help you understand how other people experience the world, empathy prompts provide a variety of suggestions. The Microsoft guide to the principles of inclusive design, excerpted below, suggests thinking about temporary and situational disabilities as well as permanent ones.
Being clear with text, distinctive labeling, and adding multiple ways to identify the point to your visuals will make it easier for people with impairments and those without to interpret your graphs. There are easy ways to add the principles of accessibility into your visual communications. Here are five simple ones.
5 easy ways to make your data visualization more accessible
1. Add Alt text
Alternative text (referred to as Alt text) is displayed when the image cannot be. Screen readers, the assistive technology used by people who are visually impaired, read alt text out loud in place of people seeing the image. It’s important to have valuable alt text instead of “figure-13.jpg,” which doesn’t help a user understand the content they are missing. Screen readers speak alt text without allowing users to speed up or skip, so make sure the information is descriptive but succinct.
According to the CFPB data visualization guide, which I helped create, good alt text includes: one sentence of what the chart is, including the chart type for users with limited vision who may only see part of it. It should also include a link to a CSV or other machine-readable data format so people with impaired vision can tab through the chart data with a screen reader.
If you frequently use data visualizations in Microsoft products, this guide will help you learn how to add alt text to them.
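If your charts are published on the web, the alt text pattern described above can be assembled programmatically. The following Python sketch is one possible approach under stated assumptions: the function name `chart_img_html`, the file names, and the link wording are all hypothetical. Because the alt attribute holds plain text only, this sketch places the data link directly after the image rather than inside the alt text.

```python
from html import escape


def chart_img_html(src, chart_type, summary, data_url):
    """Build an <img> tag with descriptive alt text plus a data link.

    The alt text leads with the chart type and gives a one-sentence
    summary. The link to machine-readable data is emitted as a
    separate anchor next to the image (an assumption of this sketch).
    """
    alt = f"{chart_type}: {summary}"
    img = f'<img src="{escape(src)}" alt="{escape(alt)}">'
    link = f'<a href="{escape(data_url)}">Download the chart data (CSV)</a>'
    return f"{img}\n{link}"


# Hypothetical file names and summary, for illustration only.
print(chart_img_html(
    src="output-over-time.png",
    chart_type="Line chart",
    summary="Monthly output rose through September.",
    data_url="output-over-time.csv",
))
```

`html.escape` converts characters such as quotes and angle brackets so that a summary containing them cannot break the attribute.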
2. Employ a takeaway title
Research suggests that users read the title of the graph first. People also tend to just rephrase the title of the graph when asked to interpret the meaning of the visualization. When the graph title includes the point, the cognitive load of understanding the chart decreases. Recently, when writing about how to effectively use words in graphs, Cole advocated for what she called a takeaway title. People know what to look for in the data when they read the graph takeaway first as part of the title.
3. Label data directly
One way to reduce the cognitive burden on users is to directly label your data rather than using legends. This is especially useful for colorblind or visually impaired users who may have difficulty matching colors within the plot to those in the legend. It also decreases the work of scanning back and forth trying to match the legend with the data.
Notice the difference in the following visual example. Compare how quickly you understand them and the relative ease of processing the one that is labeled directly.
4. Check type and color contrast
In the data visualization community, we’ve started addressing how to design for red/green colorblindness. This is an issue for 8% of men and 0.5% of women with Northern European ancestry. However, we should also consider users with low vision and a variety of other conditions that affect vision. The W3 WCAG guidelines specify necessary contrast and text sizes for readability on screen. There are a number of tools to help you abide by these contrast and size standards. For example, in the following screenshot the top example passes color contrast standards, whereas the version below it is too light and fails.
Tools to help get you started:
Color blindness simulator for images or
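If you would rather check contrast in code than in a separate tool, the WCAG contrast ratio can be computed directly. The Python sketch below follows the relative luminance and contrast ratio definitions in WCAG 2.x and compares the result against the level AA minimums of 4.5:1 for normal text and 3:1 for large text. The function names and the sample colors are my own choices for illustration.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color given as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """True if the pair meets the WCAG level AA minimum contrast."""
    minimum = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= minimum


# Sample colors chosen for illustration: dark grey and light grey on white.
for text_color in ("#333333", "#CCCCCC"):
    ratio = contrast_ratio(text_color, "#FFFFFF")
    print(text_color, round(ratio, 2), passes_aa(text_color, "#FFFFFF"))
```

A check like this covers contrast only. Minimum text size still needs to be reviewed separately.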
5. Use white space
White space is your friend. When information is too densely packed, the graphic can feel overwhelming and unreadable. It can be helpful to leave a gap between sections of a chart. Judicious use of white space increases the legibility by helping to demarcate and distinguish between different sections without relying only on color. This can also supplement accessible color choices by helping users distinguish the difference between colors that identify separate sections.
In the following example, compare how easily you can distinguish the sections. Now imagine if you had more difficulty distinguishing between the colors. The lines between them help you to visually identify each chunk.
These are just a few things you can do to help everyone easily comprehend the graphs you create. You should strive to make sure that everyone, not just you or your ideal user, understands the point of the visualization. When you consider accessibility, you create a better product for everyone.