The slopegraph challenge was a popular one, with topics ranging from economics to entertainment. We noted overall good use of color and words, with many illustrations of how slopegraphs can be used to show a ton of data without being overwhelming when visual hierarchy is clear. Follow the link to see the full recap post, including each submission and related commentary.
Today's short post focuses on an Excel tip that I find myself using frequently and I get asked about a ton. While I focus on Excel here, I imagine the same could apply when working in any spreadsheet application. For additional formatting flexibility, put some graph elements in the cells directly.
This is probably easiest to illustrate through an example. Below is a visual from storytelling with data: a data visualization guide for business professionals. It's highlighted in Chapter 6 (pg. 161), which focuses on what I consider to be model examples of data visualization. I regularly get questions about how this graph was created.
There are two common questions posed about this visual:
- How did you format some of the y-axis labels differently from the rest?
- How did you include the TOTAL % in the graph?
The trick in both of these cases is the same: these elements aren't part of the graph directly, rather they are in individual cells in Excel. In this particular graph, the only things directly in the graph element are the stacked bars. Everything else is done in cells. I do this for additional formatting flexibility. In Excel, you can only apply a single format to axis labels on a given axis. You also have to get creative (and it can sometimes be limiting) if you want to add additional numbers or labels to the visual that aren't part of the data you're visualizing directly. Using the cells allows me to break free from both of these constraints.
In case it's easier to see what I'm talking about, below is what the visual looks like if I show the gridlines in Excel (in the preceding view, all of my cells were filled with white).
When taking this approach, you do have to take care to ensure that everything is lined up correctly. This means precisely aligning the graph with the cells and making the cell and graph heights match (otherwise your bars won't correctly align with the words and numbers). Also, be aware that if you shuffle your data around, you'll need to adjust the elements you've put into the cells directly. Changing the order of your data would change the graph in this case, but not the PRIORITY or TOTAL %, which would have to be updated manually (unless you set the cells up to point to the data you reshuffle). Both of these things lead me to express a word of caution: when I break the pieces apart like this, it's easier to introduce issues, both visually (things not lining up) and in the content (data and labels getting out of sync). Because of this, the tradeoffs may not be worth it if you're trying to automate or it's a repeated process that you don't want to have to adjust each time. But so long as you're careful and pay attention to detail, when you need the additional formatting flexibility, consider putting some elements into cells directly.
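For anyone who thinks more easily in code than in cells, here is a rough Python sketch of the "point the labels at the data" idea mentioned above. The category names and numbers are made up for illustration (they are not the values from the book's graph), and `side_labels` is a hypothetical helper. The point is only that labels looked up by category name follow the data when it is reordered, while labels typed in by position do not. In a spreadsheet, a lookup formula would play the same role.

```python
# Hypothetical data: each category carries its own side labels.
records = {
    "Item 1": {"priority": "HIGH", "total_pct": 85},
    "Item 2": {"priority": "MEDIUM", "total_pct": 60},
    "Item 3": {"priority": "LOW", "total_pct": 35},
}


def side_labels(chart_order, records):
    """Build the label rows for the cells next to the graph.

    Labels are looked up by category name, so they stay in sync
    with whatever order the bars are plotted in.
    """
    return [
        (name, records[name]["priority"], records[name]["total_pct"])
        for name in chart_order
    ]


# Original order, then a reshuffled order: the labels follow the data.
original = side_labels(["Item 1", "Item 2", "Item 3"], records)
reshuffled = side_labels(["Item 3", "Item 1", "Item 2"], records)
```

Labels built this way still need the physical alignment between cells and graph checked by eye; the lookup only addresses the data getting out of sync.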
If it's of interest, you can download my Excel workbook.
Are there other graphs from storytelling with data that you have questions about or would like a behind-the-scenes peek? Or other tricks you employ when graphing data that you'd like to share? Leave a comment!
We recently worked with a client who was using treemaps in many of their visuals and the client questioned whether there was a more effective way to visualize their data. Today’s post illustrates some pros and cons of using treemaps, plus possible alternatives.
Treemaps are increasingly being included in most data visualization tools (including the latest Excel 2016 version, grouped under “Hierarchy charts”). The treemap was originally designed to visualize a very large amount of data in a hierarchical, tree-structured diagram where the rectangles are sized by value and organized from largest to smallest. Color is used to encode a second dimension. Today, they’re often used more generally for categorical data. For further reading on treemaps, check out these posts from Jeffrey Shaffer and Robert Kosara.
appropriate use cases for treemaps
Treemaps can work well if your data falls into this scenario:
- You want to visualize a part-to-whole relationship amongst a large number of categories.
- Precise comparisons between categories are not important.
- The data is hierarchical.
Steve Wexler provides a nice example of why a treemap was more effective than a bar chart when visualizing the Electoral College votes from the 2012 election in this post.
the challenge with treemaps
While treemaps may seem like a sexy choice for visualizing data, they’re often used when another chart type would serve the data better. The limitation is that when we’re encoding data with area and intensity of color, our eyes aren’t great at detecting relatively minor differences in either of these dimensions. If our data is such that our audience needs to make precise comparisons between categories, it’s even more cumbersome when the categories aren’t aligned to a common baseline. We should never make our audience do more work than necessary to understand a graph!
Bottom line: treemaps do have a use case; however, consider reaching for something else when your audience needs to be able to make specific comparisons. Andy Kriebel provides a great litmus test for the effectiveness of a treemap in this post: if it takes longer than 2 seconds to compare categories, go with a simpler design.
a real-world makeover
Back to our client: their treemap looked similar to the one below (I’ve anonymized the data to protect confidentiality). The intent was to compare returns activity (measured by volume of dollars claimed and number of returns) for their customer base where % of returns is encoded by the size of the rectangles while % of dollars is encoded by color - both in differences in hue and intensity.
I can make a few general conclusions from this visual: three customers (A, B, and C) are roughly half of the return activity (visually estimated based on the size of the rectangles) and these same three are also pretty drastically different in terms of dollars claimed (judged by the intensity of color—although without a color legend it’s difficult to discern much more than that).
I’d argue that this visual might work well in an exploratory phase of analysis, where the analyst is sorting through the data to find what might be interesting to stakeholders. However, it doesn’t work so well when communicating what’s interesting in this data.
What if I asked you which are the top 10 customers by dollars claimed? Or what’s the exact difference between Customer A and B’s % of returns? We can see how easily a treemap breaks down when trying to make these comparisons.
One possible solution is a bar chart. Below, I’ve displayed all customers’ returns and dollars claimed horizontally. This lends itself well to comparing across categories and we can easily see that four customers comprise the majority of return activity.
When communicating this data, we can help our audience understand what’s important by focusing only on the top four customers. This condensed view could lend itself well to a discussion towards action on these top four customers.
To address the remaining customers, I could provide the entire bar chart for additional detail if the audience needs it. I might also choose to add the other customers with a footnote (as shown) or group them into an “All other” category. The decision point here would be determined by the level of detail my audience needs.
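If you prepare the data outside of Excel, the "top customers plus an 'All other' bucket" grouping described above might be sketched in Python like this. The customer shares are invented for illustration (the client data is anonymized and not reproduced here), and `top_n_with_other` is a hypothetical helper name, not part of any tool.

```python
def top_n_with_other(shares, n=4, other_label="All other"):
    """Keep the n largest categories and sum the rest into one bucket."""
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:n]
    rest = ranked[n:]
    if rest:
        # Collapse everything outside the top n into a single row.
        top.append((other_label, sum(value for _, value in rest)))
    return top


# Made-up % of returns by customer (sums to 100).
returns_pct = {
    "A": 25, "B": 20, "C": 15, "D": 10,
    "E": 8, "F": 7, "G": 6, "H": 5, "I": 4,
}

condensed = top_n_with_other(returns_pct, n=4)
# condensed holds A, B, C, D and one "All other" row of 30.
```

Whether to show the bucket, a footnote, or the full chart is still the judgment call described above; the sketch only handles the arithmetic.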
Taking this a step further, we can introduce color strategically as a visual cue, paired with some explanatory text, to prompt our audience to act on what this data may suggest.
For an additional example, Stephen Few provides a similar bar chart solution in this before & after post.
For a deeper look at the graphs, you can download the accompanying Excel file. What do you think? Are there situations where you’ve seen a treemap more effective than an alternate solution? Leave a comment with your thoughts!
Slopegraphs can be great for visualizing change or differences between two points in time or categories. Check out the challenge post to learn more and see some examples of slopegraphs used effectively, as well as potential issues to watch out for.
45 people shared their waterfall creations in May, with a huge variety of tools and more business-related topics than recent challenges. Click the link to see the full recap post, including each submission and related commentary.
When visualizing and communicating with data, one design element we can play with is line style. Most tools default to solid lines, and those are what we most often use and see. But a dotted line is another possibility. What considerations should we think about with a dotted line? When should we use one? In this post, I’ll outline my thoughts, illustrate through examples the scenarios in which I find myself using dotted lines, and share some commentary on how to do this in your tools.
considerations with dotted lines
Dotted lines are super attention grabbing. They also convey a sense of uncertainty that can be useful. The challenge is that dotted lines introduce some visual noise. From a clutter standpoint, we’ve taken what could have been visualized as a single visual element (a line) and chopped it into a ton of pieces (many little lines, dashes, or dots). Because of this, I recommend against using the dotted line as a way to attract attention (rather, use less noisy means of contrast, such as position, size, or color for this). Reserve the use of dotted lines for when there is a target or goal we are trying to hit or remain on a certain side of, or when there is uncertainty to depict (a forecast or prediction). In these cases, the visual differentiation and sense of uncertainty that the dotted line helps depict make up for the additional visual noise it introduces. Let’s look at an example of each of these use cases and some dotted lines in action.
dotted line for a goal or target
I was recently working with a graph similar to the following that depicted time to fill a given type of role at a company:
There are three lines in the graph above: (1) the Goal—which you likely looked at first both due to position and because it’s bold black, which stands out more than the other colored lines, (2) average days to fill roles for Internal candidates (orange), and (3) average days to fill roles with External hires over time (teal). The Internal and External lines represent data that we’ve collected and summarized, whereas the Goal is something we have set and in this case we’d like to stay below. To set the Goal apart—and here I'm interested in making it less attention grabbing but still want it there for reference—we could use a dotted line. Dotted lines come in different styles, both in terms of the thickness of the line and how large or small the individual pieces are. We also have some other design elements at our disposal when it comes to the formatting of the line and the text that goes with it. Here’s an example of how I iterated to land on a combination I liked:
I prefer the final view, where the dotted line is thin and grey, effectively pushing it to the background (in spite of the noise that this line style introduces). I should probably mention that there are other styles of dotted lines as well (for example, some that combine dashes and dots)—I recommend avoiding these because they look quite messy and instead choose a style where the segments (whether dashes or dots) are consistently spaced. Here is what my final iteration looks like in the full graph:
I like this. The GOAL is clearly stated and still the first thing I see, but due to the formatting it feels more like reference or context, while the thicker solid lines are the clear focus of the graph. Also—and perhaps it's just me, but—there's something about this view that makes it feel easier to compare each of the individual lines (Internal and External) to the GOAL than in the original view when all of the lines were of similar thickness.
dotted line to depict uncertainty
I often see graphs where some data is actual and some is forecast and there isn’t anything done to differentiate the two, like the following example.
Given that we are standing in 2018, some of the data in the graph above clearly hasn’t happened yet and so must be forecast. But how much? Was this graph recently made using actual data through 2017 and forecasting thereafter? Perhaps, but we’d have to make that assumption, because nothing in the graph tells us. Or maybe there was a footnote hiding down at the bottom of the original that articulated this (there wasn't, but in case there were); I shouldn't have to read the fine print in order to know how to read the graph. Don’t make your audience question, make assumptions, or hunt for detail like this—make it clear.
One option is to use words to differentiate between actual and forecast. In the following, I added supercategories along the bottom to indicate which dates are associated with actual data and which are forecast (check out the recent post illustrating the step-by-step on how to achieve this in Excel if that’s of interest). Since we're on the topic of dotted lines, I could also add a dotted line to further visually differentiate actual vs. forecast data:
Even better, though, is to reserve my use of the dotted line to depict the uncertainty directly, applying it only to the portion of the line that represents forecast data. If I do this, I don’t need to add the additional line at all. Here’s what it could look like:
In this case, both the line style and the words on the x-axis make it quickly clear which data points represent actual data and which are forecast. I'm a big advocate of thick, bold, solid lines and data markers for actual data and thin, dotted lines (and sometimes non-filled in data markers, though I felt that looked too messy here) for forecasts and predicted data because of the way it helps us intuitively understand what we are looking at when executed well.
In the above, I’m assuming the forecast data points are important enough to label directly (in other words, that the specific numeric values are important; if that weren’t the case, you might approach this differently, perhaps only labeling the 2017 and 2022 points, or not labeling any of them and rather letting the y-axis for general magnitude be good enough). I can imagine different people making different choices here depending on both what you want the audience to focus on and personal aesthetic preferences. The meta-point: be thoughtful when it comes to design details in general, and your use of dotted lines in particular.
the tactical: how do I do this in my tool?
Changing the formatting of a single line from solid to dotted—like in the first example above—is possible and pretty straightforward in most tools. This is typically achieved through a menu or code to change the line style. If you aren’t sure how to do this, some smart Google searching with the name of your tool and something like “change line in graph to dotted” should point you towards a solution.
Changing just part of a line from solid to dotted is slightly more complicated, but there are a couple of solutions for that. There is the brute-force method of physically formatting each individual data point (for example if you are working in Excel, you would click once to highlight the series, then click again to highlight an individual data point, and then can format that data point or associated line individually as you would like it). As you can imagine, this can be time-consuming. Another way is to make what will appear to be a single line actually two different data series, allowing you to format them separately. In the Sales example above, I’d have a column of dates that goes from 2010 to 2022 to set my x-axis. Then my first series for the ACTUAL data would have values from 2010 to 2017. I’d have a second series for my FORECAST line that has values from 2017 to 2022 (note the overlap with 2017 having values for both ACTUAL and FORECAST to avoid a gap in the line). I imagine you could use a similar approach in other tools.
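For those working in code rather than Excel, here is a minimal Python sketch of the two-series idea described above. The sales numbers are placeholders I made up (they are not the values from the graph), and `split_actual_forecast` is a hypothetical helper. It produces two equal-length columns, with `None` standing in for a blank cell, and both columns share the 2017 value so the plotted line has no gap.

```python
def split_actual_forecast(years, values, last_actual_year):
    """Split one series into ACTUAL and FORECAST columns.

    Both columns are the same length as the input; None marks a blank.
    The boundary year appears in both so the two lines connect.
    """
    actual = []
    forecast = []
    for year, value in zip(years, values):
        actual.append(value if year <= last_actual_year else None)
        forecast.append(value if year >= last_actual_year else None)
    return actual, forecast


# Placeholder data: years 2010 through 2022 with invented sales values.
years = list(range(2010, 2023))
sales = [10, 12, 13, 15, 18, 20, 23, 25, 27, 30, 32, 35, 38]

actual, forecast = split_actual_forecast(years, sales, last_actual_year=2017)
# ACTUAL has values for 2010-2017; FORECAST has values for 2017-2022.
```

Each column can then be plotted as its own series against the shared list of years, with the first formatted as a thick solid line and the second as a thin dotted one.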
You can download the Excel file with the above examples (including both the brute force and more elegant solutions described above for the second example).
These are the two use cases in which I find myself using dotted lines and promoting their use. Are there other cases where you’d recommend using a dotted line? Or additional considerations we should have in mind when choosing line style? Leave a comment with your thoughts!