putting graph elements in cells directly

Today's short post focuses on an Excel tip that I find myself using frequently and I get asked about a ton. While I focus on Excel here, I imagine the same could apply when working in any spreadsheet application. For additional formatting flexibility, put some graph elements in the cells directly.

This is probably easiest to illustrate through an example. Below is a visual from storytelling with data: a data visualization guide for business professionals. It's highlighted in Chapter 6 (pg. 161), which focuses on what I consider to be model examples of data visualization. I regularly get questions about how this graph was created.

 Source: Knaflic, Cole.  Storytelling With Data: A Data Visualization Guide for Business Professionals , Wiley, © 2015. (Original figure had data labels on the stacked bars; I've omitted those here for simplicity.)

Source: Knaflic, Cole. Storytelling With Data: A Data Visualization Guide for Business Professionals, Wiley, © 2015. (Original figure had data labels on the stacked bars; I've omitted those here for simplicity.)

There are two common questions posed about this visual:

  1. How did you format some of the y-axis labels differently from the rest? 
  2. How did you include the TOTAL % in the graph?

The trick in both of these cases is the same: these elements aren't part of the graph directly, rather they are in individual cells in Excel. In this particular graph, the only things directly in the graph element are the stacked bars. Everything else is done in cells. I do this for additional formatting flexibility. In Excel, you can only apply a single format to axis labels on a given axis. You also have to get creative (and it can sometimes be limiting) if you want to add additional numbers or labels to the visual that aren't part of the data you're visualizing directly. Using the cells allows me to break free from both of these constraints.

In case it's easier to see what I'm talking about, below is what the visual looks like if I show the gridlines in Excel (in the preceding view, all of my cells were filled with white).

FIG0605 with gridlines.png

When approaching this way, you do have to take care to ensure that everything is lined up correctly. This means precisely aligning the graph with the cells and making both cell and graph heights line up (otherwise your bars won't correctly align with the words and numbers). Also, be aware that if you shuffle your data around, you'll need to adjust the elements you've put into the cells directly. Changing the order of your data would change the graph in this case, but not the PRIORITY or TOTAL %, which would have to be done manually (unless you set up so it's pointing to the data you reshuffle). Both of these things lead me to express a word of caution: when I break the pieces apart like this, it's easier to introduce issues and have things not line up: both from a visual standpoint as well for data and labels to get out of sync. Because of this, the tradeoffs may not be worth it if you're trying to automate or it's a repeated process that you don't want to have to adjust each time. But so long as you're careful and pay attention to detail, when you need the additional formatting flexibility, consider putting some elements into cells directly.

If it's of interest, you can download my Excel workbook.

Are there other graphs from storytelling with data that you have questions about or would like a behind-the-scenes peek? Or other tricks you employ when graphing data that you'd like to share? Leave a comment!

/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2018 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.

an alternative to treemaps

 
 

We recently worked with a client who was using treemaps in many of their visuals and the client questioned whether there was a more effective way to visualize their data. Today’s post illustrates some pros and cons of using treemaps, plus possible alternatives.

Treemaps are increasingly being included in most data visualization tools (including the latest Excel 2016 version, grouped under “Hierarchy charts”). The treemap was originally designed to visualize a very large amount of data in a hierarchical, tree-structured diagram where the size of the rectangles organized from largest to smallest. Color is used to encode a second dimension. Today, they’re often used generally for categorical data. For further reading on treemaps, check out these posts from Jeffrey Shaffer and Robert Kosara.

appropriate use cases for treemaps

Treemaps can work well if your data falls into this scenario:

  1. You want to visualize a part-to-whole relationship amongst a large number of categories.
  2. Precise comparisons between categories is not important.
  3. The data is hierarchical.

Steve Wexler provides a nice example of why a treemap was more effective than a bar chart when visualizing the Electoral College votes from the 2012 election in this post.

the challenge with treemaps

While treemaps may seem like a sexy choice for visualizing data, they’re often used when another chart type would serve the data better. The limitation is that when we’re encoding data with area and intensity of color, our eyes aren’t great a detecting relatively minor differences in either of these dimensions. If our data is such that our audience needs to make precise comparisons between categories, it’s even more cumbersome when the categories aren’t aligned to a common baseline. We should never make our audience do more work than necessary to understand a graph!

Bottom line: treemaps do have a use case however consider reaching for something else when your audience needs to be able to make specific comparisons. Andy Kriebel provides a great litmus test for the effectiveness of a treemap in this this post: if it takes longer than 2 seconds to compare categories, go with a simpler design.

a real-world makeover

Back to our client: their treemap looked similar to the one below (I’ve anonymized the data to protect confidentiality). The intent was to compare returns activity (measured by volume of dollars claimed and number of returns) for their customer base where % of returns is encoded by the size of the rectangles while % of dollars is encoded by color - both in differences in hue and intensity.

 
original treemap.png
 

I can make a few general conclusions from this visual: three customers (A, B, and C) are roughly half of the return activity (visually estimated based on the size of the rectangles) and these same three are also pretty drastically different in terms of dollars claimed (judged by the intensity of color—although without a color legend it’s difficult to discern much more than that).

I’d argue that this visual might work well in an exploratory phase of analysis, where the analyst is sorting through the data to find what might be interesting to stakeholders. However, it doesn’t work so well when communicating what’s interesting in this data.  

What if I asked you which are the top 10 customers by dollars claimed? Or what’s the exact difference between Customer A and B’s % of returns? We can see how easily a treemap breaks down when trying to make these comparisons.

One possible solution is a bar chart. Below, I’ve displayed all customers’ returns and dollars claimed horizontally. This lends itself well to comparing across categories and we can easily see that four customers comprise the majority of return activity.

 
allcustomers.png
 

When communicating this data, we can help our audience understand what’s important by focusing only on the top four customers. This condensed view could lend itself well to a discussion towards action on these top four customers.

 
top4.png
 

To address the remaining customers, I could provide the entire bar chart for additional detail if the audience needs it. I might also choose to add the other customers with a footnote (as shown) or group them into an “All other” category. The decision point here would be determined by the level of detail my audience needs.

Taking this a step further, we can introduce color strategically as visual cue, paired with some explanatory text, to prompt our audience to act on what this data may suggest. 

 
final solution.PNG
 

For an additional example, Stephen Few provides a similar bar chart solution in this before & after post.

For a deeper look at the graphs, you can download the accompanying Excel file. What do you think? Are there situations where you’ve seen a treemap more effective than an alternate solution? Leave a comment with your thoughts!


Elizabeth Ricks is a Data Visualization Designer on the storytelling with data team. She has a passion for helping her audience understand the ’so-what?’ as concisely as possible. Connect with Elizabeth on LinkedIn or Twitter.

/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2018 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.

#SWDchallenge: slopegraph

Last month, we focused on a variation of a bar graph when I challenged you to make a waterfall chart (45 people shared their creations: be sure to check out the recap post!) This month, we’ll be practicing an alternative form of a line graph: the slopegraph.
 

what is a slopegraph?

For me, “slopegraph” is really just a fancy word to describe a line graph that only has two points on the x-axis. Sometimes we don’t think of using lines when we only have two points, however depending on our data—and what we want to enable our audience to see—they can sometimes work quite well.

Let’s look at a progression to better understand what a slopegraph helps us see. Check out the following basic dual-series bar chart depicting sales at two points in time by region.

SlopegraphProgression_1A.png
 

When we look at a dual series bar chart like the one above, typically we are comparing the heights of the paired bars to each other. So let’s actually draw some lines on the graph to reflect this.

SlopegraphProgression_2A.png
 

Now that I’ve drawn the lines, I no longer need the bars. Let’s eliminate those.

SlopegraphProgression_3A.png
 

I also don’t need the spread in horizontal space: I can collapse these lines. (I’ll also remove all of the other details; I’ll add these back momentarily.)

SlopegraphProgression_4A.png
 

I guess I don’t need to compress the lines that much: let’s expand a bit to make better use of the space.

SlopegraphProgression_5A.png
 

Next I’ll add titles and labels back in so we know what we’re looking at. Viola! We have a slopegraph.

SlopegraphProgression_6A.png
 

Let’s pause to take note of a couple of things. First, consider what these lines depict: they are visualizing the change. When the line slopes upwards, it means that region had an increase in sales. Where the line slopes downwards, the region's sales decreased. The lines having steeper slopes represent regions where sales changed more than those with flatter slopes. I find often a solution raised when we want to focus on change in an example like the one above is to move from the dual series bar chart to a single series bar chart that graphs the difference, the change. The downside of this is that you lose the context of the basis: the relative beginning (or ending) heights of the bars. With the slopegraph, however, the focus is on the change (the line) but you still preserve the context of relative sizes of the categories (via vertical position on respective axes). So my audience can intuitively see relative magnitudes of change, as well as what basis that change is happening from.

As another sometimes-benefit, when I shift from bars to lines, I now have the ability to rework my y-axis and can scale it to start at something other than zero. (This is not ok with bars, where we compare endpoints relative to each other and the baseline and thus need the full bar for accurate comparison, but ok for lines where we focus on relative positions in space and the relative slopes of the lines that connect those positions, which remain constant as we zoom). This means if numbers are close and small differences are meaningful, I have the ability to zoom in to get more spread. I didn’t need to do that in this particular example, but this can sometimes be useful.

Also, when there is a general trend (many things increasing, many things staying flat, or many things decreasing), exceptions can be picked out very quickly with the slopegraph view. For example, in the slopegraph above, notice how easy it is to see that Region A is the only region that decreased in sales in the time period shown. I can take this a step further, highlighting this takeaway via words and sparing color:

SlopegraphProgression_7A.png
 


slopegraphs for group comparisons

The preceding slopegraph shows change over time. Slopegraphs can also work well to show group comparisons. I’ve had people get angry with me before for using lines for non-continuous data. This is one of those popular data visualization myths (that lines are only for continuous data—I’ve been guilty of oversimplifying with these words in the past, which aren’t quite right). The rule—and though there aren’t many hard and fast rules in data visualization, I feel comfortable characterizing this as one—is when you are graphing data in a line graph, the lines that connect the points need to make sense. While this is commonly the case for continuous data, it can happen for non-continuous, or categorical, data as well. Take the case of the slopegraph for group comparisons as an illustration. In the following example, I’ve graphed employee survey data for the Total Organization on the left and the Sales Team on the right.

SlopegraphGroupComp_1.png
 

Let’s consider what the lines in the slopegraph above represent. In cases where the line slopes downwards, Sales underperformed (scored lower than) the overall organization. Where the lines slope upwards, Sales outperformed (scored higher than) the overall organization. Lines sloping upwards with steeper slope indicate categories that increased by a greater proportion. These lines make sense. They are visually encoding the difference between each category across the two groups.

By the way, if I spend some more time playing with the design of the above slopegraph, I might do something like the following:

SlopegraphGroupComp_2.png
 

In the above view, I moved the x-axis labels (Total Org and Sales Team) to the top so that my audience sees it before they get to the data so they know right away what they're looking at. I also added vertical lines, as I like how this helps highlight that the overall spread in feedback (from min to max score across categories) is larger for the Sales Team compared to Total Org. I also de-emphasized the data points, labels and lines coming from Total Org so those are there for reference, but attention is focused more on the right hand side via black markers and labels. In the examples I've highlighted here, I used circle data markers for all. In cases where you have many lines, you may simply want to show the lines (without markers), which can look less cluttered. That said, I like using end markers when I'm including the data points and labels, as I think this helps visually anchor and tie the two together.

Perhaps we take it a step further, highlighting a specific takeaway:

SlopegraphGroupComp_3.png
 

And now, recognizing that I’ve focused only on the negative in the two examples we’ve looked at, here’s a view with a happier point made:

SlopegraphGroupComp_4.png
 

Because the lines are what take up most of the ink in a slopegraph, the focus is primarily on the change or difference between the categories—consider using a slopegraph when that is something important you’d like your audience to focus on or easily see.

Slopegraphs are often lauded for their clean design. I frequently do an exercise in my workshops where people in small groups look at the same data graphed multiple ways, with a slopegraph being one of the options. I continue to be surprised by how many favor this view, in spite of the issues that the slopegraph has for the specific example I use. Speaking of issues…let’s discuss some of the shortcomings of the slopegraph.
 

slopegraph shortcomings

Whether a slopegraph will work for any given data is highly dependent on the data itself. If you have tightly bunched data, or many crisscrossing lines, it can be difficult to see what’s going on, or label the data in a way that is legible. Also you don’t have control over the relative ordering of categories (rather, they are where they are because of the data they are tied to). This is no issue if your categories don’t have intrinsic ordering, but can become problematic if they do (for example, survey categories ranging from strongly agree to strongly disagree, which you’d want to order meaningfully). Slopegraphs can also throw people because they look different than a typical line graph, which can sometimes be off-putting. While slopegraphs are possible in most tools (since they are simply line graphs), they can take time and patience to format in the desired fashion.

That said, there are certainly good uses for slopegraphs. Here’s where you come in.
 

the challenge

My challenge to you: find some data of interest that will lend itself well and create a slopegraph. DEADLINE: Friday, June 8th by midnight PST. Full submission details follow (be sure to email it to us, taking note of specifics below, for inclusion in recap post!). You're also welcome to share at any point on social media using #SWDchallenge.

SUBMISSION INSTRUCTIONS:

  • Make it. Identify your data and create your visual with the tool of your choice. If you need help finding data, check out this list of publicly available data sources. You're also welcome to use a real work example if you'd like, just please don't share anything confidential.
  • Share it. Email your entry to SWDchallenge@storytellingwithdata.com by the deadline. Attach your image as a .PNG. Put any commentary you’d like included in my follow up post in the body of the email (e.g. what tool you used, any notes on your methods or thought process you’d like to share); if there’s a social media profile or blog/site you’d like mentioned, please embed the links directly in your commentary (e.g. Blog | Twitter). If you’re going to write more than a paragraph or so, I encourage you to post it externally and provide a link or summary for inclusion. Feel free to also share on social media at any point using #SWDchallenge.
  • The fine print. We reserve the right to post and potentially reuse examples shared.

In case it's of interest, we’ve written about slopegraphs a few times in the past. Check out the following if you’d like to read more: I like [candy] bars better than donutsmore on slopegraphsslopegraph templatevisualizing change via slopegraph. Also if you have tips or tricks or other points related to slopegraphs that others can learn from, please leave a comment to share!

I look forward to seeing what you come up with! Stay tuned for the recap post in the second half of June. Also check out our new #SWDchallenge page on the blog for past challenge details and recaps.

/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2018 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.

when to use a dotted line

When visualizing and communicating with data, one design element we can play with is line style. Most tools default to and we most often use and see solid lines. But a dotted line is another possibility. What considerations should we think about with a dotted line? When should we use one? In this post, I’ll outline my thoughts and illustrate the scenarios in which I find myself using dotted lines through examples plus will share some commentary on how to do this in your tools.

considerations with dotted lines

Dotted lines are super attention grabbing. They also convey a sense of uncertainty that can be useful. The challenge is that dotted lines introduce some visual noise. From a clutter standpoint, we’ve taken what could have been visualized as a single visual element (a line) and chopped it into a ton of pieces (many little lines, dashes, or dots). Because of this, I recommend against using the dotted line as a way to attract attention (rather use less noisy means of contrast, such as position, size, or color for this). Preserve the use of dotted lines for when there is a target or goal we are trying to hit or remain a certain side of or when there is uncertainty to depict (a forecast or prediction). In these cases, the visual differentiation and sense of uncertainty that the dotted line helps depict makes up for the additional visual noise it introduces. Let’s look at an example of each of these use cases and some dotted lines in action.

dotted line for a goal or target

I was recently working with a graph similar to the following that depicted time to fill a given type of role at a company:

Dotted Lines 1.png
 

There are three lines in the graph above: (1) the Goal—which you likely looked at first both due to position and because it’s bold black, which stands out more than the other colored lines, (2) average days to fill roles for Internal candidates (orange), and (3) average days to fill roles with External hires over time (teal). The Internal and External lines represent data that we’ve collected and summarized, whereas the Goal is something we have set and in this case we’d like to stay below. To set the Goal apart—and here I'm interested in making it less attention grabbing but still want it there for reference—we could use a dotted line. Dotted lines come in different styles, both in terms of the thickness of the line and how large or small the individual pieces are. We also have some other design elements at our disposal when it comes to the formatting of the line and the text that goes with it. Here’s an example of how I iterated to land on a combination I liked:

Dotted Lines 2.png

I prefer the final view, where the dotted line is thin and grey, effectively pushing it to the background (in spite of the noise that this line style introduces). I should probably mention that there are other styles of dotted lines as well (for example, some that combine dashes and dots)—I recommend avoiding these because they look quite messy and instead choose a style where the segments (whether dashes or dots) are consistently spaced. Here is what my final iteration looks like in the full graph:

Dotted Lines 3.png
 

I like this. The GOAL is clearly stated and still the first thing I see, but due to the formatting it feels more like reference or context, while the thicker solid lines are the clear focus of the graph. Also—and perhaps it's just me, but—there's something about this view that makes it feel easier to compare each of the individual lines (Internal and External) to the GOAL than in the original view when all of the lines were of similar thickness.

dotted line to depict uncertainty

I often see graphs where some data is actual and some is forecast and there isn’t anything done to differentiate the two, like the following example.

 

Given that we are standing in 2018, some of the data in the graph above clearly hasn’t happened yet and so must be forecast. But how much? Was this graph recently made using actual data through 2017 and forecasting thereafter? Perhaps, but we’d have to make that assumption, because nothing in the graph tells us. Or maybe there was a footnote hiding down at the bottom of the original that articulated this (there wasn't, but in case there were); I shouldn't have to read the fine print in order to know how to read the graph. Don’t make your audience question, make assumptions, or hunt for detail like this—make it clear.

One option is to use words to differentiate between actual and forecast. In the following, I added supercategories along the bottom to indicate which dates are associated with actual data and which are forecast (check out the recent post illustrating the step-by-step on how to achieve in Excel if that’s of interest). Since we're on the topic of dotted lines, I could also add a dotted line to further visually differentiate actual vs. forecast data:

Dotted Lines 5.png
 

Even better, though, if I preserve my use of the dotted line to depict the uncertainty directly for the portion of the line that represents forecast data. If I do this, I don’t need to add the additional line at all. Here’s what it could look like:

Dotted Lines 6.png

In this case, both the line style and the words on the x-axis make it quickly clear which data points represent actual data and which are forecast. I'm a big advocate of thick, bold, solid lines and data markers for actual data and thin, dotted lines (and sometimes non-filled in data markers, though I felt that looked too messy here) for forecasts and predicted data because of the way it helps us intuitively understand what we are looking at when executed well.

In the above, I’m assuming the forecast data points are important enough to label directly (in other words, that the specific numeric values are important; if that weren’t the case, you might approach this differently, perhaps only labeling the 2017 and 2022 points, or not labeling any of them and rather letting the y-axis for general magnitude be good enough). I can imagining different people making different choices here depending on both what you want the audience to focus on as well as personal aesthetic preferences. The meta-point: be thoughtful when it comes to design details in general, and your use of dotted lines in particular.

the tactical: how do I do this in my tool?

Changing the formatting of a single line from solid to dotted—like in the first example above—is possible and pretty straightforward in most tools. This is typically achieved through a menu or code to change the line style. If you aren’t sure how to do this, some smart Google searching with the name of your tool and something like “change line in graph to dotted” should point you towards a solution.

Changing just part of a line from solid to dotted is slightly more complicated, but there are a couple of solutions for that. There is the brute-force method of physically formatting each individual data point (for example if you are working in Excel, you would click once to highlight the series, then click again to highlight an individual data point, and then can format that data point or associated line individually as you would like it). As you can imagine, this can be time-consuming. Another way is to make what will appear to be a single line actually two different data series, allowing you to format them separately. In the Sales example above, I’d have a column of dates that goes from 2010 to 2022 to set my x-axis. Then my first series for the ACTUAL data would have values from 2010 to 2017. I’d have a second series for my FORECAST line that has values from 2017 to 2022 (note the overlap with 2017 having values for both ACTUAL and FORECAST to avoid a gap in the line). I imagine you could use a similar approach in other tools.

You can download the Excel file with the above examples (including both the brute force and more elegant solutions described above for the second example).

These are the two use cases in which I find myself using dotted lines and promoting their use. Are there other cases where you’d recommend using a dotted line? Or additional considerations we should have in mind when choosing line style? Leave a comment with your thoughts!

#SWDchallenge: the waterfall chart

May’s challenge was a waterfall chart - great use case when you have a beginning quantity, additions & deductions and an ending quantity. Click the link below to see the full details, including an example and submission instructions.

Read More
/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2018 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.