tactical tip: embedding a vertical reference line in Excel

Today's post is a step-by-step Excel “how-to” inspired by a reader question we received following a recent post on using dotted lines in data visualizations.

Dave asked:
“Do you know of a trick for drawing vertical lines to delineate years (or actuals/historical vs forecast/future segments of the chart)? I currently have to draw them with the line drawing tool, which gets messy when moving the chart on a PPT slide. If there were a way to embed it in the data or somehow format the chart, that'd be awesome.”

The following chart illustrates what Dave describes. The data is units of output over time where the first nine months of the series are actual data and the remaining four months of the year are a forecast. The dotted line serves as a visual cue to differentiate actual from forecast. Created in Excel, the line was physically drawn on the graph with the Shape Illustrator. While this approach might suffice as a quick method for achieving the desired effect; it isn’t ideal for recurring use of the graph, particularly if the line’s position on the x-axis might change in future iterations.

 
Picture1.png
 

After some research and playing around in Excel, I’ve devised one method for achieving this effect, which I’ll outline in this post (I’m sure there are others!). Don't be dismayed by the number of steps: it's a one-time setup after which can be easily refreshed in future iterations by changing where you want the reference line. I’m using Excel 2016 and you can download the accompanying file.

In my spreadsheet, the data for the Output over time chart looks like this:

 
Picture2.png
 

1. Go to a blank cell range and enter these values as shown in my screenshot below. I’m choosing to add these new values directly underneath my data range in cells F19:G21. This will eventually become the coordinates for a secondary scatterplot that we’ll add in a later step.

 
Picture3.png
 

2. Choose where you want the vertical reference line to cross the x-axis and enter those values below “X”. In this example, I want the line located on the September data point, the ninth point in my data series. In cells G20:G21, I entered “9” in each, as shown below. (Note: for a more automated approach in a larger dataset, a MATCH formula could also calculate where September falls in the range: =MATCH("Sep",$F$7:$F$18,0).

 
Picture4.png
 

3. Add a new data series by right-clicking the graph and choosing Select Data:

 
 

4.  In Select Data Source dialog, click the Add button.

 
 

5. In the Edit Series dialog, enter a name for your data series (I chose “reference”) and select the X values you entered from Step 2. I selected the 9’s in G20:G21. Click OK to exit the dialog boxes.

 
 

The resulting visual looks like this:

 
 

6. Right click the new line and choose Change Series Chart Type.

 
 

7. In the Change Chart Type dialog box, select Combo section under All Charts tab. Then select Scatter with Straight Lines and check the option for Secondary Axis. Click OK to exit.

 
Picture10.png
 

The resulting visual looks like this:

 
 

8. Go to the chart, right click the red reference line and choose Select Data again. In the Select Data Source dialog, highlight reference and click Edit.

 
 

9. In Edit Series dialog, update the X values to be the original values you selected in Step 5. Set the Y values to be 0,1. Click OK to exit.

 
Picture13.png
 

The resulting visual looks like this:

 
 

The remaining steps are visual cleanup: first, I forced the red line to align with the top of the primary y-axis and second, I hid the secondary axis line and text labels.   

10. Right-click on the secondary y-axis and select Format Axis:

 
Picture15.png
 

11. In the Axis options section, type 1 into the textbox beside the Maximum option.

 
 

12. In the Text Options section, under Text Fill, choose No fill. This will remove the text labels on the secondary y-axis.

 
 

13. In the Axis Options section, under Line, choose No line. This will remove the secondary y-axis line.

 
 

Voila! The resulting visual has an embedded vertical line, which is plotted on a hidden secondary y-axis.

 
Picture20.png
 

Recall that this goal of this specific scenario was a dotted line which visually differentiated the actual and forecast sections. My last step was to change the formatting of the line to appear as a thin, grey dashed line.  

(Note: To achieve your preferred formatting, right-click the line and select Format Data Series in the context menu where you’ll find formatting selections in the resulting dialog pane.)

 
 

This method does come with some trade-offs to consider.

One downside is that you lose some control over the exact placement of the line where it crosses the x-axis. Below you’ll see a comparison between the manual vs embedded approach. With the manual approach, the line can be drawn exactly on the tick mark between the Aug & Sep data points, providing a clean alignment with the x-axis. With the embedded approach, the line is centered above the Sep label, resulting in a slightly less seamless effect.

 
 

On the cosmetic side, another downside is losing the flexibility to manipulate the length of the line for labeling purposes. I’ll illustrate this with a horizontal bar chart (which I also created using this method). With the manual approach, I can physically draw the line to extend above the x-axis line, aligning it closely to the “Target” text label. With the embedded approach, the line stays below the x-axis line, creating a gap between the line and the label that describes it.

 
 

You can download the Excel file to see the behind-the-scenes of these graphs. Are there other methods you’re aware of for achieving this effect? Or other considerations with embedding the reference line directly? Leave a comment with your thoughts!


Elizabeth Ricks is a Data Visualization Designer on the Storytelling with Data team. She has a passion for helping her audience understand the ’so-what?’ as concisely as possible. Connect with Elizabeth on LinkedIn or Twitter.

visualizing uncertainty

We often have some measure of uncertainty in our data—a forecast, prediction or range of possible values. A common challenge is how to visualize that uncertainty and help our audience understand the implications. In today’s post, I’ll use a real-world example to illustrate one approach and share tactics for creating in Excel.

The client’s original visual looked similar to the one below. It shows 2017 earnings per share (EPS) and the forecast outlook for the next four years. The client used a CAGR to forecast a range of possible EPS values from 2018 - 2021.  

 
Picture1.png
 

At first glance, it wasn’t obvious that the blue bars represented a forecast (even with the x-axis labeling of “E” for expected). The first yellow bar represents the 2017 actual EPS and next four blue bars are the forecast for 2018 - 2021 where the solid section represents the midpoint and the data labels is the uncertain piece—the range of projected values.  

I made a few design changes to make the graph a little easier to interpret. I first changed the bars to lines and used a dotted line for 2018 - 2021 with unfilled data markers to help visually reinforce the uncertainty.

 
Picture4.png
 

In Excel, there are two potential ways to achieve this formatting. A brute-force approach is to use a single data series and format each individual data point as a dotted line. Another approach is to graph two separate data series, one as a solid point or line and the second as a dashed line or unfilled circle, with a point of overlap to make the lines connect. You can read more detail about these two approaches in this prior post.

We often face the decision of preserving the y-axis vs. labeling data directly. I’ve done the latter in the visual below. One consideration in this decision point is the level of specificity your audience needs: are the actual values important? Or is the overall shape of the data more important? You can read more about these considerations in this prior post.

 
 

Next, let’s revisit how to show the range of forecast values. The original visual is shown again below where the forecast EPS values are represented by the data labels on top of the bars.

 
 

Rather than leave the audience with the highly taxing processing of reading these values, we can aid interpretation by instead depicting the forecast as a shaded range around the point estimate. This keeps the emphasis on the midpoints, while reducing clutter and eliminating the additional work the audience has to do. If the specific forecast values are important to the audience, we’ll deal with that momentarily.

 
 

The brute-force Excel method to adding this grey band requires a little math, graphing a second data series as a stacked bar and then formatting the stacked bar so that the bottom section renders white and the top section grey. You can download the accompanying Excel file to see how I accomplished this.

 
Picture6.png
 

But the visual is not yet complete. We should take the opportunity to add value to this data by telling the intended audience what they should know. Let’s assume this is a positive story where the outlook from the original base year (2016) has been extended to 2018. I might add explanatory text, paired with strategic use of color (I chose green to depict positivity) to focus attention on the relevant points of the data. If specific forecast EPS values are important for a given year, I could include them for context in the text. For a very technical audience, I might include even more detail with the statistics around the forecast. Just a reminder to always design with the audience’s needs in mind!

 
Picture5.png
 

 

Are you aware of other methods to achieve this effect? Have you seen other examples of uncertainty depicted effectively or tips you’d like to share? Leave a comment with your thoughts!


Elizabeth Ricks is a Data Visualization Designer on the Storytelling with Data team. She has a passion for helping her audience understand the ’so-what?’ as concisely as possible. Connect with Elizabeth on LinkedIn or Twitter.

putting graph elements in cells directly

Today's short post focuses on an Excel tip that I find myself using frequently and I get asked about a ton. While I focus on Excel here, I imagine the same could apply when working in any spreadsheet application. For additional formatting flexibility, put some graph elements in the cells directly.

This is probably easiest to illustrate through an example. Below is a visual from storytelling with data: a data visualization guide for business professionals. It's highlighted in Chapter 6 (pg. 161), which focuses on what I consider to be model examples of data visualization. I regularly get questions about how this graph was created.

 Source: Knaflic, Cole.  Storytelling With Data: A Data Visualization Guide for Business Professionals , Wiley, © 2015. (Original figure had data labels on the stacked bars; I've omitted those here for simplicity.)

Source: Knaflic, Cole. Storytelling With Data: A Data Visualization Guide for Business Professionals, Wiley, © 2015. (Original figure had data labels on the stacked bars; I've omitted those here for simplicity.)

There are two common questions posed about this visual:

  1. How did you format some of the y-axis labels differently from the rest? 
  2. How did you include the TOTAL % in the graph?

The trick in both of these cases is the same: these elements aren't part of the graph directly, rather they are in individual cells in Excel. In this particular graph, the only things directly in the graph element are the stacked bars. Everything else is done in cells. I do this for additional formatting flexibility. In Excel, you can only apply a single format to axis labels on a given axis. You also have to get creative (and it can sometimes be limiting) if you want to add additional numbers or labels to the visual that aren't part of the data you're visualizing directly. Using the cells allows me to break free from both of these constraints.

In case it's easier to see what I'm talking about, below is what the visual looks like if I show the gridlines in Excel (in the preceding view, all of my cells were filled with white).

FIG0605 with gridlines.png

When approaching this way, you do have to take care to ensure that everything is lined up correctly. This means precisely aligning the graph with the cells and making both cell and graph heights line up (otherwise your bars won't correctly align with the words and numbers). Also, be aware that if you shuffle your data around, you'll need to adjust the elements you've put into the cells directly. Changing the order of your data would change the graph in this case, but not the PRIORITY or TOTAL %, which would have to be done manually (unless you set up so it's pointing to the data you reshuffle). Both of these things lead me to express a word of caution: when I break the pieces apart like this, it's easier to introduce issues and have things not line up: both from a visual standpoint as well for data and labels to get out of sync. Because of this, the tradeoffs may not be worth it if you're trying to automate or it's a repeated process that you don't want to have to adjust each time. But so long as you're careful and pay attention to detail, when you need the additional formatting flexibility, consider putting some elements into cells directly.

If it's of interest, you can download my Excel workbook.

Are there other graphs from storytelling with data that you have questions about or would like a behind-the-scenes peek? Or other tricks you employ when graphing data that you'd like to share? Leave a comment!

/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2018 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.

an alternative to treemaps

 
 

We recently worked with a client who was using treemaps in many of their visuals and the client questioned whether there was a more effective way to visualize their data. Today’s post illustrates some pros and cons of using treemaps, plus possible alternatives.

Treemaps are increasingly being included in most data visualization tools (including the latest Excel 2016 version, grouped under “Hierarchy charts”). The treemap was originally designed to visualize a very large amount of data in a hierarchical, tree-structured diagram where the size of the rectangles organized from largest to smallest. Color is used to encode a second dimension. Today, they’re often used generally for categorical data. For further reading on treemaps, check out these posts from Jeffrey Shaffer and Robert Kosara.

appropriate use cases for treemaps

Treemaps can work well if your data falls into this scenario:

  1. You want to visualize a part-to-whole relationship amongst a large number of categories.
  2. Precise comparisons between categories is not important.
  3. The data is hierarchical.

Steve Wexler provides a nice example of why a treemap was more effective than a bar chart when visualizing the Electoral College votes from the 2012 election in this post.

the challenge with treemaps

While treemaps may seem like a sexy choice for visualizing data, they’re often used when another chart type would serve the data better. The limitation is that when we’re encoding data with area and intensity of color, our eyes aren’t great a detecting relatively minor differences in either of these dimensions. If our data is such that our audience needs to make precise comparisons between categories, it’s even more cumbersome when the categories aren’t aligned to a common baseline. We should never make our audience do more work than necessary to understand a graph!

Bottom line: treemaps do have a use case however consider reaching for something else when your audience needs to be able to make specific comparisons. Andy Kriebel provides a great litmus test for the effectiveness of a treemap in this this post: if it takes longer than 2 seconds to compare categories, go with a simpler design.

a real-world makeover

Back to our client: their treemap looked similar to the one below (I’ve anonymized the data to protect confidentiality). The intent was to compare returns activity (measured by volume of dollars claimed and number of returns) for their customer base where % of returns is encoded by the size of the rectangles while % of dollars is encoded by color - both in differences in hue and intensity.

 
original treemap.png
 

I can make a few general conclusions from this visual: three customers (A, B, and C) are roughly half of the return activity (visually estimated based on the size of the rectangles) and these same three are also pretty drastically different in terms of dollars claimed (judged by the intensity of color—although without a color legend it’s difficult to discern much more than that).

I’d argue that this visual might work well in an exploratory phase of analysis, where the analyst is sorting through the data to find what might be interesting to stakeholders. However, it doesn’t work so well when communicating what’s interesting in this data.  

What if I asked you which are the top 10 customers by dollars claimed? Or what’s the exact difference between Customer A and B’s % of returns? We can see how easily a treemap breaks down when trying to make these comparisons.

One possible solution is a bar chart. Below, I’ve displayed all customers’ returns and dollars claimed horizontally. This lends itself well to comparing across categories and we can easily see that four customers comprise the majority of return activity.

 
allcustomers.png
 

When communicating this data, we can help our audience understand what’s important by focusing only on the top four customers. This condensed view could lend itself well to a discussion towards action on these top four customers.

 
top4.png
 

To address the remaining customers, I could provide the entire bar chart for additional detail if the audience needs it. I might also choose to add the other customers with a footnote (as shown) or group them into an “All other” category. The decision point here would be determined by the level of detail my audience needs.

Taking this a step further, we can introduce color strategically as visual cue, paired with some explanatory text, to prompt our audience to act on what this data may suggest. 

 
final solution.PNG
 

For an additional example, Stephen Few provides a similar bar chart solution in this before & after post.

For a deeper look at the graphs, you can download the accompanying Excel file. What do you think? Are there situations where you’ve seen a treemap more effective than an alternate solution? Leave a comment with your thoughts!


Elizabeth Ricks is a Data Visualization Designer on the storytelling with data team. She has a passion for helping her audience understand the ’so-what?’ as concisely as possible. Connect with Elizabeth on LinkedIn or Twitter.

when to use a dotted line

When visualizing and communicating with data, one design element we can play with is line style. Most tools default to and we most often use and see solid lines. But a dotted line is another possibility. What considerations should we think about with a dotted line? When should we use one? In this post, I’ll outline my thoughts and illustrate the scenarios in which I find myself using dotted lines through examples plus will share some commentary on how to do this in your tools.

considerations with dotted lines

Dotted lines are super attention grabbing. They also convey a sense of uncertainty that can be useful. The challenge is that dotted lines introduce some visual noise. From a clutter standpoint, we’ve taken what could have been visualized as a single visual element (a line) and chopped it into a ton of pieces (many little lines, dashes, or dots). Because of this, I recommend against using the dotted line as a way to attract attention (rather use less noisy means of contrast, such as position, size, or color for this). Preserve the use of dotted lines for when there is a target or goal we are trying to hit or remain a certain side of or when there is uncertainty to depict (a forecast or prediction). In these cases, the visual differentiation and sense of uncertainty that the dotted line helps depict makes up for the additional visual noise it introduces. Let’s look at an example of each of these use cases and some dotted lines in action.

dotted line for a goal or target

I was recently working with a graph similar to the following that depicted time to fill a given type of role at a company:

Dotted Lines 1.png
 

There are three lines in the graph above: (1) the Goal—which you likely looked at first both due to position and because it’s bold black, which stands out more than the other colored lines, (2) average days to fill roles for Internal candidates (orange), and (3) average days to fill roles with External hires over time (teal). The Internal and External lines represent data that we’ve collected and summarized, whereas the Goal is something we have set and in this case we’d like to stay below. To set the Goal apart—and here I'm interested in making it less attention grabbing but still want it there for reference—we could use a dotted line. Dotted lines come in different styles, both in terms of the thickness of the line and how large or small the individual pieces are. We also have some other design elements at our disposal when it comes to the formatting of the line and the text that goes with it. Here’s an example of how I iterated to land on a combination I liked:

Dotted Lines 2.png

I prefer the final view, where the dotted line is thin and grey, effectively pushing it to the background (in spite of the noise that this line style introduces). I should probably mention that there are other styles of dotted lines as well (for example, some that combine dashes and dots)—I recommend avoiding these because they look quite messy and instead choose a style where the segments (whether dashes or dots) are consistently spaced. Here is what my final iteration looks like in the full graph:

Dotted Lines 3.png
 

I like this. The GOAL is clearly stated and still the first thing I see, but due to the formatting it feels more like reference or context, while the thicker solid lines are the clear focus of the graph. Also—and perhaps it's just me, but—there's something about this view that makes it feel easier to compare each of the individual lines (Internal and External) to the GOAL than in the original view when all of the lines were of similar thickness.

dotted line to depict uncertainty

I often see graphs where some data is actual and some is forecast and there isn’t anything done to differentiate the two, like the following example.

 

Given that we are standing in 2018, some of the data in the graph above clearly hasn’t happened yet and so must be forecast. But how much? Was this graph recently made using actual data through 2017 and forecasting thereafter? Perhaps, but we’d have to make that assumption, because nothing in the graph tells us. Or maybe there was a footnote hiding down at the bottom of the original that articulated this (there wasn't, but in case there were); I shouldn't have to read the fine print in order to know how to read the graph. Don’t make your audience question, make assumptions, or hunt for detail like this—make it clear.

One option is to use words to differentiate between actual and forecast. In the following, I added supercategories along the bottom to indicate which dates are associated with actual data and which are forecast (check out the recent post illustrating the step-by-step on how to achieve in Excel if that’s of interest). Since we're on the topic of dotted lines, I could also add a dotted line to further visually differentiate actual vs. forecast data:

Dotted Lines 5.png
 

Even better, though, if I preserve my use of the dotted line to depict the uncertainty directly for the portion of the line that represents forecast data. If I do this, I don’t need to add the additional line at all. Here’s what it could look like:

Dotted Lines 6.png

In this case, both the line style and the words on the x-axis make it quickly clear which data points represent actual data and which are forecast. I'm a big advocate of thick, bold, solid lines and data markers for actual data and thin, dotted lines (and sometimes non-filled in data markers, though I felt that looked too messy here) for forecasts and predicted data because of the way it helps us intuitively understand what we are looking at when executed well.

In the above, I’m assuming the forecast data points are important enough to label directly (in other words, that the specific numeric values are important; if that weren’t the case, you might approach this differently, perhaps only labeling the 2017 and 2022 points, or not labeling any of them and rather letting the y-axis for general magnitude be good enough). I can imagining different people making different choices here depending on both what you want the audience to focus on as well as personal aesthetic preferences. The meta-point: be thoughtful when it comes to design details in general, and your use of dotted lines in particular.

the tactical: how do I do this in my tool?

Changing the formatting of a single line from solid to dotted—like in the first example above—is possible and pretty straightforward in most tools. This is typically achieved through a menu or code to change the line style. If you aren’t sure how to do this, some smart Google searching with the name of your tool and something like “change line in graph to dotted” should point you towards a solution.

Changing just part of a line from solid to dotted is slightly more complicated, but there are a couple of solutions for that. There is the brute-force method of physically formatting each individual data point (for example if you are working in Excel, you would click once to highlight the series, then click again to highlight an individual data point, and then can format that data point or associated line individually as you would like it). As you can imagine, this can be time-consuming. Another way is to make what will appear to be a single line actually two different data series, allowing you to format them separately. In the Sales example above, I’d have a column of dates that goes from 2010 to 2022 to set my x-axis. Then my first series for the ACTUAL data would have values from 2010 to 2017. I’d have a second series for my FORECAST line that has values from 2017 to 2022 (note the overlap with 2017 having values for both ACTUAL and FORECAST to avoid a gap in the line). I imagine you could use a similar approach in other tools.

You can download the Excel file with the above examples (including both the brute force and more elegant solutions described above for the second example).

These are the two use cases in which I find myself using dotted lines and promoting their use. Are there other cases where you’d recommend using a dotted line? Or additional considerations we should have in mind when choosing line style? Leave a comment with your thoughts!

supercategory axis labels in Excel

I have a super quick tip to share today for those reading who work in Excel. I see graphs regularly that look similar to the following:

Supercategories in Excel 1.png
 

Note the x-axis. It isn't awful. But it also isn't ideal. The years are repeated. It looks a little sloppy. "Lack of attention to detail here? What does that mean for all the steps behind the scenes that I can't see?" your audience may be wondering. Also, studies have shown that diagonal text is about 50% slower to read than horizontal text. So if efficiency of information transfer is one of your goals when communicating with data—which I would argue it should be—aim for horizontal text whenever possible.

In Excel, there is a quick trick to fix this. I get asked how to do this regularly, so figured it warrants a quick "how to" post. I'll focus on the x-axis using the above example—you could follow a similar process to create supercategory axis labels on the y-axis when that makes sense, too. Also, this example uses quarterly data but you could follow the same process with monthly data.

In the Excel spreadsheet, the data graphed above looks like this:

Supercategories in Excel 2.png
 

The current x-axis labels (DATE) are in the column on the left and the y-axis values (SALES) are in the column on the right. Start by highlighting these and creating your graph. It will look similar to the graph at the beginning of this post.

Next, we need to add a couple more columns of data to split out the year and quarter (the process would look very similar if you have monthly data). I typically add two columns to the left of my original data to do this (new columns highlighted in blue):

Supercategories in Excel 3.png
 

Once you've done that, right-click on your graph and go to "Select Data."

Supercategories in Excel 4.png
 

In the menu that comes up, "Category (X) axis labels" (highlighted in blue at right, below) will be pointed at the original DATE column (mine, as you see below, was in Column D):

Supercategories in Excel 5.png
 

Click on this and use your mouse to highlight the range of both of the new columns you've added (my YEAR and QUARTER data was in columns B and C, respectively).

Supercategories in Excel 6.png
 

Hit OK, and your graph will look something like this:

Supercategories in Excel 7.png
 

Voila! A beautifully formatted x-axis. If it would be helpful, you can download the Excel file.

Would it be helpful to see more Excel "how to" posts like this? Let me know if there are specific topics or questions of interest that we can answer here by leaving a comment. Do you have any quick tool tips you'd like to share (Excel or otherwise)? Leave a comment to enlighten us all!

/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2018 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.