tactical tip: embedding a vertical reference line in Excel

Today's post is a step-by-step Excel “how-to” inspired by a reader question we received following a recent post on using dotted lines in data visualizations.

Dave asked:
“Do you know of a trick for drawing vertical lines to delineate years (or actuals/historical vs forecast/future segments of the chart)? I currently have to draw them with the line drawing tool, which gets messy when moving the chart on a PPT slide. If there were a way to embed it in the data or somehow format the chart, that'd be awesome.”

The following chart illustrates what Dave describes. The data is units of output over time where the first nine months of the series are actual data and the remaining four months of the year are a forecast. The dotted line serves as a visual cue to differentiate actual from forecast. Created in Excel, the line was physically drawn on the graph with the Shape Illustrator. While this approach might suffice as a quick method for achieving the desired effect; it isn’t ideal for recurring use of the graph, particularly if the line’s position on the x-axis might change in future iterations.

 
Picture1.png
 

After some research and playing around in Excel, I’ve devised one method for achieving this effect, which I’ll outline in this post (I’m sure there are others!). Don't be dismayed by the number of steps: it's a one-time setup after which can be easily refreshed in future iterations by changing where you want the reference line. I’m using Excel 2016 and you can download the accompanying file.

In my spreadsheet, the data for the Output over time chart looks like this:

 
Picture2.png
 

1. Go to a blank cell range and enter these values as shown in my screenshot below. I’m choosing to add these new values directly underneath my data range in cells F19:G21. This will eventually become the coordinates for a secondary scatterplot that we’ll add in a later step.

 
Picture3.png
 

2. Choose where you want the vertical reference line to cross the x-axis and enter those values below “X”. In this example, I want the line located on the September data point, the ninth point in my data series. In cells G20:G21, I entered “9” in each, as shown below. (Note: for a more automated approach in a larger dataset, a MATCH formula could also calculate where September falls in the range: =MATCH("Sep",$F$7:$F$18,0).

 
Picture4.png
 

3. Add a new data series by right-clicking the graph and choosing Select Data:

 
 

4.  In Select Data Source dialog, click the Add button.

 
 

5. In the Edit Series dialog, enter a name for your data series (I chose “reference”) and select the X values you entered from Step 2. I selected the 9’s in G20:G21. Click OK to exit the dialog boxes.

 
 

The resulting visual looks like this:

 
 

6. Right click the new line and choose Change Series Chart Type.

 
 

7. In the Change Chart Type dialog box, select Combo section under All Charts tab. Then select Scatter with Straight Lines and check the option for Secondary Axis. Click OK to exit.

 
Picture10.png
 

The resulting visual looks like this:

 
 

8. Go to the chart, right click the red reference line and choose Select Data again. In the Select Data Source dialog, highlight reference and click Edit.

 
 

9. In Edit Series dialog, update the X values to be the original values you selected in Step 5. Set the Y values to be 0,1. Click OK to exit.

 
Picture13.png
 

The resulting visual looks like this:

 
 

The remaining steps are visual cleanup: first, I forced the red line to align with the top of the primary y-axis and second, I hid the secondary axis line and text labels.   

10. Right-click on the secondary y-axis and select Format Axis:

 
Picture15.png
 

11. In the Axis options section, type 1 into the textbox beside the Maximum option.

 
 

12. In the Text Options section, under Text Fill, choose No fill. This will remove the text labels on the secondary y-axis.

 
 

13. In the Axis Options section, under Line, choose No line. This will remove the secondary y-axis line.

 
 

Voila! The resulting visual has an embedded vertical line, which is plotted on a hidden secondary y-axis.

 
Picture20.png
 

Recall that this goal of this specific scenario was a dotted line which visually differentiated the actual and forecast sections. My last step was to change the formatting of the line to appear as a thin, grey dashed line.  

(Note: To achieve your preferred formatting, right-click the line and select Format Data Series in the context menu where you’ll find formatting selections in the resulting dialog pane.)

 
 

This method does come with some trade-offs to consider.

One downside is that you lose some control over the exact placement of the line where it crosses the x-axis. Below you’ll see a comparison between the manual vs embedded approach. With the manual approach, the line can be drawn exactly on the tick mark between the Aug & Sep data points, providing a clean alignment with the x-axis. With the embedded approach, the line is centered above the Sep label, resulting in a slightly less seamless effect.

 
 

On the cosmetic side, another downside is losing the flexibility to manipulate the length of the line for labeling purposes. I’ll illustrate this with a horizontal bar chart (which I also created using this method). With the manual approach, I can physically draw the line to extend above the x-axis line, aligning it closely to the “Target” text label. With the embedded approach, the line stays below the x-axis line, creating a gap between the line and the label that describes it.

 
 

You can download the Excel file to see the behind-the-scenes of these graphs. Are there other methods you’re aware of for achieving this effect? Or other considerations with embedding the reference line directly? Leave a comment with your thoughts!


Elizabeth Ricks is a Data Visualization Designer on the Storytelling with Data team. She has a passion for helping her audience understand the ’so-what?’ as concisely as possible. Connect with Elizabeth on LinkedIn or Twitter.

#SWDchallenge: let's plot with a dot!

Last month, I left things pretty open with a makeover challenge, where participants selected less-than-ideal visuals and reworked them, aiming to improve. Many of the makeovers were various types of graphs reimagined as bar charts. I like a good bar chart. Still, this month, I thought we'd both get more specific in the challenge and focus on a graph type that can sometimes be a good alternative to the basic bar: a dot plot.

The term dot plot (or dot chart) can be used for any graph where dots are used to encode data. While this can lead to a ton of variations, there are two primary different types. The first can be used to show the distribution of data (and tends to works best with small datasets, as datasets grow beeswarms or box & whisker become more common); it's similar to a histogram, but the dots aren't necessarily uniformly spaced along the x-axis. The second—which is what I think of first when I say or hear "dot plot"—if you imagine a horizontal bar chart, this version (sometimes referred to as the Cleveland dot plot) instead of plotting bars, plots a dot where the end of each bar would be. In searching through Andy Kirk's Chartmaker Directory, I also found the connected dot plot, which is the second type described above but with two points for each category, typically connected via a line or shaded area. There are likely additional variations as well.

Drawing of 3 dot plots.jpg

I welcome you to try out any type of dot plot for this month's challenge. The following focuses on the Cleveland version (including the variation of the connected dot plot), since I have more familiarity and experience with these.

I should probably caveat by saying that I don't use dot plots frequently—that's one of the reasons I'll keep the prelude to this month's challenge short and sweet. In many cases where I try a dot plot as one potential view, I end up preferring one of the alternatives. That said, there are situations where they can work well. They are often lauded for using less ink than the bar chart and from a design standpoint can look cleaner because of this. A potential benefit with the (Cleveland) dot plot is that you can zoom or start your axis at something other than zero. This is a no-no with bars, where we compare numbers both to each other and relative to the baseline. With the dot plot, we focus mainly on the positions of the dots relative to each other (not relative to the baseline, like bars), making starting the axis at something other than zero ok. As another sometimes benefit, dots may lend themselves better to being plotted within a range in a way that's more intuitive or to show multiple series of data without becoming cluttered. Sometimes we simply want to plot categorical data as something other than a bar for various reasons, and the dot plot can be an alternative here.

Here's a dot plot I created as part of a client makeover (in this case we wanted to focus on the top two time items, where the gap is the biggest, and the bottom two spend items where the gap is smallest):

Dot Plot.png

In terms of potential risks or considerations with a dot plot, from my perspective probably the biggest one is that they are less common, which means your audience may not be familiar with them. This isn't necessarily a reason not to use, but be aware any time you show your audience something they aren't familiar with, it means that you introduce a hurdle—you have to get them to pay attention (either to you or the data) long enough to figure out how to read the graph. You can certainly do things with words and color to help make this intuitive. The advice I often give applies here, too: consider what you want your audience to be able to do with the data you are showing, then assess whether the type of graph you're considering—in this case, a dot plot—will help make that easy.

Tactically, when it comes to creating a dot plot, they often start out as another graph type with some formatting applied. Stephanie Evergreen has a nice quick post describing how to create a dot plot from a scatterplot in Excel. Andy Kriebel shows how he created a ranked dot plot using a Gantt chart in Tableau in this tutorial. If you have additional tips or tricks related to dot plots that others can benefit from, please leave a comment on this post to share.

For more on dot plots and additional examples, check out Naomi Robbins' articlethis makeover by Stephen Few, or the SWD post novel vs. the boring old bar chart. Again, if you're aware of other resources or examples, please leave a comment on this post.

the challenge

My challenge to you: find some data of interest that will lend itself well and create a dot plot. DEADLINE: Tuesday, August 7th by midnight PST. Full submission details follow (be sure to email it to us, taking note of specifics below, for inclusion in recap post!). You're also welcome to share at any point on social media using #SWDchallenge.

SUBMISSION INSTRUCTIONS:

  • Make it. Identify your data and create your visual with the tool of your choice. If you need help finding data, check out this list of publicly available data sources. You're also welcome to use a real work example if you'd like, just please don't share anything confidential.
  • Share it. Email your entry to SWDchallenge@storytellingwithdata.com by the deadline. Attach your image as a .PNG. Put any commentary you’d like included in my follow up post in the body of the email (e.g. what tool you used, any notes on your methods or thought process you’d like to share); if there’s a social media profile or blog/site you’d like mentioned, please embed the links directly in your commentary (e.g. Blog | Twitter). If you’re going to write more than a paragraph or so, I encourage you to post it externally and provide a link or summary for inclusion. Feel free to also share on social media at any point using #SWDchallenge.
  • The fine print. We reserve the right to post and potentially reuse examples shared.

I look forward to seeing what you come up with! Stay tuned for the recap post in the second half of August. Also check out the #SWDchallenge page for past challenge details and recaps.

/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2018 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.

visualizing uncertainty

We often have some measure of uncertainty in our data—a forecast, prediction or range of possible values. A common challenge is how to visualize that uncertainty and help our audience understand the implications. In today’s post, I’ll use a real-world example to illustrate one approach and share tactics for creating in Excel.

The client’s original visual looked similar to the one below. It shows 2017 earnings per share (EPS) and the forecast outlook for the next four years. The client used a CAGR to forecast a range of possible EPS values from 2018 - 2021.  

 
Picture1.png
 

At first glance, it wasn’t obvious that the blue bars represented a forecast (even with the x-axis labeling of “E” for expected). The first yellow bar represents the 2017 actual EPS and next four blue bars are the forecast for 2018 - 2021 where the solid section represents the midpoint and the data labels is the uncertain piece—the range of projected values.  

I made a few design changes to make the graph a little easier to interpret. I first changed the bars to lines and used a dotted line for 2018 - 2021 with unfilled data markers to help visually reinforce the uncertainty.

 
Picture4.png
 

In Excel, there are two potential ways to achieve this formatting. A brute-force approach is to use a single data series and format each individual data point as a dotted line. Another approach is to graph two separate data series, one as a solid point or line and the second as a dashed line or unfilled circle, with a point of overlap to make the lines connect. You can read more detail about these two approaches in this prior post.

We often face the decision of preserving the y-axis vs. labeling data directly. I’ve done the latter in the visual below. One consideration in this decision point is the level of specificity your audience needs: are the actual values important? Or is the overall shape of the data more important? You can read more about these considerations in this prior post.

 
 

Next, let’s revisit how to show the range of forecast values. The original visual is shown again below where the forecast EPS values are represented by the data labels on top of the bars.

 
 

Rather than leave the audience with the highly taxing processing of reading these values, we can aid interpretation by instead depicting the forecast as a shaded range around the point estimate. This keeps the emphasis on the midpoints, while reducing clutter and eliminating the additional work the audience has to do. If the specific forecast values are important to the audience, we’ll deal with that momentarily.

 
 

The brute-force Excel method to adding this grey band requires a little math, graphing a second data series as a stacked bar and then formatting the stacked bar so that the bottom section renders white and the top section grey. You can download the accompanying Excel file to see how I accomplished this.

 
Picture6.png
 

But the visual is not yet complete. We should take the opportunity to add value to this data by telling the intended audience what they should know. Let’s assume this is a positive story where the outlook from the original base year (2016) has been extended to 2018. I might add explanatory text, paired with strategic use of color (I chose green to depict positivity) to focus attention on the relevant points of the data. If specific forecast EPS values are important for a given year, I could include them for context in the text. For a very technical audience, I might include even more detail with the statistics around the forecast. Just a reminder to always design with the audience’s needs in mind!

 
Picture5.png
 

 

Are you aware of other methods to achieve this effect? Have you seen other examples of uncertainty depicted effectively or tips you’d like to share? Leave a comment with your thoughts!


Elizabeth Ricks is a Data Visualization Designer on the Storytelling with Data team. She has a passion for helping her audience understand the ’so-what?’ as concisely as possible. Connect with Elizabeth on LinkedIn or Twitter.

#SWDchallenge: your choice makeover

Less-than-ideal graphs are prominent in the world around us. The July challenge should be a fun one: find one of these and work to make it better. Improving upon an existing graph can be a great way to build your skills and reflecting on this process and sharing is a great way to also bring benefit to others.

Read More
/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2018 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.