improving upon "good enough"

While perusing my Twitter feed recently, I came across a link to a blog post by CoreLogic on the topic of millennial mortgage credit (here). The short article describes the results of an analysis of a few key mortgage variables (credit score, debt-to-income ratio, and loan-to-value ratio) by applicant age. Here's an excerpt:

I like the article. It leads the reader through the analysis and pulls in context to make it all make sense. The graphs, however... They are ok. They get the job done. They are probably "good enough." But they could be so much better. If you've taken the time to do a robust analysis, why not take the time to make your data visualizations reflect that? So much of the analytical process (gathering data, cleaning data, and analyzing data) happens behind the scenes. The graphs are what your audience actually sees out of all your hard work. They deserve at least as much time and attention as the other parts of the analytical process.

Let's take a closer look at the graph and portion of the story excerpted above that focuses on Loan to Value (LTV) ratio. When I look at the graph, here are the specific things that I notice and would change: 

  • The y-axis doesn't start at zero. This is a no-no for a bar chart (more on why here). We need the context of the full bars in order to make it an accurate visual comparison. Start the axis at zero.
  • The second series doesn't add much. Unless we have a specific need we can articulate for both data series (single applicant and joint application), I'm inclined to reduce the data shown to a single series. For example, we can graph just the single applicant series and then note via text that the same observations hold true for cases where there are co-borrowers.
  • Color can be used more strategically. With the two data series currently shown, color is used to distinguish one from the other. If we remove one of the data series, we no longer need to use color in this way and can instead use it to draw attention to the focus of the article: Millennials.
  • The category descriptions are far away. If you look at the full article, it begins with a table that defines the birth years and ages of the various generations (Millennial, Gen X, etc.). We can eliminate the need for this table and reduce any back and forth by simply embedding some of that info with the category names directly in the graph.
  • A good portion of the text simply describes the data. By labeling the data directly, we eliminate the need for this and can be more concise with the text, using it to focus on the context and story. 
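To put a number on the first point, here is a minimal sketch of how a truncated axis exaggerates the difference between two bars. The values (80 and 90) and the axis baseline (70) are made up for illustration; they are not taken from the CoreLogic graph.

```python
def apparent_ratio(smaller, larger, baseline=0.0):
    """Ratio of the drawn bar heights when the y-axis starts at `baseline`."""
    return (larger - baseline) / (smaller - baseline)


# Hypothetical values, chosen only to illustrate the effect.
low, high = 80.0, 90.0

true_ratio = apparent_ratio(low, high)                    # axis starts at zero
truncated_ratio = apparent_ratio(low, high, baseline=70)  # axis starts at 70

print(true_ratio)       # 1.125
print(truncated_ratio)  # 2.0
```

With the axis starting at 70, the taller bar is drawn twice as tall as the shorter one, even though the underlying value is only 12.5% larger. Starting the axis at zero keeps the bar heights proportional to the data.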

Here's what it looks like when I make these changes:

Note that in this case, I preserved the y-axis labels to make it clear that the axis starts at zero (but pushed them to the background by making them grey). Given that I've also labeled the data directly in the bars, I could get rid of this axis altogether.
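I built the makeover in Excel, but for anyone who would rather script it, the same changes can be roughed out in Python. This is only a sketch under assumptions: it uses matplotlib (3.4 or later, for `bar_label`), the helper names (`bar_colors`, `tick_label`, `draw_makeover`), the colors, and the output file name are all hypothetical, and you would supply your own categories, notes, and values rather than anything shown here.

```python
FOCUS_COLOR = "#1f77b4"  # assumed accent color for the category being emphasized
MUTED_COLOR = "#bfbfbf"  # assumed grey for everything else


def bar_colors(categories, focus):
    """One color per bar: accent for the focus category, grey for the rest."""
    return [FOCUS_COLOR if name == focus else MUTED_COLOR for name in categories]


def tick_label(name, note):
    """Embed the category definition (e.g. birth years) under its name."""
    return f"{name}\n({note})"


def draw_makeover(categories, notes, values, focus, path="ltv_makeover.png"):
    """Draw a single-series bar chart with a zero-based axis and direct labels."""
    # Imported here so the small helpers above work without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    labels = [tick_label(n, note) for n, note in zip(categories, notes)]
    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.bar(labels, values, color=bar_colors(categories, focus))

    ax.set_ylim(0, 100)  # start the axis at zero
    # Label the data directly inside the bars.
    ax.bar_label(bars, fmt="%.0f%%", label_type="center", color="white")

    # Push the axis to the background.
    ax.tick_params(axis="y", colors="grey")
    ax.set_ylabel("Loan to value ratio (%)", color="grey")
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)

    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return path
```

Calling `draw_makeover` with a list of generation names, a matching list of short definitions, a matching list of LTV values, and `focus="Millennial"` writes a PNG in which only the Millennial bar is colored.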

Yes, the original visual was perhaps "good enough." But isn't this better?

If interested, you can download the Excel file with the above makeover here.