#SWDchallenge: artisanal data

This month's #SWDchallenge comes from guest author Mike Cisneros. Mike is an active participant in the online data visualization community, a member of the Data Visualization Society, and is in his second year as part of the Tableau Zen Master program. Recently, we engaged in a discussion about where the chart designer's responsibilities begin and end, in terms of validating the data being graphically depicted. It led down some interesting paths about ethics, truth, and ownership, which in turn motivated him to submit this challenge to the SWD community. To see some of Mike's more creative work, check out his Tableau Public profile or his Instagram feed, or connect with him on LinkedIn or Twitter. Thanks, Mike, for this galvanizing challenge!

If you work in data visualization, then it’s a safe bet that most of the data you are asked to visualize comes from other people. That is to say, you are not the owner of that data—it belongs to your customer, or comes from a vendor, or it’s open source. 

And because of this, we’ve all probably had the experience of looking at a new dataset for the first time and scratching our heads as we try to make sense of what we’re seeing. We’ve come to expect that: a bit of idiosyncrasy and messiness is normal in any dataset.

We’ve also learned to take certain datasets with a grain of salt, based on our first impressions of them. When you open a file and find clearly manually recorded data (survey data, free-form text, things of that nature), your experience and instinct tell you that you’re not getting perfect information, and you likely treat it accordingly. That is, you are careful about the conclusions you draw and the analyses you perform, because you know you might be missing critical context or observations, and those missing pieces could be essential to getting at the truth of the situation.

Sometimes you find yourself working with more objective-looking data: maybe it’s auto-generated (like log files), electronic records of transactions (purchases and other financial data), or sensor-recorded data (from weather stations or other environmental monitors).

Even though it looks objective, it’s still not perfect information; there’s always the chance of recording errors, system anomalies, or other confounding factors. Our mental alarm bells may not ring as loudly or insistently with this kind of data, which is exactly why it behooves us to listen closely, just in case.

And even if we get an absolutely PERFECT dataset (clean, complete, validated, the works), we STILL can’t be 100% confident in our analyses, because we don’t know what choices the dataset’s owner made before we saw that data. For example:

  • We can see what measures are IN the data, but we don’t know what was chosen to be excluded, or why.

  • We don’t know why the sensors began recording when they did, and why they stopped when they stopped.

  • We don’t know who made any of these editorial decisions surrounding that data.

  • We don’t know why the data was being collected in the first place.

When, then, can you, as the data visualizer, be perfectly confident in EVERY SINGLE FACET of a dataset?

Only when this is true: when YOU are the collector.

If you collected the dataset, made those choices, and cleaned that data, and you know every reason behind every decision, then you are PERFECTLY positioned to analyze THAT data with full confidence. For every analysis you perform and every conclusion you draw, you’ll know just how far you can take it and where you begin to overstep what the data supports. You will know the context in which your analyses can safely and justifiably be performed.

So, for this month’s #SWDchallenge, we’re not focusing on a specific chart type, but rather on a data type: bespoke data. A dataset that you collected yourself—electronically, manually, via surveys, by observation…whatever you like, as long as it is yours and yours alone.

Now, it doesn’t have to be about you, although it certainly can be. That’s the choice I made for my own entry this month. I decided to look back at my own history of purchasing books on Amazon, and to see how my purchasing habits changed once I bought my first Kindle. I expected my “books purchased” count to fall by roughly as much as my “ebooks purchased” count rose.

This was not what I found.

[Figure: “My reading was ignited,” a chart of my annual physical-book and Kindle ebook purchases]

First of all, I made an invalid assumption: I assumed my ebook purchases would begin after my first Kindle purchase, which was in 2012. In fact, I started buying ebooks in 2011, when I first downloaded the Kindle app to my tablet.

Second, my purchases of physical books didn’t decrease as my ebook purchases increased. In fact, while my Kindle book purchases climbed WAY past any previous year’s total of physical books, my paper-and-glue book purchasing held steady, and even increased somewhat.

Because I knew this data so well, I could find some other interesting tidbits.

  • I found that I bought three Edward Tufte books back in 2005, so I can pinpoint that year as the time I first thought of data visualization as an actual, distinct discipline that I could study and focus on as a career.

  • I found that two of the three books I bought in 2007 were the same book, bought on the same day. (Can you guess what book it was, and why?)

  • And I found that since 2015, there’s been a bright line dividing the genres I choose to buy as physical copies from those I buy electronically.

What will you create, and what conclusions will you be able to state with confidence, when you’ve collected your very own hand-crafted, artisanal dataset? Let’s find out.

the challenge

Go out and collect a dataset of your own, analyze it and create a graph visualizing your findings. (Remember: sometimes the smallest, most specific stories can tell the most universal truths.)

DEADLINE: Friday, May 10th by midnight PST

NEW SUBMISSION INSTRUCTIONS: upload your visual and related commentary at storytellingwithdata.com/SWDchallengeSUBMIT. Feel free to also share on social media at any point using #SWDchallenge. For inclusion in the summary recap post, submissions must be officially submitted to us (compiling the recap is still a time-intensive process, and we aren’t able to scrape Twitter and other social media sites).

We look forward to seeing what data you collect and visualize! Stay tuned for the recap post later this month, where we’ll share with you all of the visuals created and submitted via the form as part of this challenge. Until then, check out the #SWDchallenge page for past challenge details and recaps.