interesting data is probably wrong

There’s a particular kind of excitement that comes from spotting something unusual in a dataset. A sharp spike, an unexpected dip, or a strange gap appears, and your brain starts reaching for explanations. Maybe it’s the result of a behavioral shift, or a deliberate strategy, or an operational quirk that reveals something important. Maybe it’s the thing that makes the whole story come together. But more often…it isn’t.

Tony Twyman made his name as a pioneer in the field of audience research for television and radio in the UK. For our discussion today, though, he’s best remembered for a single, enduring quotation, which is now known as Twyman’s Law:

“Any figure that looks interesting or different is usually wrong.”

It’s not meant to discourage curiosity or keep you from finding insights. It’s meant to remind you that the most exciting observations are often the ones most likely to be errors—either in the data, or in the way you’re interpreting it.

When I first heard of Twyman’s Law, it immediately reminded me of a personal, but very public, project I undertook back in my pre-SWD days. My experience of working on it stuck with me, as public mistakes often do, and I’m sharing it here simply to show how easy it is to get drawn in by the appeal of a dramatic discovery.

The strange time gap in a public figure’s Tweets

In 2017, I was exploring the Twitter behavior of a well-known public figure, a man in his late 60s. (You could probably guess who it was, but that’s not important to this story.) He was frequently in the news, and over the preceding couple of years had grown more and more active online. I wasn’t looking at the content of the tweets. I was more interested in the timing. What hours of the day did he tend to post? Were there obvious rhythms or gaps in his online behavior?

I pulled the dataset into a radial graph and almost immediately noticed a pattern that seemed too clean to ignore. 

There were no tweets—none at all—between noon and 1 PM Eastern Time. This gap showed up across multiple months of data, and because it was so consistent, it looked meaningful. I started trying to make sense of it. Was that when he typically stepped away from his phone? Did he have a daily briefing or meeting? Had I found some hidden quirk in his routine?
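To make the mechanics concrete, here’s a minimal sketch of that kind of radial view in Python, using made-up timestamps and matplotlib (neither is from the original project). It bins tweets by hour of day, flags any hour with zero activity, and draws one bar per hour on a clock-style chart:

```python
import math
from collections import Counter
from datetime import datetime

import matplotlib.pyplot as plt

# Made-up timestamps standing in for the real Twitter export.
timestamps = [
    "2017-03-01 06:45:00",
    "2017-03-01 09:14:00",
    "2017-03-01 00:47:00",  # suspicious: was this really just after midnight?
    "2017-03-01 18:03:00",
    "2017-03-01 21:30:00",
]

# Bin tweets by hour of day, then flag any hour with zero activity.
counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
empty_hours = [h for h in range(24) if counts[h] == 0]
print("hours with no tweets at all:", empty_hours)

# Draw the radial view: one bar per hour, midnight at the top.
angles = [2 * math.pi * h / 24 for h in range(24)]
ax = plt.subplot(projection="polar")
ax.bar(angles, [counts[h] for h in range(24)], width=2 * math.pi / 24)
ax.set_theta_zero_location("N")  # midnight at the top
ax.set_theta_direction(-1)       # run clockwise, like a clock face
plt.show()
```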

I almost shared it. But something about it felt a little too perfect.

So I went back to the raw data and took a closer look at the timestamps. That’s when I found the first issue. The tweets that were supposed to be time-stamped between 12 and 1 PM had been incorrectly encoded as happening between 12 and 1 AM. Somewhere along the way, either the AM/PM flag had been dropped, or the data had been pulled in a 12-hour format and misread. The result was a tidy, hour-long void in the middle of the day that never actually existed.
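This failure mode is easy to reproduce. In Python, for example, strptime treats a 12-hour timestamp with no AM/PM flag as AM, so hour 12 silently becomes hour 0 (a generic illustration of the trap, not the actual pipeline I was using):

```python
from datetime import datetime

# With the AM/PM flag present, noon parses the way you'd expect.
with_flag = datetime.strptime("12:34 PM", "%I:%M %p")
print(with_flag.hour)  # 12 -- a lunchtime tweet stays at lunchtime

# Drop the flag and strptime assumes AM, where 12 means midnight.
without_flag = datetime.strptime("12:34", "%I:%M")
print(without_flag.hour)  # 0 -- the same tweet moves to the midnight bucket
```

Every lunchtime tweet quietly migrates to the midnight bucket, producing exactly the sort of tidy, hour-long void I had been admiring.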

After correcting the AM/PM problem, I re-engaged with the data and iterated a bit on the specific design of my radial chart, using what I now believed to be accurate timestamps:

[Image: radial chart of tweet activity by hour of day, after correcting the AM/PM issue]

However, I wasn’t quite finished being carelessly wrong about things yet. 

At the time, I was living in the Washington D.C. area. While the U.S. spans multiple time zones, nearly half the population runs on Eastern Time. My work, my family, and pretty much everything around me operated on that same clock, so I tended to treat Eastern as the default. With that mindset, I assumed the timestamps in the dataset were either recorded in the public figure’s local time zone (Eastern), or in what I considered the “default” time zone—also Eastern.

Unfortunately for me, this was not the case.

The timestamps in the dataset were in Coordinated Universal Time (UTC), not local time. Which meant that everything I was now confidently interpreting—early morning activity, late-night bursts, midday gaps—was five hours off. (Which explained why my analysis suggested a nearly 70-year-old man was posting on The Socials until 4 AM daily.)
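The fix itself is small once the problem is visible. Here’s a sketch of the conversion using Python’s zoneinfo module, with a made-up timestamp: mark the time explicitly as UTC, then convert it to the zone the analysis actually cares about:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A timestamp from the dataset, now correctly treated as UTC.
utc_ts = datetime(2017, 3, 1, 4, 12, tzinfo=timezone.utc)

# Convert to Eastern Time for the analysis.
eastern_ts = utc_ts.astimezone(ZoneInfo("America/New_York"))

print(utc_ts.strftime("%Y-%m-%d %H:%M %Z"))      # 2017-03-01 04:12 UTC
print(eastern_ts.strftime("%Y-%m-%d %H:%M %Z"))  # 2017-02-28 23:12 EST
```

The “4 AM tweet” was really posted at 11 PM the night before. Using a named time zone rather than a fixed five-hour offset also matters: across months of data, Eastern Time flips between UTC-5 and UTC-4 with daylight saving.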

Once the tweets were correctly placed on the radial diagram, the pattern looked much more reasonable:

[Image: radial chart of tweet activity by hour of day, adjusted to Eastern Time]

What initially felt like a meaningful visual insight was actually the result of two separate issues: a formatting error and a faulty assumption. Neither was especially complicated, and both could have been avoided if I had slowed down and asked a few more questions early on. By the time I corrected everything and was finally working with an accurate version of the data, the story I had imagined was no longer there. What remained was the real story…unfortunately, one far less exciting than the one I initially thought I had discovered.

Lessons (re)learned

If this had been a personal experiment that I never shared, it might have been nothing more than an annoying detour. But if this had been a client project, or a published post, or part of a public dashboard, I would have been in trouble. Not because the chart was unattractive or unclear, but because it would have told a false story.

There are a few basic principles I try to follow in situations like this, though clearly I don’t always follow them perfectly.

First, don’t rush to interpret the surprising thing.
When something jumps out of the data and feels unusually clean, it’s more likely to be a sign of a problem than a breakthrough. The more exciting it is, the more carefully you need to verify it. That doesn’t mean it can’t be true. But it does mean you should treat it as suspicious until you’ve ruled out the obvious errors. If the choice is between “major behavioral insight no one has ever noticed before” and “simple encoding bug,” assume the bug until proven otherwise.

Second, double-check the details you think you understand.
In this case, it wasn’t that the data was messy. It was that I thought I knew what the timestamps meant. I thought I understood the format. I thought I knew the time zone. And because of those unchecked assumptions, I built a clean-looking, wrong interpretation on top of a misaligned foundation.

Third, acknowledge the role your ego plays in all of this.
I liked what I had found. I was proud of the chart. I wanted it to be real. That made me slower to question it, and quicker to explain it. Fortunately, the stakes weren’t very high in that instance…but if they had been, things could have played out much worse.

So slow down. Stay calm. Take the time to validate your assumptions, and be honest with yourself about the limits of what your data can actually tell you. Because once you publish or share something, it doesn’t belong to you anymore—it becomes part of how people understand the world. That’s a real responsibility, and your own credibility, once damaged through carelessness, is difficult to rebuild.


Twyman’s Law isn’t about distrusting your data or being a knee-jerk naysayer. It’s about remembering that excitement and accuracy don’t always arrive together. When something looks especially compelling, it’s worth taking a step back to make sure it’s real.

In the end, the most valuable stories we find in data aren’t the ones that surprise us the most. They’re the ones we’ve taken the time to get right.

