On the Thursday of Dashboard Week we were tasked with producing a piece of work about cheese. Andy supplied us with an API, which we all gave up on using within about 15 minutes, and a link to cheese.com, with instructions to scrape their cheese data. I’d never scraped data before and had absolutely no clue where to even start, so I ended up relying heavily on help from fellow DSers (although most of them had barely more experience than I did).
Anyway, once that ball was rolling, I very quickly narrowed the data down to Australian cheeses. Australia’s on my mind since I’m flying to Perth on January 1st to (hopefully) compete in the World Championships of Ultimate (injured hamstring allowing).
Here’s my theory on narrowing data sets early in this sort of situation – just do it. Every day this week there’s been a division between people who narrowed the data as soon as possible and those who tried to get everything, so that they could decide what was interesting further down the line once they’d had a chance to look at all the data. But with datasets as big as some of the ones we’re being handed, we can’t always look at all the data. It just isn’t possible in the time frame we’ve been allowed. And even if you had unlimited time on a project, making sense of 5000 rows of data isn’t something I’d expect most people to be able to do.
So here’s what I think: if you have a limited data set, you’ll find an angle. The volume of data is small enough that you can actually investigate it in a meaningful way and start to see any nuances or patterns. That way, if there is a great story, it’ll pop out at you much more easily than if those 100 rows were buried in another 4900. And even if there isn’t an obvious great story – so what? Everybody wants something great. But I’d say that the people who kept as much data as possible this week, exploring everything in pursuit of that one golden nugget of insight, have had no better luck producing fascinating things than the people who filtered the data down quickly.
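For anyone who wants to see what "filter early" looks like in practice, here’s a minimal sketch in Python. It isn’t what I actually built: the rows are made up, and the field names (`name`, `country`, `flavour`) and the `narrow` helper are assumptions for illustration, not the real cheese.com structure.

```python
# Hypothetical rows standing in for scraped cheese data. The field names
# ("name", "country", "flavour") are assumptions, not the real site schema.
cheeses = [
    {"name": "Cheese A", "country": "Australia", "flavour": "citrusy"},
    {"name": "Cheese B", "country": "France", "flavour": "nutty"},
    {"name": "Cheese C", "country": "Australia", "flavour": "creamy"},
    {"name": "Cheese D", "country": "Italy", "flavour": "sharp"},
]


def narrow(rows, field, value):
    """Keep only the rows whose `field` matches `value`, ignoring case."""
    wanted = value.lower()
    return [row for row in rows if str(row.get(field, "")).lower() == wanted]


# Narrow first, explore second: everything downstream only sees this subset.
australian = narrow(cheeses, "country", "Australia")
print(len(cheeses), "rows narrowed to", len(australian))
```

The point is only that the filter comes first, so every exploratory chart afterwards is built on the small set rather than the whole scrape.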
Now I’ll admit – I got lucky today. I homed in on Australia, made about two exploratory charts, found an angle and hard committed to it. But even if I hadn’t found anything super fun to visualize, I would’ve found something, and it would’ve got done.
Here’s what I came up with today: a look at the citrusy-flavoured cheeses of the Yarra Valley Dairy near Melbourne, Australia.