Dashboard Week Day 1: Water stations around the world

by Andy Kriebel

What better way is there to welcome DS17 back from the holidays than Dashboard Week? Over the last few cohorts, I’ve gone from extremely difficult data sets (especially for data prep) to fairly simple ones that let them focus more on the aesthetics and analysis. This week, I’m going to aim for simple data prep, with the focus primarily on analysis and design, with analysis being the most important. I want them to find insights and learn to explore the “why” in the data.

If you’re unfamiliar with Dashboard Week, the basic idea is that the team is given a data source; they prep the data, analyze it, create a visualization, and write a blog post. Some of them haven’t been blogging much, so this is a good way for them to get back into the groove of writing.

Here are the rules that are common across each day:

  1. They MUST work independently.
  2. Everything MUST be done by 5pm (so that I can write a recap blog post).
  3. They MUST leave their laptops at work in the evening.
  4. The next morning, they present back the previous day’s work.
  5. No complaining!
  6. Pay attention; requirements may change day-to-day (because sometimes I don’t do enough research about the feasibility of the data).

For day 1, the team will be using water quality data from GEMStat. Specifically, they will be looking at transparency, pigment, temperature, and dissolved solids measured at water stations all around the world.

All five continents have data, so each team member will choose a continent, with no more than two people using data from the same one. They need to get all of the historical data from all stations within that continent and analyze the water quality. I’m expecting excellent analysis, with design secondary to insights.

The download is limited to 500 stations. I don’t know if that limit is per IP address (everyone in our office would be on the same IP), so hopefully they look at the data on the website before they just start downloading everything. Planning will be key; downloading all of the data without a plan would be a poor strategy.
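To make that planning concrete, here is a minimal sketch of how a station list could be narrowed down before downloading anything. The file name, column names, parameter labels, and sorting choice are all assumptions for illustration; GEMStat’s actual export format may look different.

```python
import pandas as pd

# Hypothetical station list exported from the GEMStat website
# (the column names here are assumptions, not the real export schema).
stations = pd.read_csv("station_list.csv")

# Keep only stations on the chosen continent that measure the parameters
# of interest, then cap the request at the 500-station download limit.
shortlist = (
    stations[
        (stations["continent"] == "Africa")
        & (stations["parameter_group"].isin(["Transparency", "Dissolved solids"]))
    ]
    .sort_values("record_count", ascending=False)  # favour stations with long histories
    .head(500)
)

# Save the IDs to download, rather than grabbing everything blindly.
shortlist["station_id"].to_csv("stations_to_download.csv", index=False)
```

The exact filters matter less than the habit: decide which stations answer your question first, then spend the 500-station budget on those.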

As an analyst, I start investigating the data by asking: when, where, what, how, then why. I’ve found this to be a super effective process, but they are welcome to use whatever technique works for them.
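For what it’s worth, that order maps naturally onto a first pass over the measurements. The sketch below assumes a tidy table of readings with hypothetical column names; the point is the sequence of questions, not the exact code.

```python
import pandas as pd

# Hypothetical table of readings: one row per station, date, and parameter.
readings = pd.read_csv("measurements.csv", parse_dates=["sample_date"])

# When: how far back do the records go, and are there gaps in the years?
print(readings["sample_date"].dt.year.value_counts().sort_index())

# Where: which countries and stations are actually represented?
print(readings.groupby("country")["station_id"].nunique())

# What: which parameters are measured, and what do their values look like?
print(readings.groupby("parameter")["value"].describe())

# How and why come next: trends over time per station, comparisons between
# regions, and digging into the outliers that the summaries above surface.
```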

It will be interesting to see what they can find from an analytical standpoint.