So dashboard week begins. It's one of the most talked-about weeks at the Data School and, weirdly enough, I'm looking forward to it. The projects allow you some personal time to get creative and hone the skills we've started to develop. We seem to have escaped the demands of data prep (for now), as the focus this week is on providing insight. The theme is the environment.
The data: Gulf of Mexico Temperature Data
This dataset compiles temperature and water depth data from around the Gulf of Mexico. It includes the organisation that conducted each reading, the water depth, the temperature, and locational data.
In these scenarios, my first step is to view the variables, see if any are unnecessary, and rename the rest so I can understand them at a glance. For example, Characteristic, DepthHeightUnit and Measure Unit all describe the same reading under different aliases. I can hide all of these.
Secondly, I like to rename my variables. It takes two minutes, but it makes a world of difference throughout the project. DO IT.
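If you were doing this same tidy-up outside Tableau, say in pandas, it might look like the sketch below. The column names here are hypothetical stand-ins; the real dataset's headers will differ.

```python
import pandas as pd

# Hypothetical columns standing in for the real dataset's headers.
df = pd.DataFrame({
    "ActivityStartDate": ["1995-06-01", "2010-08-15"],
    "ResultMeasureValue": [28.4, 33.9],
    "DepthHeightMeasureValue": [5.0, 2.0],
    "MeasureUnit": ["deg C", "deg C"],  # redundant alias column
})

# Hide the redundant alias column, then rename the rest for readability.
df = df.drop(columns=["MeasureUnit"]).rename(columns={
    "ActivityStartDate": "Date",
    "ResultMeasureValue": "Temperature",
    "DepthHeightMeasureValue": "Depth",
})
print(list(df.columns))  # ['Date', 'Temperature', 'Depth']
```

Two minutes of renaming up front, and every later formula reads like plain English.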
This is where I start my analysis. I can’t say there is any structured plan, but I use line graphs and bar charts to see trends, and as I learnt from the Iron Viz Final commentary, using scatter plots to test for association between variables is also very useful.
I began with these line charts. On the face of it, there is some insight: the top chart trends upwards quite quickly, and the bottom chart does the opposite. But under further analysis, the count of records increases simply because many more tests are done today than 30 years ago, and as for average height (depth), I'm not sure what's going on there. I spent 20 minutes trying to find out, then left it; time can be better spent elsewhere.
And is average temperature really the best indicator of natural disaster susceptibility? Natural disasters in the Gulf of Mexico result from high water temperatures: warm surface water evaporates and becomes fuel for more powerful storms to develop (or at least that's how I remember it). So maybe MAX(Temperature) would be a better indicator than AVG(Temperature). The hotter these waters get, the more likely they are to lead to disasters.
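The difference between the two aggregations is easy to see on a toy example. Below is a pandas sketch with made-up numbers (not the real Gulf data): the yearly average barely moves, while the yearly maximum, the extreme that fuels storms, jumps.

```python
import pandas as pd

# Made-up illustrative readings; the real data would come from the Gulf dataset.
df = pd.DataFrame({
    "Year": [1990, 1990, 2010, 2010],
    "Temperature": [26.0, 30.0, 27.0, 34.0],
})

# Compare AVG(Temperature) vs MAX(Temperature) per year.
yearly = df.groupby("Year")["Temperature"].agg(["mean", "max"])
print(yearly)
```

On this toy data, the mean rises from 28.0 to 30.5 while the max rises from 30.0 to 34.0: the extremes move more than the average, which is why MAX is the more alarming (and more relevant) indicator here.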
Now here’s a graph I can use.
So that first graph will set the story for the rest of the dashboard to follow. And by the way, I'm doing a longform dashboard because I think they tell a story better (and I haven't done one in ages).
So we have our setting; now I think some locational data could help. I want to show how locations change over time. However, there are 8 million marks when I place this data on a map. I also need a way to clearly show temperature increases, and when I plot marks for every year, there are too many to distinguish and it's hard to see the temperature rising.
I'm going to use a benchmark: filter to readings of 33.5°C or higher. The fact there are 2 such readings in 1950-1970 but 4,000 in 2010 should say all I need.
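The benchmark idea is just a threshold filter followed by a count per period. A minimal pandas sketch of the same logic, with a handful of invented readings in place of the 8 million real marks:

```python
import pandas as pd

BENCHMARK = 33.5  # degrees C: the threshold for an "extreme" reading

# Invented sample readings; the real dataset has millions of rows.
df = pd.DataFrame({
    "Year": [1955, 1968, 2010, 2010, 2010],
    "Temperature": [33.6, 33.8, 33.9, 34.2, 35.0],
})

# Keep only readings at or above the benchmark, then count per decade.
hot = df[df["Temperature"] >= BENCHMARK].copy()
hot["Decade"] = (hot["Year"] // 10) * 10
counts = hot.groupby("Decade").size()
print(counts.to_dict())  # {1950: 1, 1960: 1, 2010: 3}
```

Filtering first also solves the 8-million-marks problem: the map only has to draw the extreme readings, and their growing count over time tells the story on its own.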
Now here is where I'll try to add some additional insight. Are there any areas which have got disproportionately worse? Looks like it. The south of Florida has 25% of all the readings. Let's check it out.
The Florida Keys. Here I am thinking I've done it: the insight I needed. The Florida Keys appear to have got disproportionately worse over the years. Just throw in another graph to show this and we're in the clear. WRONG. This is the mistake I made. I presumed the data was what it looked like, and with an hour to go I realised it wasn't. Nearly every point measured here came from the 2009-2010 Atlantic storm season.
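The sanity check that would have caught this earlier is simple: before trusting a region's trend, look at when its readings were taken. A pandas sketch of that check, with invented regions and years that mimic the problem (one region's readings clustered in a single storm season):

```python
import pandas as pd

# Invented sample: one region's readings cluster in 2009-2010,
# the other's are spread across the decades.
df = pd.DataFrame({
    "Region": ["Florida Keys"] * 4 + ["Other"] * 4,
    "Year":   [2009, 2010, 2010, 2010, 1990, 2000, 2010, 2020],
})

# What fraction of each region's readings fall in each year?
dist = df.groupby("Region")["Year"].value_counts(normalize=True)
print(dist)
```

Here 75% of the Florida Keys readings fall in a single year, so any "trend" for that region is really an artefact of when people went out and measured, exactly the trap described above.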
The Florida Keys have a clear upward trend, but that's not to say it's any faster than the average.
With some super quick formatting and a super quick blog post, we are done. Mistakes were made, but lessons were learnt. I had to concede that the Florida Keys were not a place of interest, almost dispelling the insight with the final bar chart. Oh well.