As part of DS21's first project for dashboard week, Andy has decided to focus on climate change, and our first day will be focused on looking at the temperature change of water around the Gulf of Mexico. The link provided us with an Excel file, KML files, and shapefiles.
Since this week is focused on getting insights out of the data, my first step was to explore the data and work out what could be interesting to look at. After a few minutes of looking at the data, I came up with the following ideas:
- On a continuous scale, how has the temperature of water changed over time? There is a lot of data here, so I think it would be best to average the temperature at a weekly or monthly level.
- Over 24 hours, how does the temperature of water change (what time is it highest, and lowest, and why?).
- The data comes from multiple collection sources, so I think it might be interesting to look at how the different sources compare.
- Is there a correlation between depth of the water and temperature?
- Are there any hotspots where the temperature appears to be the highest?
One of the first things I decided to do was look at the data in a little more detail. A boxplot of the temperature distribution for each measurement date showed that some recorded temperatures were over 200 degrees Celsius. This made me question the data, and some research uncovered that the highest recorded water temperature for any given day was 56.7 degrees Celsius. With this in mind, I added a data source filter to exclude any values above this. Since this value was recorded in Florida, which sits at a similar latitude to the Gulf, any value close to this could be possible.
Furthermore, the data ranged from 1922 to 2014, but the records were inconsistently collected until 1955, so I decided to focus on the years from 1955 onwards.
Another inconsistency I found was that some measurements for the depth of the sea (I assume at which the temperature was recorded) were in negative numbers. These were also excluded.
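For anyone who wants to reproduce these exclusions outside the dashboard tool, here is a minimal Python sketch of the three filters. The field names (`date`, `temp_c`, `depth_m`) and the sample rows are made up for illustration and are not taken from the real file, and the 1955 cut-off assumes "focus on these years" means 1955 onwards.

```python
from datetime import date

# Thresholds described in the post. The record layout below is hypothetical.
MAX_PLAUSIBLE_TEMP_C = 56.7  # upper bound on a believable reading
FIRST_YEAR = 1955            # assumed first year of consistent collection


def clean(records):
    """Keep only readings that pass all three exclusions."""
    return [
        r for r in records
        if r["temp_c"] <= MAX_PLAUSIBLE_TEMP_C   # drop implausible temperatures
        and r["date"].year >= FIRST_YEAR         # drop the patchy early years
        and r["depth_m"] >= 0                    # drop negative depths
    ]


# Illustrative rows only, one per exclusion plus one valid reading.
sample = [
    {"date": date(1960, 8, 1), "temp_c": 29.5, "depth_m": 10.0},
    {"date": date(1970, 1, 15), "temp_c": 212.0, "depth_m": 5.0},  # too hot
    {"date": date(1930, 8, 1), "temp_c": 28.0, "depth_m": 0.0},    # too early
    {"date": date(1980, 1, 1), "temp_c": 18.0, "depth_m": -20.0},  # negative depth
]

cleaned = clean(sample)
print(len(cleaned), "of", len(sample), "readings kept")
```

Keeping the thresholds as named constants makes it easy to revisit them later, for example if the 56.7 degree cap turns out to be too generous.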
After a bit of data exploration, I found some really nice insights from this data. Firstly, there was quite a lot of variation between the lowest and highest temperature recorded. I decided to highlight this with a view of the minimum and maximum temperature at a yearly level, with a set-action that allows you to drill down to the month level when clicking on a specific year (tutorial here).
Since there is a lot of data, and I found quite a bit of additional information about what causes the water temperature to change, I decided to look at average temperatures on a monthly level too, specifically looking at August and January (the hottest and coldest months respectively). Filtering to these months and putting 'year' on the columns, I could see how the temperature has changed from the start to the end of the time series for a single month, such as August.
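The yearly minimum and maximum view, with its drill-down to months, boils down to the same grouping applied at two levels. Below is a rough Python sketch of that idea, not how the dashboard itself was built. The `(year, month, temp_c)` tuples and the helper name `min_max_by` are assumptions made for illustration.

```python
from collections import defaultdict


def min_max_by(readings, key):
    """Group (year, month, temp_c) tuples by key(year, month).

    Returns {group: (min_temp, max_temp)}.
    """
    groups = defaultdict(list)
    for year, month, temp_c in readings:
        groups[key(year, month)].append(temp_c)
    return {group: (min(temps), max(temps)) for group, temps in groups.items()}


# Illustrative readings only.
readings = [
    (1960, 1, 17.0),
    (1960, 8, 30.0),
    (1960, 8, 28.5),
    (1961, 1, 16.0),
    (1961, 8, 31.0),
]

# Top-level view: one (min, max) pair per year.
yearly = min_max_by(readings, lambda year, month: year)

# "Drill-down": restrict to the clicked year, then group by month instead.
selected_year = 1960
monthly = min_max_by(
    [r for r in readings if r[0] == selected_year],
    lambda year, month: month,
)

print(yearly)
print(monthly)
```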
After this, I decided to see if there was a correlation between the depth at which temperature readings were taken and the temperature recorded. A scatterplot was heavily populated along the zero axis, so I decided to bin the depth values (per 100m). This produced an interesting, right-skewed histogram where the shallower the depth at which the measurement was taken, the higher the average temperature appeared to be (this stabilised at around 1100m).
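The binning step can be sketched in a few lines of Python. This is an illustration under assumed inputs rather than the calculation used in the dashboard: the `(depth_m, temp_c)` pairs are invented, and each bin is labelled by its lower edge in metres.

```python
from collections import defaultdict

BIN_WIDTH_M = 100  # bin size used in the post


def mean_temp_by_depth_bin(readings, width=BIN_WIDTH_M):
    """Average temp_c per depth bin; keys are the lower edge of each bin."""
    bins = defaultdict(list)
    for depth_m, temp_c in readings:
        lower_edge = int(depth_m // width) * width  # e.g. 150 m falls in bin 100
        bins[lower_edge].append(temp_c)
    return {
        lower_edge: sum(temps) / len(temps)
        for lower_edge, temps in sorted(bins.items())
    }


# Illustrative (depth_m, temp_c) pairs only.
readings = [(5, 28.0), (60, 26.0), (150, 20.0), (199.9, 18.0), (1100, 5.0)]

averages = mean_temp_by_depth_bin(readings)
for lower_edge, mean_temp in averages.items():
    print(f"{lower_edge}-{lower_edge + BIN_WIDTH_M} m: {mean_temp:.1f} C")
```

Averaging within bins hides the crowding near zero depth that made the raw scatterplot hard to read, at the cost of losing the spread inside each bin.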
Bringing in New Data
I saw a bit of a trend emerging which showed that temperature collection points around Florida (namely the south-west coast) appeared to have the highest temperatures. I brought in some new data on the actual temperature in Florida to see if there was a correlation. While there appears to be some noise around the trend, a slight positive correlation emerges: as the temperature in Florida increases, so does the water temperature. Data points collected in August appear to be the most positively correlated.
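One way to put a number on "slight correlation" is Pearson's r between the two series. The sketch below computes it by hand in Python. The paired values are invented for illustration, and pairing the Florida and water readings by period is an assumption about how the two datasets would be joined.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Illustrative paired values only (same period for each pair).
florida_temp_c = [20.0, 24.0, 28.0, 31.0]
water_temp_c = [19.0, 22.5, 27.0, 29.0]

r = pearson(florida_temp_c, water_temp_c)
print(f"r = {r:.3f}")
```

Values of r near 1 indicate a strong positive relationship and values near 0 indicate little linear relationship, so running this per month would show whether August really stands out.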
Bringing it all together
One thing I struggle with (and was seriously time constrained on today) is design. Therefore I'm not too happy with the outcome of the dashboard, but I am happy with the content!