Dashboard Week Day 4. A slightly different task today. For today’s dashboard, we had to pair up with another member of the Information Lab (outside of our own cohort). Here are Andy’s instructions:
I want them to work with people that have more experience so that they can see how they:
- Approach a project
- Decide what to focus on
- Prepare the data (and only the data that is necessary)
- Think analytically
- Design their work
It worked perfectly. I was paired up with Robin Kennedy and he consistently blew my mind throughout the day.
The Data: California Wildfires
California has been under severe threat from wildfires since the turn of the century. This dataset compiles each fire’s location (latitude and longitude), county, acres burned, number of fires, etc.
The one caveat: the pretty little download button next to the dataset was BANNED by Andy. We had to use web scraping or an API. Thank god I had help.
Initially we tried to web-scrape. However, the important data sits under a layer of the website. For example:
All the important data is under the blue highlighted area. However, we can’t just click into it and extract it, because we need the top layer as well. You can use ParseHub, but without a premium account it will take hours and you can only get 200 records max. We had to use the API.
Downloading and Parsing the API
So we knew we had to go down the API route. Here’s my workflow:
1) Download and JSON Parse.
2) Text to Columns to split the record ID (number identifier) from each column name. That way we can use this record ID to group and eventually pivot the data.
3) Transpose to pivot the column names.
4) Text to Columns to split the counties up. E.g. a fire spanning multiple counties would register as "Sacramento, Santa Clara", which can’t be plotted in Tableau. We decided to take the first county name as that’s where it started.
5) Sort and Auto Field.
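For anyone who wants to follow the same logic outside Alteryx, here is a rough Python sketch of the steps above. It is not what we actually built: the sample JSON, the field names (Name, Counties, AcresBurned, Started) and the helper functions are all made up for illustration, and the real API response may look quite different.

```python
import json

# Hypothetical sample standing in for the API response.
# The real field names and structure may differ.
SAMPLE = """[
  {"Name": "Fire A", "Counties": "Sacramento, Santa Clara",
   "AcresBurned": 1200, "Started": "2018-07-23"},
  {"Name": "Fire B", "Counties": "Butte",
   "AcresBurned": 300, "Started": "2018-11-08"}
]"""


def flatten(records):
    """Step 1: mimic a JSON parse that yields one (name, value) row
    per field, with names such as '0.Name' or '1.Counties'."""
    rows = []
    for record_id, record in enumerate(records):
        for field, value in record.items():
            rows.append((f"{record_id}.{field}", value))
    return rows


def pivot(rows):
    """Steps 2 and 3: split the record ID from the field name,
    then group by record ID so each fire becomes one row."""
    table = {}
    for name, value in rows:
        record_id, field = name.split(".", 1)
        table.setdefault(int(record_id), {})[field] = value
    return [table[key] for key in sorted(table)]


def first_county(counties):
    """Step 4: keep only the first county of a comma-separated list."""
    return counties.split(",")[0].strip()


records = pivot(flatten(json.loads(SAMPLE)))
for record in records:
    record["County"] = first_county(record["Counties"])

# Step 5: sort by start date (ISO date strings sort chronologically).
records.sort(key=lambda record: record["Started"])

for record in records:
    print(record["Started"], record["County"], record["AcresBurned"])
```

Keeping the record ID around until the pivot is the important bit: it is the only thing that tells you which values belong to the same fire.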
Robin had a great idea for analysis. Take the introduction from Andy’s blog post and test it as if it’s a hypothesis. The bold sections are all areas for analysis. Here’s the intro:
While wildfires are a natural part of California’s landscape, the fire season in California and across the West is starting earlier and ending later each year.
Climate change is considered a key driver of this trend. Warmer spring and summer temperatures, reduced snowpack, and earlier spring snowmelt create longer and more intense dry seasons that increase moisture stress on vegetation and make forests more susceptible to severe wildfire.
The length of fire season is estimated to have increased by 75 days across the Sierras and seems to correspond with an increase in the extent of forest fires across the state. ~ Cal Fire
We didn’t have enough time to cover everything, and in some cases the data just wouldn’t tell us what we wanted it to. We covered two areas:
How many fires? Set the scene.
Are the fires starting earlier and ending later each year?
We tried a heat map and it worked, but we wanted to go further. With this graph we can plot the day each fire started and the acres burned, and use transparency to show density. Robin was great in helping me create this.
So we had to make a calculation for day of year. That way we could plot each fire on the day it started and create this density around the points.
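The day-of-year idea is simple enough to sketch in Python. This is only an illustration of the logic, not our Tableau calculation, and the dates below are made-up examples. In Tableau, something along the lines of DATEPART('dayofyear', [Started]) should do the same job, assuming a date field called [Started] (a hypothetical name).

```python
from datetime import date


def day_of_year(d):
    """Return the position of a date within its year (1 January = 1)."""
    return d.timetuple().tm_yday


# Made-up example dates, not taken from the wildfire data.
print(day_of_year(date(2018, 7, 23)))   # 204
print(day_of_year(date(2018, 11, 8)))   # 312

# Leap years shift every date after February by one day.
print(day_of_year(date(2016, 3, 1)))    # 61
print(day_of_year(date(2018, 3, 1)))    # 60
```

Because every fire now sits on a 1 to 366 scale regardless of its year, fires from different years can share one axis, which is what makes the density comparison possible.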
Then, by bringing on month (discrete), we can see the month lines, which support the comparison across years.
I added some text to enrich the analysis. And that’s it! Day 4 done.