Today, we were asked to look at fire incidents in the state of California. However, unlike in previous days, the data wasn’t readily available for us to download – we had to use our web scraping skills to get it ourselves. I’ll be honest – it took me a bit longer than I expected, but I managed to get there in the end, and even built a batch macro that processed the data for all the years available. So overall, I was quite proud of myself.
When it came to analysing and visualising the data in Tableau, one interesting thing I noticed was that while the total number of fire incidents was higher in the southern part of California, the total area burned was larger in the northern counties. I wanted to explore why that might be. I assumed it had something to do with the state’s geography, but I could not find any usable data on forest, urban or desert area by county, so I needed to look for other potential explanations.
I came across a National Geographic article which suggested that a sharp increase in population, together with the development of fire-prone areas in California, is another factor contributing to the rise in uncontrollable fires. I downloaded historical population data for each county and checked whether there was any correlation between the population growth rate and the percentage increase in either fire incidents or acres of land burned. Neither scatterplot showed any clear correlation, which suggests that the number of fire incidents and their magnitude are not directly related to population growth at the county level, at least over the last decade.
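For anyone who would rather run this kind of check in code than read it off a scatterplot, here is a minimal Python sketch. It is not the workflow from this post, which was done in Tableau. The county names and all the figures are invented placeholders, and the helper names `pct_change` and `pearson` are just illustrative. It computes the percentage change in population and in fire incidents for each county, then the Pearson correlation coefficient between the two.

```python
from math import sqrt


def pct_change(start, end):
    """Percentage change from start to end."""
    if start == 0:
        raise ValueError("start value must be non-zero")
    return (end - start) / start * 100.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented placeholder figures, NOT real county data:
# (population_start, population_end, fires_start, fires_end)
counties = {
    "County A": (100_000, 110_000, 40, 50),
    "County B": (250_000, 255_000, 80, 120),
    "County C": (50_000, 60_000, 10, 9),
    "County D": (400_000, 420_000, 150, 165),
}

pop_growth = [pct_change(p0, p1) for p0, p1, _, _ in counties.values()]
fire_growth = [pct_change(f0, f1) for _, _, f0, f1 in counties.values()]

r = pearson(pop_growth, fire_growth)
print(f"Pearson r between population growth and fire growth: {r:.2f}")
```

A coefficient close to zero would match the "no clear correlation" reading of the scatterplots. Pearson's r only captures linear relationships, though, so it is worth looking at the plot as well.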
Finally, I wanted to look at the number of fire stations in each county and whether that had any relationship with the number of fires. As expected, my analysis showed that, in general, more acres of land were damaged by fires in counties with fewer fire stations. Unfortunately, I didn't quite manage to dig any further, but if I had more time, my next step would have been to add data on the total land area of each county so that I could normalise the values and see whether any additional insights showed up.
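The normalisation step could look roughly like the Python sketch below. It is only an illustration under stated assumptions. The helper `per_unit_area`, the county names and every number are hypothetical, and the units (acres, square miles) are simply a choice made for the example. The idea is to divide each county's total by its land area so that large and small counties can be compared on the same footing.

```python
def per_unit_area(totals, land_area, scale=1.0):
    """Divide each county's total by its land area, optionally rescaled.

    totals and land_area are dicts keyed by county name. scale lets the
    result be expressed per 1,000 square miles (scale=1000), for example.
    """
    result = {}
    for county, total in totals.items():
        area = land_area[county]
        if area <= 0:
            raise ValueError(f"land area for {county} must be positive")
        result[county] = total / area * scale
    return result


# Invented placeholder figures, NOT real county data.
acres_burned = {"County A": 12_000.0, "County B": 3_000.0}
fire_stations = {"County A": 8, "County B": 20}
land_area_sq_mi = {"County A": 4_000.0, "County B": 500.0}

# Acres burned per square mile of county land.
burned_density = per_unit_area(acres_burned, land_area_sq_mi)

# Fire stations per 1,000 square miles of county land.
station_density = per_unit_area(fire_stations, land_area_sq_mi, scale=1000)

for county in acres_burned:
    print(county, burned_density[county], station_density[county])
```

Comparing the two densities, rather than the raw totals, would show whether the fire station relationship still holds once county size is taken into account.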