Dashboard Week, day 2. Yesterday went better than expected, but today brought me back down to earth. Data prep consumed my morning, and my struggle to get my head around the data kept coming back to bite me. My walkthrough follows…
The Data: a vast dataset containing worldwide deforestation figures by country, region and biome. It offered around 25 to 30 indicators, but I felt there was considerable overlap between them. In addition, seeing this never fills you with confidence that a dataset is ready to go (see below).
It wasn’t so much the processes that got me, but how the data was arranged. Some countries had 6 or 7 readings per year, whilst others only had 1. Were they duplicates? Or do some countries take multiple readings for their multiple rainforests? I’d revisit that later. For now, the job was pivoting this data so I could use it to plot line graphs and time series in Tableau. Here is my workflow:
I applied the same processes to all three of those awfully laid-out metrics. First, a Transpose tool to move them into one column, grouped by their name. Second, rename that column to Year. Lastly, a Data Cleansing tool to remove letters and punctuation so that the Year field could act as a year in Tableau. This seemed like the best approach, considering a normal pivot in Tableau and Alteryx didn’t really solve the problem and produced a lot of nulls.
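For anyone who wants the same reshape outside Alteryx, the transpose, rename and cleanse steps amount to a wide-to-long unpivot. This is a minimal Python sketch, not my actual workflow. It assumes hypothetical column names ("country", "metric", and year headers like "y2001" or "2001_ha"). It also assumes that the only digits in a year header are the year itself. Dropping blank cells, rather than emitting them as nulls, is my own choice here.

```python
import re

def wide_to_long(rows, id_fields):
    """Unpivot year columns into records with the id fields plus Year and Value.

    Mirrors the three steps: transpose the year columns into rows, rename
    the header column to Year, then strip letters/punctuation so Year is
    a plain integer. Assumes the only digits in a year header are the year.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in id_fields:
                continue  # identifier columns are carried along, not transposed
            digits = re.sub(r"\D", "", column)  # the "data cleansing" step
            if not digits:
                continue  # not a year column, so skip it
            if value is None:
                continue  # drop blank cells instead of creating nulls
            record = {field: row[field] for field in id_fields}
            record["Year"] = int(digits)
            record["Value"] = value
            long_rows.append(record)
    return long_rows

# Hypothetical toy row, not real figures from the dataset.
wide = [{"country": "Brazil", "metric": "tree_cover_loss_ha", "y2001": 10.0, "y2002": 12.5}]
for record in wide_to_long(wide, ["country", "metric"]):
    print(record)
```

Each long record then has one Year and one Value, which is the shape Tableau wants for a line graph or time series.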
It’s now about 11:30am and I haven’t made a graph. Oh no. The race is on. A huge dataset still required considerable analysis. I looked at everything. I was drawn to African rainforests until I realised halfway through that I was using SUM of my values instead of AVG, and the results were greatly exaggerated. I looked at the Amazon, but I’d already heard of a few people looking there.
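The SUM versus AVG trap ties back to the uneven number of readings per country-year. This tiny Python sketch uses hypothetical toy numbers to show why summing inflates countries with several readings. One caveat: averaging is only the right call if those extra readings really are duplicates. If they turn out to be separate rainforests, summing would be correct.

```python
from collections import defaultdict
from statistics import mean

def aggregate(readings, how):
    """Aggregate (country, year, value) readings per country-year with SUM or AVG."""
    funcs = {"sum": sum, "avg": mean}
    if how not in funcs:
        raise ValueError(f"unknown aggregation: {how}")
    groups = defaultdict(list)
    for country, year, value in readings:
        groups[(country, year)].append(value)
    return {key: funcs[how](values) for key, values in groups.items()}

# Hypothetical toy data: country "A" has three readings in 2010, "B" has one.
readings = [("A", 2010, 5.0), ("A", 2010, 5.0), ("A", 2010, 5.0), ("B", 2010, 5.0)]
print(aggregate(readings, "sum"))  # A comes out three times bigger than B
print(aggregate(readings, "avg"))  # A and B come out equal
```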
So I thought: let’s just get something on paper and go from there. My plan Z seemed to work pretty well (although I might just have got lucky). I made a few classic KPIs and split them by biome.
You see that co-ordinated spike in the Tropical and Sub-tropical biomes around three-quarters of the way through? Yeah, me too. It seemed like a fairly big spike, so I was confident I could pull some good context figures out of it. I grouped those two biomes and did the underlying math: 36 million hectares lost in 5 years, 4.3% of their rainforest cover!
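As a sanity check on those two figures, here is a quick worked equation. It assumes the 4.3% is measured against cover at the start of the five-year window. The symbols $\Delta A$ (area lost) and $A_0$ (starting cover) are my own notation, not fields in the dataset. Under that assumption, the combined starting cover works out at roughly 837 million hectares (Mha):

```latex
\frac{\Delta A}{A_0} = 0.043
\quad\Rightarrow\quad
A_0 = \frac{\Delta A}{0.043} = \frac{36\ \text{Mha}}{0.043} \approx 837\ \text{Mha}
```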
However, I knew these rainforests couldn’t be characteristically the same. Grouping them this way was an easy way to show the shared impact, but I needed some deeper reasoning.
It turns out they are different, and interestingly, the way humans treat them (cut them down) is different too. I did some Year 9 geography research on a few definitions and came across the deforestation site. Is it outer-edge or inner-edge deforestation? I thought inner-edge would be worse, but it turns out not. Outer-edge deforestation tends to fragment the rainforest, further threatening its ecosystems. From this, I could see that Tropical and Sub-tropical could not be more different in that respect.
So this led me down another route. If sub-tropical rainforests are harvested in a much more threatening way than tropical rainforests, why isn’t the sub-tropical picture much worse? It turns out the regeneration rate of sub-tropical rainforests is over 3 times higher.
Lastly, I thought the dashboard needed a tiny bit more. The natural next question was what method is used to deforest. Annoyingly, I couldn’t get felling data, but I could get burned-forest data. Just as annoyingly, it was only available for sub-tropical rainforests. Nevertheless, it showed a considerable decline and let me leave the dashboard on a positive note.
So that’s it. Day 2 done. Here’s the final product.