"Dashboard week" day 4: a "Wasted Food" dataset experience.

by Tommaso Ferri

I NEVER waste food... Thoughts on today's project.

Today Andy sent us a dataset on wasted food all around the world, an interesting problem to analyze. The dataset is hosted by the OECD (click HERE if you want to download it) and it is pretty big. I felt confident, since the dataset seemed pretty clean and complete... BUT... once I opened the data with Alteryx, I realized that I was wrong.

Alteryx flow with Snowflake connection

Out of all countries from the dataset, only a few had consistent entries of coherent values over a period of time long enough to make a proper analysis.

In the dataset there were:

  1. 24 different measuring units used
  2. 4 main categories
  3. 510 different categorizations of the waste

After a brief analysis I noticed that over 70% of the results were expressed in tonnes and kilos per capita, BUT only a few countries were reporting the values consistently over the years.
Two of the categories were present in only 20% of the data, in the Japanese and Korean records to be precise. The waste categorizations were a total mess: there were duplicates, "Total" numbers that were not totals, and a lot of different single-value categories.
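
The checks described above were done in Alteryx. As a rough illustration only, the same two questions (which measuring units dominate, and which countries report one unit for every year in a period) could be sketched in plain Python. The sample rows, the column order, and the function names `unit_shares` and `consistent_countries` are made up for this example and are not taken from the OECD file.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (country, year, unit, value).
# These values are invented for illustration, not real OECD data.
rows = [
    ("Japan", 2015, "Tonnes", 100.0),
    ("Japan", 2016, "Tonnes", 110.0),
    ("Japan", 2017, "Tonnes", 105.0),
    ("Korea", 2015, "Tonnes", 80.0),
    ("Korea", 2017, "Kilograms per capita", 1.6),
    ("Italy", 2016, "Percentage", 12.0),
]


def unit_shares(rows):
    """Return the fraction of records reported in each measuring unit."""
    counts = Counter(unit for _, _, unit, _ in rows)
    total = sum(counts.values())
    return {unit: n / total for unit, n in counts.items()}


def consistent_countries(rows, unit, years):
    """Return the countries that report `unit` for every year in `years`."""
    seen = defaultdict(set)
    for country, year, row_unit, _ in rows:
        if row_unit == unit:
            seen[country].add(year)
    wanted = set(years)
    # A country qualifies only if its reported years cover the whole period.
    return sorted(c for c, reported in seen.items() if wanted <= reported)


shares = unit_shares(rows)
steady = consistent_countries(rows, "Tonnes", range(2015, 2018))

for unit, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{unit}: {share:.0%}")
print("Consistent reporters:", steady)
```

Running something like this before picking a chart makes it obvious early on how few countries survive a strict "same unit, every year" filter.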

I decided to use the most common variables in the dataset and make this viz:

The viz that I made

I imagined working for a company, and I learned 3 things:

  1. Always explore the data first and NEVER trust the consistency of the data.
  2. It might be useful to consider a simpler viz.
  3. Never settle on your initial viz idea if you do not know the data well enough, especially for super short projects like this.
  4. Even if the data is BAD you can find something good to use.
  5. I have to complain less about the datasets (see point 4).
  6. The points were actually 3, but learning 6 things is better than 3.

As usual, I have uploaded my data to Snowflake, Cloud Data Bonanza!

See you tomorrow with the last episode of the