Today we had to look at some OECD food waste data. The main issue with the dataset is that OECD countries often report food waste statistics in different units, and most countries only have an entry for a single year. This makes comparison across years, and between countries, extremely difficult.
When I downloaded the CSV from https://www.oecd-ilibrary.org/, the data looked pretty straightforward to clean, so I just removed some unwanted fields and uploaded the table to Snowflake.
Only when I connected to the data in Tableau did I realise that a lot of manual grouping would be needed: for the US alone, food groups were split across several different variables.
Some European countries, on the other hand, recorded their data in tonnes and used differently named food sub-groups.
I managed to get consistent values within each country, but not across countries, so I decided to focus on the US. I learnt a lot about sets and groups in this project, as I needed to create quite a few of them to filter my data correctly and to deal with the multitude of variable names.
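The grouping I did in Tableau boils down to mapping many raw sub-group labels onto a smaller set of consistent categories. The Python sketch below shows that idea outside Tableau; the labels in `GROUP_MAP` and the helper names are hypothetical, for illustration only, and are not taken from the actual OECD dataset.

```python
# Hypothetical mapping from raw sub-group labels to consistent groups.
# These labels are illustrative only, not the real OECD variable names.
GROUP_MAP = {
    "Fruit and vegetables": "Produce",
    "Fresh fruit": "Produce",
    "Vegetables": "Produce",
    "Meat": "Meat & fish",
    "Fish and seafood": "Meat & fish",
}


def group_label(raw: str) -> str:
    """Return the consistent group for a raw label, or 'Other' if unmapped."""
    return GROUP_MAP.get(raw.strip(), "Other")


def total_by_group(rows):
    """Sum values per consistent group; rows are (label, value) pairs."""
    totals = {}
    for label, value in rows:
        key = group_label(label)
        totals[key] = totals.get(key, 0.0) + value
    return totals
```

Anything not in the mapping falls into "Other", which behaves much like the "Include Other" option on a Tableau group.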
I also learnt to be flexible: each country covered a different range of years, so instead of drawing a slope chart between fixed years (e.g. 1995 and 2010), I took the earliest and latest year for each country and drew a line between them.
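The per-country slope endpoints can be sketched in Python as below, assuming the data has been reduced to hypothetical `(country, year, value)` records. In Tableau itself, a level-of-detail expression such as `{FIXED [Country] : MIN([Year])}` is one possible way to get the same first year (the field names there are assumptions too).

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (country, year, value).
Record = Tuple[str, int, float]
Endpoint = Tuple[int, float]


def slope_endpoints(records: List[Record]) -> Dict[str, Tuple[Endpoint, Endpoint]]:
    """For each country, return ((min_year, value), (max_year, value))."""
    by_country: Dict[str, Dict[int, float]] = {}
    for country, year, value in records:
        # If a country repeats a year, the later record overwrites the earlier one.
        by_country.setdefault(country, {})[year] = value

    endpoints = {}
    for country, years in by_country.items():
        first, last = min(years), max(years)
        endpoints[country] = ((first, years[first]), (last, years[last]))
    return endpoints
```

A country with a single year gets identical start and end points, so it shows up as a dot rather than a line, which matches how sparse most of the countries were.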
This project was definitely less data-prep heavy than the others we had this week, so I'm glad I got to build something complex in Tableau and learn how to deal with data that has more gaps than I'm used to.