After a months-long break from academic learning, I can safely say it is nice to be back in an environment where you are pushed to learn and grow.
This week we focused on data cleaning and preparation, something I have done many times in the past but never thought too deeply about. This meant I had to fight the urge to follow my gut and play around with the software just to see what works. Instead, I had to pause, think and plan every step before taking it. I imagine many people start from the same place. Like me, they may have had to analyse complex datasets at university, where the focus is on the quality of the research rather than on proper data preparation.
I realised I had been pushing data prep to the side when we started discussing aggregation, a core data concept. Of course, I had come across this word and may have even aggregated once or twice myself. But did I know what I was actually doing and what was going on behind the scenes? No. So I struggled to properly understand the steps I needed to take, instead of going in with brute force and adjusting course as I went.
So what really is aggregation? It is a way of grouping rows by a shared attribute and summarising a measure within each group into a single value. It's actually quite simple once you think it through a few times. You can think about it in visual terms or conceptual terms. I find it useful to think about the question it asks and the answer it gives. For example, with a simple data set, as shown below, you might want to ask: what is the sum of the measures that belong to group B? To answer that, you would aggregate the Measure column by Group and see the answer is 7!
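To make the question-and-answer idea concrete, here is a minimal Python sketch of summing a measure by group. The rows are made up for illustration (they are not the data set from this post), chosen only so that group B adds up to 7, and the function name `sum_by_group` is my own.

```python
from collections import defaultdict

# Hypothetical (Group, Measure) rows, invented for this sketch.
rows = [("A", 2), ("B", 3), ("A", 6), ("B", 4), ("C", 1)]


def sum_by_group(rows):
    """Add up the Measure values separately for each Group."""
    totals = defaultdict(int)
    for group, measure in rows:
        totals[group] += measure
    return dict(totals)


totals = sum_by_group(rows)
print(totals["B"])  # 7, because 3 + 4 = 7
```

The question "what is the sum for group B?" becomes a lookup in the aggregated result: five rows go in, one value per group comes out.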

You may not want to know the sum of the measures but rather the count. Most tools make this switch easy, and the concept remains the same: only the summarising function changes. Thinking in these very simple visual and conceptual terms helps to develop an idea of how aggregation might change your larger and more complex data.
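As a rough illustration of the concept staying the same, the sketch below separates the grouping step from the summarising step, so that sum and count differ only in the function passed in. Again, the rows and the `aggregate` helper are assumptions of mine, not taken from any particular tool.

```python
from collections import defaultdict

# Hypothetical (Group, Measure) rows, invented for this sketch.
rows = [("A", 2), ("B", 3), ("A", 6), ("B", 4), ("C", 1)]


def aggregate(rows, func):
    """Collect the measures for each group, then reduce each list with func."""
    grouped = defaultdict(list)
    for group, measure in rows:
        grouped[group].append(measure)
    return {group: func(values) for group, values in grouped.items()}


sums = aggregate(rows, sum)    # total of the measures per group
counts = aggregate(rows, len)  # number of rows per group

print(sums["B"])    # 7
print(counts["B"])  # 2
```

The grouping is identical in both calls; swapping `sum` for `len` is the only thing that turns a total into a count.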
I've really enjoyed pausing and allowing myself the time to actually internalise this core concept rather than gliding past it to get to more complex things. There is no shame in taking your time to understand something that others might find simple. Ultimately, that is the only way to learn.
