Today was DS43’s third day at The Data School, and the focus was ‘preparing data’. Throughout the day we covered cleaning data, reshaping it and preparing multiple data sources.
While cleaning data was straightforward enough, reshaping and preparing multiple data sources proved to be much more challenging. As a group, we tackled various ‘Preppin’ Data’ challenges. These involved altering columns of the data (e.g. splitting ‘date’ variables into their corresponding quarters) and pivoting rows into columns or columns into rows to achieve the desired format. Although I’ve worked in positions that required reshaping data before, going through the basics in such an interactive way helped solidify the concepts in my mind, and the other data schoolers in DS43 picked them up very quickly.
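To make those two reshaping steps concrete, here is a minimal sketch in plain Python. It is not taken from the training or from any of the challenges: the table, the field names and the helper functions `quarter_of` and `columns_to_rows` are all made up for illustration.

```python
from datetime import date


def quarter_of(d: date) -> int:
    """Return the calendar quarter (1 to 4) that a date falls in."""
    return (d.month - 1) // 3 + 1


def columns_to_rows(rows, id_field, value_fields, name_field, value_field):
    """Pivot the listed columns into rows (a wide-to-long reshape)."""
    long_rows = []
    for row in rows:
        for field in value_fields:
            long_rows.append(
                {id_field: row[id_field], name_field: field, value_field: row[field]}
            )
    return long_rows


# Hypothetical wide table: one column per month.
wide_rows = [
    {"store": "Leeds", "Jan": 100, "Feb": 120},
    {"store": "York", "Jan": 80, "Feb": 95},
]

# After the pivot there is one row per store and month.
long_rows = columns_to_rows(wide_rows, "store", ["Jan", "Feb"], "month", "sales")

print(quarter_of(date(2024, 5, 17)))  # 2
print(long_rows[0])  # {'store': 'Leeds', 'month': 'Jan', 'sales': 100}
```

The long version has a single ‘month’ field and a single ‘sales’ field instead of one column per month, which is usually the shape a visualisation tool wants.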
Throughout training today, we were regularly reminded of the 4 "rules" of data structure:
- One data field for each category or measure
- One data type for each data field
- A single date column where possible (unless the data set holds distinct date records, e.g. order date and fulfilment date)
- Each row should be a single record, with a value in every data field where possible
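As a rough illustration of the second rule, the sketch below flags any field that holds more than one data type. It is my own example rather than something from the training, and the records and the function names `field_types` and `mixed_type_fields` are hypothetical.

```python
def field_types(rows):
    """Map each field name to the set of Python type names seen in it."""
    types = {}
    for row in rows:
        for field, value in row.items():
            types.setdefault(field, set()).add(type(value).__name__)
    return types


def mixed_type_fields(rows):
    """Return the sorted names of fields that hold more than one data type."""
    return sorted(f for f, seen in field_types(rows).items() if len(seen) > 1)


# Hypothetical records: the second row stores sales as text, not a number.
records = [
    {"order_id": 1, "order_date": "2024-01-05", "sales": 100.0},
    {"order_id": 2, "order_date": "2024-01-06", "sales": "95"},
]

print(mixed_type_fields(records))  # ['sales']
```

A check like this only covers one of the four rules, but it shows the kind of problem that cleaning is meant to catch before any reshaping happens.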
Tomorrow, we start using Tableau Prep to see how the theory we covered today is applied in practice.
