At the beginning of the day, it appeared that the data prep would be easy. That turned out to be seriously wrong…
Today’s challenge was working with data from the American Time Use Survey, published by the U.S. Bureau of Labor Statistics.
Working with survey data is almost always a bit tricky, since the values in the data are codes that stand for something else. However, prepping survey data can be relatively painless if one has a good data dictionary and variable codex. Thankfully, we had both, but to our horror, they were only available as PDF files!
This meant that we had to extract the tables from the PDFs into a format we could use, like an Excel file. We tried using Tableau and Alteryx to parse the PDFs, but the results were far from usable.
Thankfully we came across an online service that gave satisfactory results: https://www.pdftoexcel.com/
Unfortunately, there was still some manual Excel work involved to get the variable codex to function properly.
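Once it works, a variable codex is essentially a lookup table from each coded value to its label. As a minimal sketch in Python (not the Alteryx workflow used on the day), assuming the cleaned codex was saved as a CSV with hypothetical columns `variable`, `code` and `label`, decoding a respondent record could look like this. The rows and the respondent ID are illustrative only.

```python
import csv
import io

# Hypothetical excerpt of a cleaned codex exported from Excel as CSV.
# Column names and rows are illustrative, not the official ATUS layout.
CODEX_CSV = """variable,code,label
TESEX,1,Male
TESEX,2,Female
"""

def load_codex(text):
    """Build a {variable: {code: label}} lookup from codex CSV text."""
    codex = {}
    for row in csv.DictReader(io.StringIO(text)):
        codex.setdefault(row["variable"], {})[row["code"]] = row["label"]
    return codex

def decode(record, codex):
    """Replace coded values with labels; unknown codes and uncoded fields pass through."""
    return {
        var: codex.get(var, {}).get(value, value)
        for var, value in record.items()
    }

codex = load_codex(CODEX_CSV)
respondent = {"TUCASEID": "20230101230001", "TESEX": "2"}  # made-up record
print(decode(respondent, codex))
```

Letting unknown codes pass through unchanged makes gaps in the codex easy to spot instead of silently turning them into blanks.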
After getting the codex to work, it was time to decipher and reshape the data in Alteryx.
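Reshaping here typically means going from wide (one row per respondent, one column per activity) to long (one row per respondent and activity), which is what Alteryx's Transpose tool does. A minimal Python sketch of the same idea follows. The column names are modelled on ATUS-style codes but are assumptions, and the minutes are made up.

```python
# Hypothetical wide-format rows: one row per respondent, one column of
# minutes per activity code (codes and values are illustrative only).
wide_rows = [
    {"TUCASEID": "A", "t120312": 45, "t120303": 120},
    {"TUCASEID": "B", "t120312": 0, "t120303": 90},
]

# Illustrative activity-code labels, as they might come from the codex.
ACTIVITY_LABELS = {
    "t120312": "Reading for personal interest",
    "t120303": "Television and movies",
}

def wide_to_long(rows, id_field, labels):
    """Turn each activity column into its own (id, activity, minutes) row."""
    long_rows = []
    for row in rows:
        for column, minutes in row.items():
            if column == id_field:
                continue  # the ID is kept as a key, not treated as a measurement
            long_rows.append({
                id_field: row[id_field],
                "activity": labels.get(column, column),  # fall back to the raw code
                "minutes": minutes,
            })
    return long_rows

for r in wide_to_long(wide_rows, "TUCASEID", ACTIVITY_LABELS):
    print(r)
```

In long format, filtering down to a single activity such as reading before building a chart becomes a simple row filter.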
Just getting the data prep to work properly took most of the day, but I’ve learned a lot about prepping survey data effectively!
Towards the end of the day, it was time to actually make a dashboard. After a lot of laughs exploring many of the weird and wonderful things in this dataset, I finally decided to focus on the time spent reading.
Why reading, you might ask? Well, that is simply because I like to read and thought it would be interesting to see how I compare to the average American.
How do you compare?
Check it out here: