Hey everybody! Welcome to DS7 Dashboard Week. We’ve heard so much about this joyous time – surely there’s no better way to spend a rainy week in December?
The task for Day 1 was based on the American Housing Survey dataset.
After a quick look around the data in Alteryx I turned to the internet for help – because frankly, what did any of it mean? The cumulative shared efforts of a room of eight people returned two particularly useful tables. George shared with us the Value Labels Package, which allows you to decode the values in the columns, whilst I found the Variable Crosswalk, which provides a hierarchy of question groupings and allows you to make sense of the field names. I spent the first part of the morning working in Alteryx to join these two helpful tables to the broadly unhelpful main table.
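For anyone following along who thinks better in code than in Alteryx workflows, here is a rough Python sketch of what that decoding join does. It is an illustration only, not the workflow I actually built. The field names (HOUSEHOLD_ID, REASON_MOVED), codes and labels are all made up. The idea is that each coded answer in the main table gets looked up by (field name, code) in the value-labels table, and the matching label is added as a new column.

```python
# Hypothetical miniature of the main table: one row per household, coded answers.
main_rows = [
    {"HOUSEHOLD_ID": "A1", "REASON_MOVED": "1"},
    {"HOUSEHOLD_ID": "A2", "REASON_MOVED": "2"},
]

# Hypothetical miniature of a value-labels lookup, keyed by (field name, code).
value_labels = {
    ("REASON_MOVED", "1"): "Job",
    ("REASON_MOVED", "2"): "Family",
}


def decode(rows, labels):
    """Left-join each coded field onto its label, adding a <FIELD>_LABEL column."""
    decoded = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are left untouched
        for field, code in row.items():
            label = labels.get((field, code))
            if label is not None:  # fields with no matching label are left as-is
                new_row[field + "_LABEL"] = label
        decoded.append(new_row)
    return decoded


for row in decode(main_rows, value_labels):
    print(row)
```

Because it behaves like a left join, a household whose code has no label keeps its row. It just doesn't get the extra column.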
One of the first things I spotted was to do with housing occupancy during different months of the year. I was thinking about working with this data to look into seasonality trends. However, I was hoping to take a geographical approach to this analysis, and some exploration of the data showed absolutely no useful geographical fields. Not even state! So I went back to the drawing board.
The next thing I caught on to was a field relating to broken windows, which reminded me of a course I did during the second year of my geography degree about the environmental factors which contribute to crime (hugely interesting – not just in terms of the theories that people apply to the subject, but also in terms of the way these theories are framed in relation to broader issues of deprivation, exclusivity, etc). I spent most of the morning focusing on this, and was feeling fairly positive about the whole thing as we broke for lunch… Sadly, when I came back to it I realised within about 15 minutes that I had a problem, and after another 10 minutes of counselling from Andy, that the problem was major. I decided it was insurmountable given the time frame, and returned to the drawing board with a brand new workflow and a new focus.
By this point it must have been getting on for 14:00, so I made an effort to simplify things as much as possible. Why did people move house? It still took a while to get working. I was stuck for some time on the fact that my values lookup table had ‘M’ for ‘Not Reported’ and ‘N’ for ‘Not Applicable’, whereas the main table had ‘-6’ and ‘-9’, with apparently no way to tell which number represented ‘M’ and which ‘N’. I looked back through every table I could find on the website in case something there would help, but there didn’t seem to be anything. After perhaps half an hour, I gave up and decided to bucket them both into an ‘NA’ category. Not ideal, but needs must when you’ve only got a day to get your shit together, and you’ve only got two hours of that day left.
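In code terms, that compromise is a simple rule applied before the label lookup, sketched below in Python. Again, this is an illustration rather than my actual Alteryx workflow. The -6 and -9 codes come from the data, but the other codes and the labels ("Job", "Family") are invented, and "Unmatched" is just a placeholder I picked for any code the lookup doesn't cover. Either missing-data code goes to "NA", and everything else is decoded as normal.

```python
# The two missing-data codes found in the main table. The lookup only offered
# 'M' (Not Reported) and 'N' (Not Applicable), with no way to tell which
# number meant which, so both are collapsed into one bucket.
MISSING_CODES = {"-6", "-9"}

# Hypothetical labels for the non-missing codes of one field.
reason_labels = {"1": "Job", "2": "Family"}


def bucket_label(code, labels):
    """Return 'NA' for either missing code, else the decoded label."""
    if code in MISSING_CODES:
        return "NA"
    # Placeholder for any code the lookup doesn't cover.
    return labels.get(code, "Unmatched")


codes = ["1", "-6", "2", "-9"]
print([bucket_label(c, reason_labels) for c in codes])
# prints ['Job', 'NA', 'Family', 'NA']
```

The cost is that "didn't answer" and "question didn't apply" end up looking the same in the final viz, which is exactly the trade-off described above.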
Once the workflow finished running (the one perk being that waiting for Alteryx workflows to run gave me a solid amount of time to crack out the majority of this blog post), I was finally able to get the data into Tableau sometime after 15:30. And then, when I realised it wasn’t quite right (again), and needed changing (again)… I got it back into Tableau at 15:45. Then again at 16:00. And finally around 16:30.
Not ideal. Fortunately, by this point I had the data in a relatively usable format, so I could get some calculations going and practice my donut-making skills. I continue to lament the lack of a geographical field, but you can only work with the data that you have. I even remembered to edit my tooltips! The final product.
Going into tomorrow:
Hope the data is nicer.
Don’t despair when it isn’t.
Simplify as soon as possible.
Believe that it’s possible (it probably is).
I think the hardest thing is knowing how to shape the data. I thought I had it cracked at least twice today, only to begin working in Tableau and realise that I was wrong. Do I have a solution for tomorrow? I do not. But at least I know to expect it as a problem.