So this week we begin Andy’s favourite part of the Data School: dashboard week. This is where he gets to see us sweat (Andy’s favourite pastime) by pointing us in the direction of some data and getting us to collect said data, pull it through Alteryx and, in the few minutes we have left, create a viz and write a blog post about it all… Sounds fun, right?
Today’s mission revolved around Prudential’s RideLondon event; on their web page is a list (or actually multiple lists spread over multiple pages) of all those who competed in the events, specifically the 100 mile ride on the Sunday.
We had to download the data and clean it up first, which we wisely did on just the first 25 records (of 27,000+). This involved a rather ugly and not particularly optimised Alteryx workflow, but when you are against the clock, as long as it works that’s all that matters!
My preferred way of parsing out web-scraped data is to separate all the would-be columns (key data such as distance, rider name, etc.) into separate streams and then join them all back together again at the end. It just feels like I have more control and can edit things more easily. Obviously, this leads to the workflow repeating the same steps over and over again, and it wouldn’t be suitable if you planned to have hundreds (or even tens) of columns. But as I said, it did for now.
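For anyone who thinks better in code than in Alteryx tools, here is a rough Python sketch of the same "one stream per column, then join" idea. It is not my actual workflow: the HTML rows, the class names and the helper functions `parse_stream` and `join_streams` are all made up for illustration, and the real RideLondon page markup will differ.

```python
import re

# Hypothetical scraped rows; the real results page markup is an assumption.
RAW_ROWS = [
    "<tr><td class='name'>Jane Smith</td><td class='distance'>100</td><td class='time'>05:12:44</td></tr>",
    "<tr><td class='name'>AK ?</td><td class='distance'>100</td><td class='time'>06:01:09</td></tr>",
    "<tr><td class='name'>Sam Jones</td><td class='time'>04:58:30</td></tr>",
]


def parse_stream(rows, field):
    """One 'stream': pull a single field out of every row, keyed by record ID."""
    pattern = re.compile(r"<td class='" + re.escape(field) + r"'>(.*?)</td>")
    stream = {}
    for record_id, row in enumerate(rows, start=1):
        match = pattern.search(row)
        if match:  # rows that lack this field simply drop out of the stream
            stream[record_id] = match.group(1).strip()
    return stream


def join_streams(streams, n_rows):
    """Join the streams back together on record ID; missing values become None."""
    return [
        {field: stream.get(record_id) for field, stream in streams.items()}
        for record_id in range(1, n_rows + 1)
    ]


FIELDS = ["name", "distance", "time"]
streams = {field: parse_stream(RAW_ROWS, field) for field in FIELDS}
riders = join_streams(streams, len(RAW_ROWS))

for rider in riders:
    print(rider)
```

Each field gets its own little parser, so tweaking how one column is cleaned does not touch the others. The price is the repetition mentioned above: every new column means another stream. Joining on a record ID also means a row that is missing a field shows up with a `None` rather than silently shifting the other columns.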
The only thing I forgot to take into account as I was building this workflow is that we still had thousands more rows of data incoming, some of which did not line up perfectly with the conditions I had set out for the 25-strong sample I was working with. So once the rest came in, the oddballs had to be massaged out (I mean seriously, whose first name is AK? The ? is actually part of the name…).
Anyway, once the data was ready to play with, I got stuck in. I have noticed recently that I have begun to move away from contextual analysis and have been focusing more on just what I could find. The issue with this is that the data is then often of no interest or use to anyone. So this time I made sure to always consider what the organisers or riders themselves would want to get out of this viz.
As such, there are far fewer graphs in the resulting product and it’s far more descriptive. I think going forward I may need to work on that balance a bit more…
Find the viz here: https://public.tableau.com/profile/alex.h.w.#!/vizhome/PrudentialRideLondon_1/Dashboard1