On our second day of DS11 we got stuck into our first experience of Alteryx. After showing us the logical clean-up of a simple Excel sheet, which included an ‘if then’ statement, a filter and transposing a set of rows, Carl decided to have a little ‘fun’… We were set the task of using Alteryx in pairs to prep an Excel data sheet taken from online sources. Maleeha and I were given the results from the 73rd La Vuelta ciclista a España (the ‘Tour de France’ of Spain), held earlier this year, downloaded from https://www.procyclingstats.com/race/vuelta-a-espana/2018. The Excel sheet showed problems almost immediately:
The winner, Simon Yates, was listed with his overall winning time, whereas every other rider was listed with the time they finished behind Yates rather than their own total time.
The second problem was a recording mistake: the time jumped from 17 minutes and 57 seconds to 25 hours and 40 minutes, rather than to 25 minutes and 40 seconds. This mistake persisted until 57 ‘hours’ and 29 ‘minutes’, after which the values jumped back to the correct 1 hour, 3 minutes and 7 seconds.
Then another shock came: when the sheet was loaded into Alteryx we were sent back to the 1800s… Because the hours taken to finish the race exceeded the number of hours in a day, each time was turned into a number of days since a certain date, with no indication of what that date was. This date was then taken to be the start of time as Alteryx knows it, which happens to be 30/12/1899, resulting in some funky stuff in the time column!
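For anyone curious, the 1899 effect can be undone with a bit of arithmetic. Below is a minimal sketch in Python (not something we built on the day, and not an Alteryx feature), assuming each time arrives as a date-time counted from the 30/12/1899 base date mentioned above. The function names and the example value are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: the base date the times were counted from, as seen in our time column.
BASE_DATE = datetime(1899, 12, 30)

def elapsed_from_base(value: datetime) -> timedelta:
    """Turn a date-time anchored at the base date back into an elapsed duration."""
    return value - BASE_DATE

def format_hms(delta: timedelta) -> str:
    """Format a duration as H:MM:SS, letting hours run past 24."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"

# Illustrative value: a time of 25:40:00 shows up as one day and 01:40 past the base date.
example = datetime(1899, 12, 31, 1, 40, 0)
print(format_hms(elapsed_from_base(example)))  # 25:40:00
```

Subtracting the base date gives a plain duration, which is happy to be longer than 24 hours.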
The next two hours were spent running along a logical trail trying to sort out these issues without touching the Excel sheet (which would probably have solved the problem much more quickly, but would have been way less cool!). Coming from a mathematical background, I loved the challenge of attempting to solve this time issue. Here is what was produced:
The route taken involved sectioning off the different ways the time was recorded and then reducing each of them to a common format, so that they could be brought back together and used in a data visualisation. We did not manage this because Alteryx was very reluctant to let a time be longer than 24 hours; however, I really enjoyed giving it a good shot!
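The idea behind that route can be sketched in a few lines of Python. This is only a rough illustration of the logic, not a copy of our Alteryx workflow: the helper names are hypothetical and the winner's time is a placeholder, not Yates's real result. It treats a misread ‘hours and minutes’ gap as the minutes and seconds it should have been, then adds the winner's time to get a rider's total.

```python
from datetime import timedelta

# Placeholder only: stands in for the winner's total time, not the real figure.
WINNER_TIME = timedelta(hours=80)

def gap_from_misread(hours: int, minutes: int) -> timedelta:
    """Reinterpret a gap recorded as hours:minutes as the intended minutes:seconds."""
    return timedelta(minutes=hours, seconds=minutes)

def total_time(gap: timedelta, winner: timedelta = WINNER_TIME) -> timedelta:
    """A rider's total time is the winner's time plus the gap behind the winner."""
    return winner + gap

def format_hms(delta: timedelta) -> str:
    """Format a duration as H:MM:SS, letting hours run past 24."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"

# The misread value from the sheet: 25 'hours' and 40 'minutes'.
gap = gap_from_misread(25, 40)
print(format_hms(gap))              # 0:25:40
print(format_hms(total_time(gap)))  # 80:25:40
```

Keeping everything as a duration (in effect, a count of seconds) sidesteps the 24 hour limit, because nothing ever has to fit inside a single day.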
Whilst a lot of this post may sound rather moany about how Alteryx runs, my experience was quite the opposite! I loved how it runs and how logical the tools were, and I was even excited enough to hop straight back on my laptop once on the train to keep trying to get further with the data prep (until I ran into the 24 hour issue and decided to cut my losses). What I have learnt from this experience is that Excel may try to help you but really just shoot you in the foot, that Alteryx has a few (maybe more) issues with time, and that some things may be more easily solved by going back to the data source and editing it rather than trying to do it all in the manipulation software. I am really looking forward to continuing my use of Alteryx and learning how powerful a tool it can be! Carl is yet to make me hate him…