Today’s task was to scrape data from a PDF. The Diamond League (a series of athletics events) was the chosen one! The data I chose to prep was from the Rabat 2019 meet.
To parse PDFs in Alteryx I used the following blog post, which was absolutely essential: https://oliverpower.wordpress.com/2018/02/08/parsing-pdfs-using-alteryx-and-a-little-r/ Thanks, Oliver Power.
My Alteryx workflow was limited by the problems I ran into with the RegEx and Text to Columns tools. This cut down the number of records I could take into Tableau to visualise. Getting both Parse tools to work in practice made this challenge hard but interesting. It required plenty of patience and a slice of luck along the way.
A ranking issue meant I could not use any of the women’s data I produced, so I focused on the three men’s events in the data.
Filtering and formatting within this dashboard would have made it more functional for the user. However, the beauty of dashboard week is the time pressure to produce an end result, which leaves no room for these alterations if, like today, time runs out.
Day 2 done!