Dashboard Week Day 2: Scraping PDFs with Alteryx

by Sarah Jellenc

On day 2 of Dashboard Week, our task was to scrape data from a PDF using Alteryx and build a viz with the resulting data set. We each chose a PDF of results tables from a different IAAF Diamond League meet (I chose the Monaco 2019 results). According to Andy, this is one of the hardest data prep exercises!

Today’s task was particularly challenging because the tables in the PDF alternated between different structures. To get the job done, we needed a bit of R, a lot of regex, a blog, and the help of several colleagues.
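To give a feel for the regex side of the problem, here is a minimal Python sketch of pulling fields out of one results row once the PDF has been converted to text. It is not the Alteryx workflow from this post: the row layout (place, athlete name, three-letter country code, mark) is an assumption, `ROW_PATTERN` and `parse_row` are hypothetical names, and the sample line is made up rather than taken from the Monaco PDF.

```python
import re

# Hypothetical layout of one results row after PDF-to-text conversion:
# place, athlete name, three-letter country code, then the mark (time or distance).
ROW_PATTERN = re.compile(
    r"^(?P<place>\d+)\s+"          # finishing position
    r"(?P<name>.+?)\s+"            # athlete name (non-greedy, may contain spaces)
    r"(?P<country>[A-Z]{3})\s+"    # three-letter country code
    r"(?P<mark>\d+(?:[:.]\d+)+)$"  # mark such as 12.34, 8.50 or 3:45.67
)


def parse_row(line):
    """Return a dict of named fields for a results row, or None if the line does not match."""
    match = ROW_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# Made-up sample lines for illustration only.
print(parse_row("1 Jane Doe USA 12.34"))
print(parse_row("Women's 100m Hurdles"))  # an event heading, so no match
```

Lines that do not fit the assumed layout (event headings, column headers, or rows marked DNF) simply return `None`, which is one way to separate the differently structured parts of the text before reshaping them.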

The original PDF to be scraped.
Kicked it off with a bit of R…

Once I had imported the PDF into Alteryx, I got to work restructuring the data. This wasn’t entirely straightforward, but I learned quite a bit.
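One common restructuring step with text pulled from a PDF is carrying each table's heading down onto the rows beneath it, broadly what a Multi-Row Formula fill-down does in Alteryx. The Python sketch below assumes that every non-blank line that does not start with a finishing position is an event heading; whether that holds for the Monaco PDF is an assumption, and `attach_event_names` and the sample lines are hypothetical.

```python
import re

# Hypothetical test for a results row: a line that starts with a finishing position.
RESULT_ROW = re.compile(r"^\d+\s+\S")


def attach_event_names(lines):
    """Pair each results row with the most recent heading line above it.

    Assumes every non-blank line that is not a results row is an event heading;
    real PDF text usually also contains column headers or wind readings that
    would need their own filters.
    """
    current_event = None
    rows = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines left over from the PDF conversion
        if RESULT_ROW.match(text):
            rows.append((current_event, text))
        else:
            current_event = text  # carry this heading forward to the rows below it
    return rows


# Made-up input for illustration only.
sample = [
    "Men's 800m",
    "1 Athlete A KEN 1:43.00",
    "2 Athlete B USA 1:43.50",
    "",
    "Women's High Jump",
    "1 Athlete C GER 2.00",
]
for event, row in attach_event_names(sample):
    print(event, "|", row)
```

Once every row carries its event name, the rows can be split into columns (for example with a regex like the one sketched earlier) and stacked into a single table for Tableau.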

My initial workflow.

Next, it was time to build the viz. I sketched out a few ideas first, then decided to focus on the top competitors in each event.

My fancy sketching.

And finally, the (almost) finished product:

In the future, I’d like to add both the competitors’ countries of origin and result times to the tooltips.
