Dashboard Week Day 2: Diamond League PDF parsing

by Andy Kriebel

Sara and Bona are feeling particularly evil for day 2. One of the hardest data prep exercises is to scrape data from PDFs. IT IS BRUTAL!

Photo by Braden Collum on Unsplash

The data sets to prep are from the Diamond League, which I had to Google to find out what it is. The data, in PDF form, can be found here. To parse PDFs in Alteryx, they will need this blog to get started. Here are some hints:

  1. Use lots of Regex.
  2. There are alternating table structures.
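
To give a feel for what those two hints mean in practice, here is a rough Python sketch (not the Alteryx workflow itself) that tries one regex per table layout on each line of text pulled out of a PDF. The two layouts, the sample lines, and the `parse_line` helper are all made up for illustration; the real Diamond League PDFs will almost certainly need different patterns.

```python
import re

# Pattern for a results time or distance such as 10.01, 8.01 or 3:29.05.
RESULT = r"\d+(?:[:.]\d+)+"

# ASSUMPTION: two hypothetical row layouts standing in for the
# "alternating table structures". Neither is taken from the real PDFs.
# Layout A: rank, name, nationality code, year of birth, result
LAYOUT_A = re.compile(
    rf"^(?P<rank>\d+)\s+(?P<name>.+?)\s+(?P<nat>[A-Z]{{3}})\s+"
    rf"(?P<yob>\d{{4}})\s+(?P<result>{RESULT})$"
)
# Layout B: rank, name, year of birth, nationality code, result
LAYOUT_B = re.compile(
    rf"^(?P<rank>\d+)\s+(?P<name>.+?)\s+(?P<yob>\d{{4}})\s+"
    rf"(?P<nat>[A-Z]{{3}})\s+(?P<result>{RESULT})$"
)


def parse_line(line):
    """Return the four required fields, or None if no layout matches."""
    for pattern in (LAYOUT_A, LAYOUT_B):
        match = pattern.match(line.strip())
        if match:
            return {
                "name": match["name"],
                "nationality": match["nat"],
                "year_of_birth": int(match["yob"]),
                "result": match["result"],
            }
    return None  # header rows, footers and other noise end up here


# Made-up sample lines, one per layout plus a header row.
sample_lines = [
    "1 DOE Jane AAA 1990 10.01",
    "2 Alex ROE 1988 BBB 3:29.05",
    "Rank Name Nat",
]
rows = [parsed for parsed in map(parse_line, sample_lines) if parsed]
```

Trying each pattern in turn and keeping the first match is one simple way to cope with rows that switch structure partway through a document; lines that match nothing are dropped rather than guessed at.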

Today’s rules:

  1. They must gather AT LEAST the Name/Nationality/Year of Birth/Result.
  2. Everyone must pick a different country.
  3. They may not use the PDF connector in Tableau.

If they are still stuck at 1pm, they may use Prep Builder as an alternative method, but still may not use the PDF connector in Tableau (it won’t really help them anyway).

Reminder that before 5pm, they must have their workflows completed, dashboards uploaded to Tableau Public, and blog posts written.

That’s it! Simple.