Dashboard Day 3 - Star Wars Characters

by Ollie Matthews

Today's task was to extract Star Wars film and character data from a PDF file and then make something interesting with it. We only had a few hours for this, and I focused on the characters.

Data from a PDF

Extracting data from a PDF can be extremely challenging, so I try to avoid it where possible. Sometimes it is unavoidable, however, and in our case we had 260 tables to deal with.

Fortunately, there are free online tools that make things easier. In our case we used the PDF to Excel converter here: https://www.ilovepdf.com/pdf_to_excel. With this I was able to get one Excel file with one sheet per table. A good start!

Out of the 260 tables, about 70 gave impressions of individual characters by demographic group, all with the same schema. My task from this point was to isolate these tables, work out which character each table referred to and union the tables together. These steps were necessary before prepping the data for Tableau.
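For readers who think in code, here is one way the "isolate the tables with the same schema" step could be sketched in Python. This is not what we built (we worked in Alteryx); the column names, sheet names and values are made up for illustration, and each sheet is assumed to be already loaded as a list of row dictionaries.

```python
# Hypothetical schema shared by the character tables (not the real column names).
EXPECTED_COLUMNS = {"group", "favorable", "unfavorable"}


def select_matching_sheets(sheets, expected_columns):
    """Keep only the sheets whose rows all have exactly the expected columns."""
    selected = {}
    for name, rows in sheets.items():
        # Skip empty sheets; compare each row's keys with the expected schema.
        if rows and all(set(row) == expected_columns for row in rows):
            selected[name] = rows
    return selected


# Made-up sample: one character table and one unrelated table.
sheets = {
    "Table 12": [{"group": "Male", "favorable": 70, "unfavorable": 10}],
    "Table 40": [{"film": "A New Hope", "seen": 80}],
}

character_sheets = select_matching_sheets(sheets, EXPECTED_COLUMNS)
print(list(character_sheets))
```

Matching on the exact set of column names is a deliberately strict choice; a real export from a PDF converter may need looser matching if headers come through with stray spaces or different casing.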

Alteryx

I needed to use a batch macro in order to union all the sheets that we wanted:

This macro reads in and unions multiple sheets, then appends a field with the sheet/table number.
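The logic of the macro can be sketched in a few lines of Python. This is only an illustration of the idea, not the Alteryx macro itself: the function name, field names and sample values are hypothetical, and the sheets are assumed to be already loaded as lists of row dictionaries keyed by sheet name.

```python
def union_sheets(sheets):
    """Stack the rows of every sheet and tag each row with its sheet name."""
    combined = []
    for sheet_name, rows in sheets.items():
        for row in rows:
            tagged = dict(row)            # copy so the input rows are not modified
            tagged["sheet"] = sheet_name  # the appended sheet/table field
            combined.append(tagged)
    return combined


# Made-up sample data standing in for two of the character tables.
sheets = {
    "Table 12": [
        {"group": "Male", "favorable": 70},
        {"group": "Female", "favorable": 65},
    ],
    "Table 13": [
        {"group": "Male", "favorable": 50},
    ],
}

combined = union_sheets(sheets)
for row in combined:
    print(row)
```

Tagging each row with its source sheet is what makes the later lookup possible: once the tables are stacked, the sheet field is the only thing that says which table a row came from.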

This worked well but, of course, each table refers to a different character and at this point we only have the table number. To remedy this I created a lookup table for the character names:

The next step was a workflow which incorporated the macro above and joined the main data with the lookup table:
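As a rough Python equivalent of that join, the sketch below attaches a character name to each row by looking up its sheet. The lookup contents, field names and numbers are invented for the example, and the function is a stand-in for the Alteryx Join tool rather than a copy of the workflow.

```python
def add_character_names(rows, lookup):
    """Join rows to a sheet-to-character lookup, keeping unmatched rows aside."""
    joined, unmatched = [], []
    for row in rows:
        name = lookup.get(row["sheet"])
        if name is None:
            unmatched.append(row)  # no lookup entry for this sheet
        else:
            joined.append({**row, "character": name})
    return joined, unmatched


# Hypothetical lookup table: sheet name to character name.
lookup = {"Table 12": "Han Solo", "Table 13": "Luke Skywalker"}

# Made-up rows, as they might look after the sheets have been unioned.
rows = [
    {"sheet": "Table 12", "group": "Male", "favorable": 70},
    {"sheet": "Table 13", "group": "Male", "favorable": 50},
    {"sheet": "Table 99", "group": "Male", "favorable": 10},
]

joined, unmatched = add_character_names(rows, lookup)
print(joined)
print(unmatched)
```

Returning the unmatched rows separately mirrors the way a join exposes its non-matching records, which is a handy check that every table made it into the lookup.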

While I was working on these steps, my cohort buddy was working on cleaning and preparing the data. As a result, within the space of only a few hours, we had a useful dataset exported to Tableau.