Dashboard Week - Day 3: Star Wars

by Jyoti Gupta

It was the last day of dashboard week, and DS 25 got to work on the most interesting data set so far: the epic “Star Wars”. The requirements were:

  1. Get the data from a 268-page PDF file available here
  2. Do the data prep
  3. Prepare something interesting and write a blog (of course!!!)

I hadn’t worked with PDF files before in either Alteryx or Tableau Prep, and that was the biggest challenge of the day. It took a lot of time just to figure out how to start. I learned about a macro that Ollie Clarke created to input PDF files into Alteryx and tried it, but with no luck. Later I found out that the macro requires R plugins to work, so I decided not to go down that path. Then I stumbled upon Adobe’s online PDF to Excel converter. It was as simple as uploading the PDF file and downloading the Excel sheet from my Adobe account. While the process was simple, the Excel sheet I got was a complete mess.

Nevertheless, I imported it into Alteryx. I was working on “Popularity of the characters”, so I imported only Table BRD15 (“Which of the following is your favorite character?”). The table was split across two sheets, and each sheet had its own headers. So while importing each sheet individually, I imported only a specific range of rows, which got rid of the headers and empty rows at the top.
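For anyone who prefers code, here is a minimal Python sketch of that header-trimming idea. It is not what Alteryx does internally. It assumes each sheet has already been read into a list of rows, and the `trim_sheet` and `is_blank` helpers, the number of header rows, and the sample values are all hypothetical placeholders.

```python
# Hypothetical sketch: each sheet is a list of rows (lists of cell values).
# HEADER_ROWS is an assumed value; the real count depends on how the
# converter laid out each sheet.
HEADER_ROWS = 2


def is_blank(row):
    """True when every cell is None or whitespace-only."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def trim_sheet(rows, header_rows=HEADER_ROWS):
    """Drop the leading header rows, then any fully empty rows."""
    return [row for row in rows[header_rows:] if not is_blank(row)]


# Placeholder data shaped like a messy converted sheet.
sheet = [
    ["Table BRD15", None, None],
    ["Which of the following is your favorite character?", None, None],
    [None, None, None],
    ["Character A", "12%", 0.10],
    ["", None, "  "],
    ["Character B", "9%", 0.08],
]
print(trim_sheet(sheet))
# [['Character A', '12%', 0.1], ['Character B', '9%', 0.08]]
```

Filtering blank rows after the slice means a stray empty row just below the headers is dropped too, so the header count does not have to be exact.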

All the field names were messed up, so I had to rename them while keeping track of each field’s number. Interestingly, when the PDF was converted to Excel, some of the sheets had ‘1’ replaced with a special character that Alteryx could not read. And within a field, some rows held percentages (e.g. ‘12%’) while others held decimal numbers (e.g. 0.10, meaning 10%). Since these problems persisted across all fields, I tackled them with a Multi-Field Formula tool. The process was similar for both sheets, and after a little more prep I was in a position to union them together and output the result in Hyper format.
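A rough Python equivalent of the Multi-Field Formula and union steps might look like the sketch below. The unreadable character is not known, so `BAD_CHAR` is only a placeholder (U+FFFD), and `to_fraction`, `clean_row` and `union_sheets` are hypothetical names. Writing the Hyper output is not shown.

```python
# Hypothetical sketch of the cleanup. Assumptions:
# - BAD_CHAR stands in for the unreadable character that replaced '1';
#   the real character is unknown, U+FFFD is just a placeholder.
# - Every value should end up as a fraction, so '12%' and 0.12 match.
BAD_CHAR = "\ufffd"


def to_fraction(value):
    """Normalise one cell to a float fraction, or None if it is empty."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)  # already a fraction, e.g. 0.10 means 10%
    text = str(value).strip().replace(BAD_CHAR, "1")
    if text == "":
        return None
    if text.endswith("%"):
        return float(text[:-1]) / 100  # '12%' -> 0.12
    return float(text)  # numeric text such as '0.08'


def clean_row(row):
    """Apply to_fraction to every value field, keeping the label cell."""
    label, *values = row
    return [label] + [to_fraction(v) for v in values]


def union_sheets(*sheets):
    """Stack the cleaned rows of several sheets, like a Union tool."""
    return [clean_row(row) for sheet in sheets for row in sheet]


# Placeholder rows; the second has the bad character where '1' should be.
sheet_1 = [["Character A", "12%", 0.10]]
sheet_2 = [["Character B", "\ufffd5%", "0.08"]]
combined = union_sheets(sheet_1, sheet_2)
print(combined)
```

Applying one function to every value field mirrors the Multi-Field Formula idea: the same fix runs across all columns instead of being repeated per field.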

Once I had the data in Tableau, the first thing I did was create groups for the demographic categories. I merged Gender Female and Gender Male into one group called ‘Gender’, and so on. I didn’t have much time to work on my dashboard, as data prep took the majority of my time. However, I was able to create the following:

In the above dashboard, users can select a character and see how popular that character was across different demographics.
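Outside Tableau, the demographic grouping and the character selector can be sketched in a few lines of Python. This is only an illustration, not how Tableau implements groups or filters. It assumes each demographic label starts with its group name (as in “Gender Female”), and the `demographic_group` and `popularity_for` helpers and the sample rows are hypothetical placeholders.

```python
# Hypothetical sketch: rows are (character, demographic label, share)
# tuples with placeholder values. Assumes each label starts with its
# group name, e.g. "Gender Female" -> "Gender".


def demographic_group(label):
    """Group a label by its first word, e.g. 'Gender Male' -> 'Gender'."""
    return label.split(" ", 1)[0]


def popularity_for(rows, character):
    """Mimic the character selector: {group: {label: share}} for one character."""
    result = {}
    for name, label, share in rows:
        if name == character:
            result.setdefault(demographic_group(label), {})[label] = share
    return result


rows = [
    ("Character A", "Gender Female", 0.12),
    ("Character A", "Gender Male", 0.10),
    ("Character B", "Gender Female", 0.08),
]
print(popularity_for(rows, "Character A"))
# {'Gender': {'Gender Female': 0.12, 'Gender Male': 0.1}}
```

Labels that do not follow the “group first” pattern would need an explicit mapping instead of the first-word rule.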

Challenges:

The biggest challenge today was dealing with the PDF file. It took a lot of my time just to figure out how to get the PDF data into a format that could be easily handled in Alteryx or Tableau Prep. Cleaning the Excel sheet was another big hurdle, and it took a lot of trial and error to get something useful out of the file. Lastly, as always, time was running short.