There was no slowing down this week. Our project was predominantly Alteryx-based – software we had never used before, so this was a real uphill battle! The theme was Education, and the task was to find education data on the Internet, use Alteryx to prepare and collate it, and then use Tableau to create a Viz.
Fortunately, the almighty Alteryx Ace Chris Love provided us with the ammunition to combat the challenge by teaching us about connecting to data, dataset manipulation, data parsing, data blending, web scraping and where to look for help. I'm always impressed by experts in a subject who are able to pass on their knowledge so well!
As usual, Andy had lined up another prominent figure to speak at the Data School; yes, we are becoming a spoiled bunch! This week Tableau CMO Elissa Fink talked to us about how Tableau is marketed. Elissa was informative, encouraging and simply magnificent. The more I hear about how Tableau operates, the more I fall in love with it. Tableau's success has been built on putting us customers and the community first, at the heart of everything they do. And no doubt they use Tableau themselves in the marketing team too!
During the week we were also fortunate to have Laszlo Zsom talk us through the differences in table joins between Alteryx, Tableau and SQL as well as consultants Mike Lowe and Robin Kennedy who further prepared us for the Tableau Desktop Qualified Associate exam by delivering some more Tableau training.
Back to Alteryx. My aim for this project was to derive more insight from The Complete University Guide's university ranking tables. I wanted to import all nine years' worth of league tables for each of the 67 subjects, along with the overall league table for every year, meaning I needed to import 9 x 68 = 612 league tables. Now imagine going to each and every one of those 612 web pages, copying and pasting the tables, removing the column headers, making sure all the data was in the right format and so on. The whole process would be boring and repetitive and would've taken days, if not weeks, to complete! Luckily, I was able to do this in no time once I figured out how to use Alteryx.
To begin with, I noticed a pattern emerging across all the URLs:
- Overall table for 2016 – no year at the end
- Overall tables for earlier years – year at the end
- Subject tables for 2016 – no year at the end
- Subject tables for earlier years – year at the end
My first task, therefore, was to generate all 612 URLs using the first part of the URL, which they all had in common, and the XML elements for all 67 subjects. I got the XML elements by right-clicking on the webpage that displayed all the hyperlinks to the subject league tables and clicking 'view source code.'
Below is the workflow I created in Alteryx to generate all 612 URLs.
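For anyone without Alteryx to hand, the same URL-generation logic can be sketched in plain Python. Note that the base URL, the query-parameter names and the sample subject slugs below are all assumptions for illustration – the real workflow used the site's actual URL structure and all 67 subjects.

```python
# Hypothetical base URL and subject slugs -- illustrative only.
BASE = "https://www.thecompleteuniversityguide.co.uk/league-tables/rankings"

subjects = ["", "accounting-and-finance", "law"]  # "" = the overall table
years = range(2008, 2017)  # nine years of tables

urls = []
for subject in subjects:
    for year in years:
        url = BASE
        if subject:
            url += "?s=" + subject
        if year != 2016:  # the 2016 tables have no year at the end
            url += ("&" if subject else "?") + "y=" + str(year)
        urls.append(url)

# 3 sample subjects x 9 years = 27 URLs here; with the full 68 tables
# (67 subjects + overall) it becomes 9 x 68 = 612.
print(len(urls))
```

In Alteryx the same result comes from a Text Input of subjects, a Generate Rows (or second Text Input) of years, a join to produce every combination, and a Formula tool to concatenate the URL string.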
I used import.io to extract the 612 league tables of data. My next task was to extract the subject names and years from the URLs. I was thinking one step ahead here: I knew I needed a Subject column and a Year column so the end user could filter the league tables by these two variables. This led me to create the workflow below.
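In Python terms, that extraction step might look roughly like the sketch below. The URL shape and parameter names are assumptions carried over from the pattern described above; the real version used Alteryx's RegEx and Formula tools.

```python
import re

def parse_url(url):
    """Pull a (Subject, Year) pair back out of a league-table URL.

    Assumed URL shape (illustrative only): an optional "s=<subject-slug>"
    parameter and an optional "y=<YYYY>" parameter.
    """
    # Year: a four-digit "y=" query parameter; URLs without one are 2016.
    year_match = re.search(r"[?&]y=(\d{4})", url)
    year = int(year_match.group(1)) if year_match else 2016

    # Subject: an "s=" parameter; URLs without one are the overall table.
    subject_match = re.search(r"[?&]s=([^&]+)", url)
    if subject_match:
        subject = subject_match.group(1).replace("-", " ").title()
    else:
        subject = "Overall"

    return subject, year

print(parse_url("https://example.com/rankings?s=accounting-and-finance&y=2012"))
# -> ('Accounting And Finance', 2012)
```

Running every URL through a function like this yields exactly the two filter columns the end user needs.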
The rest was straightforward. I used Tableau to create my Viz. Click here to interact with it on Tableau Public. It lets you see a chosen number of universities, ranked and plotted on a map, by year, subject and one of four indicators, which are explained on The Complete University Guide's website.
I'm quite astonished I was able to do this! I previously thought you had to be a hardcore programmer to do things like web scraping! I'm excited by the possibilities of using Alteryx in combination with Tableau, especially since we accomplished so much in so little time using Alteryx!