Dashboard Week #2: Web Scraping the Rules of Parliament

by Joe Beaven

Today’s challenge was based on the Rules of Parliament and involved getting data from ParlRulesData.org. The catch was that we weren’t allowed to download any data. Instead, all the information we needed had to be scraped directly from the website.

Getting the data:
ParlRulesData.org sets out its data as a ‘landing page’ that links through to the underlying data, which was to form the rows of our output:

Landing Page
Underlying Data

The first stage was to process the Landing Page to generate a list of URLs for the different pages of underlying data. These were the URLs behind the blue ‘Date’ field.

This meant downloading the Landing Page into Alteryx, splitting it so that each row of the Landing Page became a separate row in Alteryx, and extracting the URL behind ‘Date’ using RegEx. Inspecting the HTML revealed that every link under ‘Date’ was prefixed with ‘<a href="’ and ended with ‘">’, so the underlying hyperlink could be extracted by taking the text between these start and end markers.
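For anyone working outside Alteryx, the same extraction can be sketched in Python. This is a minimal sketch, assuming every link is written exactly as <a href="...">; the helper extract_date_links and the sample HTML are hypothetical, not the real ParlRulesData.org markup. On the live page you would probably also need to filter out links that don't sit under ‘Date’.

```python
import re

# Capture everything between '<a href="' and the next '">' (non-greedy),
# mirroring the start and end markers found by inspecting the HTML.
LINK_PATTERN = re.compile(r'<a href="(.*?)">')

def extract_date_links(html: str) -> list[str]:
    """Return every hyperlink target in the landing page HTML (hypothetical helper)."""
    links = []
    # Split the page into one line per row, as was done in Alteryx
    for line in html.splitlines():
        links.extend(LINK_PATTERN.findall(line))
    return links

# Hypothetical landing-page fragment, not real ParlRulesData.org markup
sample = """<tr><td><a href="/rules/1811">1811</a></td></tr>
<tr><td><a href="/rules/1812">1812</a></td></tr>"""

print(extract_date_links(sample))  # ['/rules/1811', '/rules/1812']
```

The non-greedy (.*?) stops at the first closing ">, which keeps one link from swallowing the next if two appear on the same line.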

This process generated a list of all the URLs under ‘Date’:

Data started back in 1811!

This list could then be used to access the underlying webpages of data. The next step was to call each of these webpages using the URLs above.
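Calling each page can be sketched the same way with Python's standard urllib. This assumes the links are relative paths that need joining onto the site root (the https root URL used below is an assumption) and that pages decode as UTF-8; fetch_page and fetch_all are hypothetical helpers, and the fetch function is passed in so the loop can be tried without hitting the site.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

def fetch_page(url: str) -> str:
    """Download one page and decode it as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")

def fetch_all(links, base_url, fetch=fetch_page):
    """Request every underlying page and return {absolute_url: html}."""
    pages = {}
    for link in links:
        # urljoin turns a relative link into a full URL and leaves absolute links unchanged
        url = urljoin(base_url, link)
        pages[url] = fetch(url)
    return pages

# Offline usage with a stand-in fetcher (no network access needed)
fake_fetch = lambda url: f"<html>{url}</html>"
pages = fetch_all(["/rules/1811"], "https://parlrulesdata.org/", fetch=fake_fetch)
print(pages)
```

When running against the real site, a short pause between requests (for example time.sleep) is a courteous addition.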

Once this was done, the challenge was to deal with the Underlying Data pages. By inspecting the HTML and using RegEx, I carried out the following steps:

  • From the Underlying Data webpage, extract the table
  • Split the table up from block text to one row per table row
  • Parse out the Date, Text (Rule of Parliament) and its SO (Standing Order) number, which is essentially a rule ID
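The three steps above can be sketched with regular expressions. The table layout assumed here (one <tr> per rule, with Date, SO number and Text in three <td> cells, in that order) is hypothetical; the real column order should be read off the page's HTML. Regex is workable for a known, regular page like this, though an HTML parser would be more robust.

```python
import re

# re.S lets '.' match newlines, since table rows can span several lines
TABLE = re.compile(r"<table.*?>(.*?)</table>", re.S)
ROW = re.compile(r"<tr.*?>(.*?)</tr>", re.S)
CELL = re.compile(r"<td.*?>(.*?)</td>", re.S)

def parse_rules(html: str) -> list[dict]:
    """Extract the table, split it into rows, and parse Date, SO and Text."""
    table = TABLE.search(html)
    if table is None:
        return []
    rules = []
    for row_html in ROW.findall(table.group(1)):
        cells = [c.strip() for c in CELL.findall(row_html)]
        if len(cells) != 3:
            continue  # skip header rows (which use <th>) or malformed rows
        date, so_number, text = cells
        rules.append({"date": date, "so": so_number, "text": text})
    return rules

# Hypothetical page fragment
sample = """<table>
<tr><th>Date</th><th>SO</th><th>Text</th></tr>
<tr><td>1811-01-01</td><td>1</td><td>First line<br>Second line</td></tr>
</table>"""

print(parse_rules(sample))
```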

Finally, some Rules had multiple sub-clauses separated by line breaks. These had to be labelled 1.01, 1.02, etc. To do this, I used Alteryx’s Multi-Row Formula tool to create a ‘row ID within each SO’.
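The Multi-Row Formula logic can be sketched in Python as a counter that increases while consecutive rows share the same SO and resets to 1 when the SO changes. This assumes sub-clauses are separated by <br> tags and that rows arrive in page order; it also assumes the counter should restart on each new date, so an SO amended on several dates starts again at .01. label_subclauses is a hypothetical helper, and the two-digit suffix follows the 1.01, 1.02 style above.

```python
import re

def label_subclauses(rules):
    """Split each rule on line breaks and label its sub-clauses SO.01, SO.02, ..."""
    rows = []
    prev_key, counter = None, 0
    for rule in rules:
        key = (rule["date"], rule["so"])
        for clause in re.split(r"<br\s*/?>", rule["text"]):
            clause = clause.strip()
            if not clause:
                continue  # ignore empty fragments, e.g. from a trailing <br>
            # Multi-Row style: same group as the previous row -> increment, else reset
            counter = counter + 1 if key == prev_key else 1
            prev_key = key
            rows.append({**rule, "text": clause, "clause_id": f"{rule['so']}.{counter:02d}"})
    return rows

rules = [
    {"date": "1811-01-01", "so": "1", "text": "First line<br>Second line"},
    {"date": "1811-01-01", "so": "2", "text": "Only line"},
]
for row in label_subclauses(rules):
    print(row["clause_id"], row["text"])
# 1.01 First line
# 1.02 Second line
# 2.01 Only line
```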

My final output looked like this:

Dashboarding:

I wanted to look at how the Rules of Parliament have changed under different governments and Prime Ministers, so I first had to pull data from Wikipedia on the winning Prime Minister and party at each general election. The problem was that my Rules of Parliament data covered every year, not just election years, so I had to ‘scaffold’ the data, filling in the party in government and PM for every year in between.
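The scaffolding step amounts to carrying each election result forward until the next election. This minimal sketch assumes the Wikipedia data has been reduced to (year, party, PM) tuples; scaffold_years is a hypothetical helper and the sample values are placeholders, not real election results. Like any election-only source, it won't pick up a change of PM between elections unless that change is added as an extra row.

```python
def scaffold_years(elections, last_year):
    """Expand (year, party, pm) election rows into one row per year."""
    # Sort by year only; the stable sort keeps input order for two elections
    # in the same year, so the later-listed one ends up covering that year
    elections = sorted(elections, key=lambda e: e[0])
    rows = []
    for i, (year, party, pm) in enumerate(elections):
        # Each result holds until the year before the next election (or last_year)
        end = elections[i + 1][0] - 1 if i + 1 < len(elections) else last_year
        for y in range(year, end + 1):
            rows.append({"year": y, "party": party, "pm": pm})
    return rows

# Placeholder data, not real results
for row in scaffold_years([(2000, "Party A", "PM X"), (2003, "Party B", "PM Y")], 2004):
    print(row)
```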

My output:

Joining this data with my Rules of Parliament data allowed me to build the following dashboard, looking at which governments and PMs have seen the most change in the Rules of Parliament during their tenure.

I looked at the number of Rules of Parliament that came into effect relative to length of tenure, since we would expect more change under longer governments or premierships. Interestingly, this showed that minority or coalition governments have overseen much greater levels of change in the Rules of Parliament, despite their reputation for being slow to reach decisions.
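The join and the tenure adjustment can be sketched together. This assumes each rule has already been given an integer year (taken from its date) and that the scaffolded table has one row per year, as described earlier; change_rate is a hypothetical helper. Grouping by the (party, PM) pair treats each premiership as its own tenure, and rules dated outside the scaffolded years are simply dropped.

```python
from collections import Counter

def change_rate(rules, scaffold):
    """Rules coming into effect per year in office, for each (party, pm) tenure."""
    # Lookup from year to the government in office that year
    gov_by_year = {row["year"]: (row["party"], row["pm"]) for row in scaffold}
    # Join: attach the government in office to each rule via its year
    rule_counts = Counter(gov_by_year[r["year"]] for r in rules if r["year"] in gov_by_year)
    # Tenure length = number of scaffolded years in which each government appears
    tenure_years = Counter(gov_by_year.values())
    return {gov: rule_counts[gov] / years for gov, years in tenure_years.items()}

scaffold = [{"year": y, "party": "Party A", "pm": "PM X"} for y in (2000, 2001, 2002)]
scaffold += [{"year": y, "party": "Party B", "pm": "PM Y"} for y in (2003, 2004)]
rules = [{"year": y} for y in (2000, 2001, 2003, 2003, 2003, 2004)]
print(change_rate(rules, scaffold))
# {('Party A', 'PM X'): 0.6666666666666666, ('Party B', 'PM Y'): 2.0}
```

Dividing by years in office is what lets a short coalition term be compared fairly against a long single-party government.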