Dashboard Week Day 4 - London Marathon

by Alfred Chan

It is the second-to-last day of dashboard week. Today we were given the task of looking at the London Marathon data. If you click through the link and try to find the results for a specific year, you will find that there is no download button.

That led to what I believe is my biggest challenge so far this week: web scraping and regular expressions (regex). Neither is my strong suit. I have never been fond of regex, because I always have to look up the syntax and work through the logic, and web scraping needs quite a lot of regex, so I am not a big fan of it either. However, it is what it is: we had to get the link for each year and then extract the data using regex. If you are a keen reader, or you have read Andy's blog, you'll know that we are supposed to work alone. But I'd say teamwork is a real strength of DS28. We helped each other with the regex, because the table structure of the data is slightly different every year, so a single regex pattern cannot parse every year's data. The regex has to be configured year by year (which is painful). So we decided that each of us would take a specific year and work out its pattern. In the end, the Alteryx flow looked something like this:

It might look scary at first glance, but each container simply parses one year's data, and each green icon is a regex we wrote for the web scraping (yes, in half a day). Some years' data have nearly the same structure, so once we had figured out the regex for one of them, the others took little effort beyond sense-checking the data.
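For readers who do not use Alteryx, the year-by-year idea can be sketched in Python. This is not our actual workflow, just a minimal illustration: the years, the HTML snippets, the `ROW_PATTERNS` table and the `parse_results` function are all made up for the example, and the real result pages use different markup.

```python
import re

# Hypothetical row patterns, one per year. The years and the markup are
# invented for illustration; the real pages are structured differently.
ROW_PATTERNS = {
    2014: re.compile(
        r"<tr>\s*<td>(?P<place>\d+)</td>\s*"
        r"<td>(?P<name>[^<]+)</td>\s*"
        r"<td>(?P<time>\d{2}:\d{2}:\d{2})</td>\s*</tr>"
    ),
    2015: re.compile(
        r'<tr class="row">\s*<td class="place">(?P<place>\d+)</td>\s*'
        r'<td class="name">(?P<name>[^<]+)</td>\s*'
        r'<td class="time">(?P<time>\d{2}:\d{2}:\d{2})</td>\s*</tr>'
    ),
}


def parse_results(html, year):
    """Extract result rows from one year's HTML using that year's pattern."""
    pattern = ROW_PATTERNS[year]
    return [
        {
            "year": year,
            "place": int(m.group("place")),
            "name": m.group("name").strip(),
            "time": m.group("time"),
        }
        for m in pattern.finditer(html)
    ]


# Made-up sample pages standing in for two years with different table markup.
SAMPLE_2014 = (
    "<table><tr><td>1</td><td>Runner A</td><td>01:30:05</td></tr>"
    "<tr><td>2</td><td>Runner B</td><td>01:31:10</td></tr></table>"
)
SAMPLE_2015 = (
    '<table><tr class="row"><td class="place">1</td>'
    '<td class="name">Runner C</td><td class="time">01:29:59</td></tr></table>'
)

# Parse each year with its own pattern, then combine into one list of rows.
rows = parse_results(SAMPLE_2014, 2014) + parse_results(SAMPLE_2015, 2015)
for row in rows:
    print(row)
```

Keeping one pattern per year in a lookup table mirrors the one-container-per-year layout: years with the same structure could simply share a pattern, and only the odd ones need their own.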

I specifically downloaded the wheelchair category data. Most of us spent, I think, over 70% of the day trying to scrape the data, so to avoid doing something similar to everyone else, I chose one specific category to look at.

As mentioned, most of today was spent on Alteryx and regex, so there was not much time for me to build anything too insightful or sophisticated. My dashboard is as follows:

Tableau Public Link

Insight-wise, as usual, there is not much. But I believe that what we learned today matters more than the insights we found. I now have a slightly better understanding of regex and of HTML table structure. Regex is definitely something I have not tried hard enough to learn, and today it came back to bite me. Luckily, as mentioned, teamwork is something DS28 is really proud of. I am really glad we could help each other out (as we are all in the office now). With the team's help I had a somewhat decent dashboard to show at the end of the day, so today was as fruitful as the other days.