Scraping Data with Alteryx & ChatGPT

Today's Challenge: Scrape and Visualise

Day 4 presented us with a challenging task: to scrape data from a website and transform it into compelling visualisations. While the requirements were straightforward, the data preparation phase was anything but.

The Task: Visualising London Marathon's Historical Results (2014-2023)

The Requirements:

We had to input the first two letters of participants' surnames into the website's search box, which had to be the same as ours (for me "Do".
We had to bravely venture into the depths of data, downloading the historical results for each page, for each year.

The Journey:

My journey began with the task of scraping data using Alteryx. However, like in any challenging endeavor, I soon encountered a significant obstacle: I was unable to retrieve data for all participants whose last names began with "Do."

Facing this roadblock, I turned to ChatGPT, which has consistently been a helpful tool for me. I asked ChatGPT to generate a Python script to assist me in my mission. While it wasn't without its share of challenges, it took a few iterations for ChatGPT to provide me with a functional script. However, it's worth noting that this script was initially tailored for just one year's worth of data. Below is a glimpse of the code:

But I didn't give up. I wanted to try both Alteryx and ChatGPT. After some experimenting, I got Alteryx to work, and it collected data for multiple years.

Comparing ChatGPT and Alteryx, I found that ChatGPT was faster but better suited for single URLs. Alteryx, on the other hand, handled multiple URLs well, so I chose to use it for this task.

Below is the workflow:

Author:

Sanjeev Dosanjh

View Profile