Dashboard Day 1 - Web Scraping

by Joe Stokes

The first day is out of the way, and I am relieved. I am actually pleasantly surprised by how well I managed my time today. I finished the dashboard around 30 minutes before the 5pm deadline, leaving enough time to ramble on in the blog.

The task was to web scrape the ASOS website for all of the new products in both Men's and Women's clothing, and to make a viz.

The web scraping proved to be really tough at times. Most of the examples we'd had in training were relatively straightforward, so I was quietly optimistic. That optimism was quickly squashed when I took a look at the HTML under the hood of ASOS.com.

It was a scary sight, and I immediately knew it would be a challenge to extract the information we needed quickly. It took a lot of Regex frustration and practice before I could pull out the data I wanted. By around 2pm I was content that I had all of the information I needed, but only for one page!
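To give a flavour of the Regex side, here is a minimal Python sketch of pulling fields out of product markup with named capture groups. The HTML below is a hypothetical, heavily simplified stand-in (the real ASOS markup is far messier and its tags are not reproduced here), and the pattern and `extract_products` helper are illustrative assumptions, not the expressions I actually used.

```python
import re

# Hypothetical, simplified product markup; NOT the real ASOS HTML.
SAMPLE_HTML = """
<article data-id="101"><a href="/river-island/shirt/prd/101"><p>Oxford shirt</p></a><span class="price">£28.00</span></article>
<article data-id="102"><a href="/topshop/dress/prd/102"><p>Midi dress</p></a><span class="price">£35.99</span></article>
"""

# Named groups make each extracted field self-describing.
PRODUCT_PATTERN = re.compile(
    r'<a href="/(?P<brand>[^/]+)/[^"]*">'              # brand slug = first URL segment
    r'<p>(?P<name>[^<]+)</p></a>'                       # product name
    r'<span class="price">£(?P<price>[\d.]+)</span>'    # price without the £ sign
)


def extract_products(html):
    """Return one dict per product match found in the (hypothetical) markup."""
    return [
        {"brand": m["brand"], "name": m["name"], "price": float(m["price"])}
        for m in PRODUCT_PATTERN.finditer(html)
    ]


for product in extract_products(SAMPLE_HTML):
    print(product)
```

On real pages a pattern like this is brittle, because any change to attribute order or whitespace breaks the match, which goes some way to explaining the frustration.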

The next 1.5 hours were a crash course in building iterative macros. It was a great experience for me, because the topic had slightly gone over my head during training, but the opportunity to use it in a project has helped me understand it and see when I can use it in the future. I essentially created two iterative macros, one for men's clothes and the other for women's clothes, and unioned the output of both.
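As a rough Python analogy (not the actual workflow), an iterative macro behaves like a loop that re-runs the same step with an updated input, here the page number, until a stop condition is met. In this sketch `fetch_page` and `FAKE_PAGES` are hypothetical stand-ins for real HTTP requests, and "stop when a page comes back empty" is an assumed stop condition.

```python
# Hypothetical stand-in for real listing pages, keyed by (category, page).
FAKE_PAGES = {
    ("men", 1): ["m1", "m2"],
    ("men", 2): ["m3"],
    ("women", 1): ["w1", "w2"],
    ("women", 2): ["w3", "w4"],
    ("women", 3): ["w5"],
}


def fetch_page(category, page):
    """Return the products on one page (an empty list past the last page)."""
    return FAKE_PAGES.get((category, page), [])


def scrape_category(category, max_pages=100):
    """Re-run the same step with page + 1 until a page is empty,
    with max_pages as a safety limit against looping forever."""
    rows = []
    page = 1
    while page <= max_pages:
        products = fetch_page(category, page)
        if not products:  # stop condition: no more results
            break
        rows.extend({"category": category, "product": p} for p in products)
        page += 1
    return rows


# One loop per category, then union (concatenate) the two outputs.
all_rows = scrape_category("men") + scrape_category("women")
print(len(all_rows))
```

The `max_pages` cap plays the same role as the maximum number of iterations you set on an iterative macro: if the stop condition never triggers, the loop still ends.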

Now that I had all of the data, I had to decide on a viz quickly. As soon as we found the data, I thought it would be interesting to compare the pricing of women's clothing with men's clothing on the website. The result really surprised me: I had an inkling women would be paying more, but not to the extent that they do.

The dashboard compares male and female prices across different brands and lets a user click into a brand to investigate the products that make it up.

One limitation is that the brand names are all lower case and joined with hyphens. Unfortunately, I ran out of time to edit my workflows to split them on the hyphens and convert them to title case.
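For reference, the missing clean-up step is small. Here is a minimal Python sketch, assuming brand values look like hyphenated lower-case slugs such as `river-island`. The `tidy_brand` name is hypothetical.

```python
def tidy_brand(slug):
    """Split a hyphenated lower-case brand slug and capitalise each word."""
    return " ".join(word.capitalize() for word in slug.split("-"))


print(tidy_brand("river-island"))
```

I used `str.capitalize` on each piece rather than `str.title` on the whole string, because `title` treats an apostrophe as a word boundary and would turn `levi's` into `Levi'S`. Brands that are stylised in capitals (for example `asos-design`) would still need a manual mapping, since this only produces `Asos Design`.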

See the interactive viz here.

Day one done, bring on the next one!

© 2022 The Information Lab Ltd. All rights reserved.