Dashboard Week: Day 1

Scraping the Backpacker Index with Python and Alteryx

Today marks the first day of Dashboard Week.

For our first challenge, our coach Lorna gave us the following task:

I chose the Australia & New Zealand region: https://www.priceoftravel.com/australia-new-zealand-backpacker-index#BackpackerIndexesfromtherestoftheworld

I started by using Python to extract the text content from the main Australia/New Zealand Backpacker Index page. The page doesn’t use HTML tables, which made the process slightly more difficult: the content is only styled to look structured. So I had to go old-school and extract the text from the raw HTML using the tools listed below.

Python Libraries and Tools Used:

  • requests - to pull the raw HTML
  • BeautifulSoup - for initial parsing (although I pivoted to regex)
  • re - to write regular expressions and pull out key sections
  • pandas - to help clean/export the data
  • VSCode - the editor I used for writing and debugging the script

Here's a snapshot of some of the script written:

I used regex patterns to:

  • Extract the city name and country from <h3> tags
  • Extract each cost item (hostel, meals, drinks, attractions, backpacker index) from <li> elements
  • Create a structured block of text for each city
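
A minimal sketch of that approach might look like the following. It is not the original script: the sample markup, the placeholder city names and prices, the pattern names and the `parse_cities` helper are all assumptions for illustration, and the real page's HTML may well differ.

```python
import re

# Hypothetical markup with placeholder values; the real page may differ.
SAMPLE_HTML = """
<h3>Example City, Exampleland</h3>
<ul>
<li>Hostel: $20.00/night</li>
<li>Meals: $15.00/day</li>
<li>Backpacker Index: $50.00/day</li>
</ul>
<h3>Sample Town, Exampleland</h3>
<ul>
<li>Hostel: $25.50/night</li>
<li>Meals: $18.00/day</li>
<li>Backpacker Index: $60.25/day</li>
</ul>
"""

# One <h3> heading plus everything up to the next <h3> (or the end of the input).
CITY_BLOCK = re.compile(
    r"<h3>(?P<city>[^,<]+),\s*(?P<country>[^<]+)</h3>(?P<body>.*?)(?=<h3>|\Z)",
    re.DOTALL,
)

# A "Label: $12.34" pair at the start of a list item.
COST_ITEM = re.compile(r"<li>\s*(?P<label>[^:<]+):\s*\$(?P<amount>[\d.]+)")


def parse_cities(html):
    """Return one dict per city: city, country, and each cost label as a float."""
    rows = []
    for block in CITY_BLOCK.finditer(html):
        row = {
            "city": block["city"].strip(),
            "country": block["country"].strip(),
        }
        for item in COST_ITEM.finditer(block["body"]):
            row[item["label"].strip()] = float(item["amount"])
        rows.append(row)
    return rows


for row in parse_cities(SAMPLE_HTML):
    print(row)
```

Splitting the page into per-city blocks first, then searching each block for cost items, keeps every price attached to the right city even when the list items themselves never mention it.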

Once I had the raw matches, I printed and exported them.
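
As a rough illustration of that export step, the sketch below writes one block of lines per city to a plain-text file. The `export_blocks` helper, the file name and the placeholder content are assumptions, and it uses only the standard library rather than pandas to stay self-contained.

```python
import tempfile
from pathlib import Path


def export_blocks(blocks, path):
    """Write each city's lines as one paragraph, with a blank line between cities."""
    text = "\n\n".join("\n".join(lines) for lines in blocks)
    Path(path).write_text(text + "\n", encoding="utf-8")
    return text


# Placeholder content, not real scraped values.
blocks = [
    ["Example City, Exampleland", "Hostel: $20.00/night", "Meals: $15.00/day"],
    ["Sample Town, Exampleland", "Hostel: $25.50/night", "Meals: $18.00/day"],
]

# Hypothetical output location; any writable path would do.
output_path = Path(tempfile.gettempdir()) / "backpacker_index_raw.txt"
print(export_blocks(blocks, output_path))
```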


Switched to Alteryx for Final Cleaning

Rather than trying to build a full parser in Python (which would get messy, and my Python and regex skills aren't advanced enough for that yet), I exported the scraped data as plain text and cleaned it up in Alteryx Designer.

In Alteryx, I used:

  • The Regex Tool to split out the values cleanly
  • A Multi-Row Formula to fill down city names
  • A Data Cleansing Tool to strip any leftover symbols
  • Final preparation before exporting it as a structured dataset
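
For anyone more at home in Python, the fill-down step can be pictured with the sketch below. It is a plain-Python analogue, not the Alteryx workflow itself: the `fill_down` function, the row layout and the placeholder values are all made up for illustration.

```python
def fill_down(rows, key="city"):
    """Copy the last non-empty value of `key` into rows where it is blank."""
    filled = []
    last_seen = None
    for row in rows:
        # Keep the row's own value if present, otherwise reuse the previous one.
        value = row.get(key) or last_seen
        last_seen = value
        filled.append({**row, key: value})
    return filled


# Placeholder rows: only the first line of each city block carries the name.
rows = [
    {"city": "Example City", "item": "Hostel"},
    {"city": None, "item": "Meals"},
    {"city": "Sample Town", "item": "Hostel"},
    {"city": "", "item": "Meals"},
]

for row in fill_down(rows):
    print(row["city"], "-", row["item"])
```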

My Dashboard

Here's a preview of the dashboard I built in Power BI with the data I collected:


Insights

Cairns is definitely the cheapest city to backpack through, though it has scorching temperatures.

I'd probably prefer a city like Auckland or Sydney myself. They are 2nd and 3rd on the list in terms of pricing, but their temperatures are much lower, which suits someone like me who can't handle scorching hot weather.


Challenges Faced

  • Web scraping 
    • Structure of the website - no tables made life much more difficult :-(
    • Using Python
    • Regex

Final Thoughts

When a website doesn’t use proper tables or structured HTML, regex can be a quicker route than BeautifulSoup, especially if you're just grabbing blocks of known content. And Alteryx is a great cleanup tool when you don’t want to keep debugging code in Python.

Author:
Kristupas Justas Jucaitis
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
© 2025 The Information Lab