Scraping the Backpacker Index with Python and Alteryx
Today marks the first day of Dashboard Week.
For our first challenge, our coach Lorna gave us the following task:

I started by using Python to extract the structured text content from the main Australia/New Zealand Backpacker Index page. The website doesn’t use tables, which made the process slightly more difficult - it’s just styled content made to look structured. So I had to go old-school and extract text from the raw HTML using:
Python Libraries and Tools Used:
- requests – to pull the raw HTML
- BeautifulSoup – for initial parsing (although I pivoted to regex)
- re – to write regular expressions and pull out key sections
- pandas – to help clean/export the data
- VSCode – for writing and debugging the script
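A minimal sketch of the fetch step might look like the code below. It assumes requests is installed and leaves the URL to the reader, since the exact address isn't given here. The names fetch_html and has_tables and the User-Agent string are illustrative assumptions, not the original script. has_tables is a quick way to confirm the "no tables" observation before deciding how to parse the page.

```python
import re

# Hypothetical helper names; this is a sketch, not the original script.


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page with requests and return its raw HTML as text."""
    import requests  # imported here so has_tables() can be tried without it installed

    response = requests.get(
        url,
        timeout=timeout,  # don't hang forever on a slow server
        headers={"User-Agent": "Mozilla/5.0"},  # illustrative; some sites reject the default UA
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def has_tables(html: str) -> bool:
    """Return True if the HTML contains at least one <table> element."""
    return re.search(r"<table\b", html, flags=re.IGNORECASE) is not None
```

Usage would be along the lines of html = fetch_html(url) followed by has_tables(html), which should come back False if the page really has no tables.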
Here's a snapshot of part of the script:

I used regex patterns to:
- Extract the city name and country from <h3> tags
- Extract each cost item (hostel, meals, drinks, attractions, backpacker index) from <li> elements
- Create a structured block of text for each city
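The regex steps above could be sketched roughly as follows. This assumes each city section is an <h3> heading (e.g. "City, Country") followed by <li> cost items, which is a guess at the markup rather than the page's confirmed structure; the pattern and function names are hypothetical.

```python
import re
from html import unescape

# Assumed markup (hypothetical): "<h3>City, Country</h3>" followed by <li> cost items.
# The real page's tags and attributes may differ; adjust the patterns to match.
HEADING_RE = re.compile(r"<h3[^>]*>\s*(.*?)\s*</h3>", re.IGNORECASE | re.DOTALL)
ITEM_RE = re.compile(r"<li[^>]*>\s*(.*?)\s*</li>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def clean(fragment: str) -> str:
    """Strip nested tags, decode entities such as &amp;, and collapse whitespace."""
    return " ".join(unescape(TAG_RE.sub("", fragment)).split())


def extract_city_blocks(html: str) -> list[str]:
    """Return one plain-text block per <h3> heading: the heading, then its <li> items."""
    headings = list(HEADING_RE.finditer(html))
    blocks = []
    for i, heading in enumerate(headings):
        # A city's section runs from the end of its <h3> to the next <h3> (or end of page)
        end = headings[i + 1].start() if i + 1 < len(headings) else len(html)
        section = html[heading.end():end]
        lines = [clean(heading.group(1))]
        lines += [clean(item) for item in ITEM_RE.findall(section)]
        blocks.append("\n".join(lines))
    return blocks
```

Each returned string is one city block: the heading on the first line, then one cost item per line. Because regex depends on the exact markup, a small change to the page can break these patterns, which is one reason a tolerant cleanup step afterwards is useful.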
Once I had the raw matches, I printed and exported them.
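A rough sketch of the print-and-export step, assuming the city blocks are plain strings and the output is a UTF-8 text file; export_blocks and the file name are hypothetical.

```python
from pathlib import Path


def export_blocks(blocks: list[str], path: str) -> Path:
    """Print each city block, then write them all to a UTF-8 text file, one blank line apart."""
    for block in blocks:
        print(block, end="\n\n")  # quick visual check in the terminal
    out = Path(path)
    out.write_text("\n\n".join(blocks) + "\n", encoding="utf-8")
    return out
```

Plain text with blank-line separators is easy to read in a text editor and simple to split again downstream.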
Switched to Alteryx for Final Cleaning
Rather than trying to build a full parser in Python (which would get messy, and my Python, especially combined with regex, isn't advanced enough yet), I exported the scraped data as plain text and cleaned it up in Alteryx Designer.
In Alteryx, I used:
- The Regex Tool to split out the values cleanly
- A Multi-Row Formula to fill down city names
- A Data Cleansing Tool to strip any leftover symbols
- Final preparation before exporting it as a structured dataset
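For readers without Alteryx, a rough Python analogue of those cleaning steps is sketched below. It is not the workflow used here. It assumes each city heading sits on its own line, followed by "Item: $cost" lines, and all names are hypothetical. Splitting each line with a regex mirrors the Regex Tool, carrying the last heading forward mirrors the Multi-Row Formula fill-down, and keeping only the number mirrors stripping leftover symbols.

```python
import re

# Assumed line formats (hypothetical): a heading such as "Cairns, Australia",
# followed by cost lines such as "Hostel: $25.00".
COST_LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def tidy_rows(lines: list[str]) -> list[dict]:
    """Split cost lines into item/cost, fill city names down, and strip symbols."""
    rows = []
    current_city = None
    for line in lines:
        if not line.strip():
            continue  # skip blank separator lines
        match = COST_LINE_RE.match(line)
        if match is None:
            current_city = line.strip()  # a heading: carry it down to the rows below
            continue
        item, raw_value = match.groups()
        # Roughly a cleansing step: drop "$", thousands separators and stray text
        number = NUMBER_RE.search(raw_value.replace(",", ""))
        cost = float(number.group()) if number else None
        rows.append({"city": current_city, "item": item, "cost": cost})
    return rows
```

Each resulting dict is one tidy row (city, item, cost), which is the shape a dashboard tool expects.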
My Dashboard
Here's a preview of the Power BI dashboard I built with the data I collected:

Insights
Cairns is definitely the cheapest city to backpack through, though it has scorching temperatures.
I'd probably prefer a city like Auckland or Sydney myself: they're 2nd and 3rd on the list in terms of pricing, but their temperatures are much lower, which suits someone like me who can't handle scorching hot weather.
Challenges Faced
- Web scraping
- Structure of the website - no tables made life much more difficult :-(
- Using Python
- Regex
Final Thoughts
When a website doesn’t use proper tables or structured HTML, regex can be quicker than BeautifulSoup, especially if you’re just grabbing blocks of known content. And Alteryx is a great cleanup tool when you don’t want to keep debugging Python code.