Last week we had a training session with Andy on Power BI. One of the things that interested me about this software was its ability to web scrape, something that, as far as I know, Tableau currently can't do.
In the example that was walked through with us, we managed to web scrape IMDb's Top 250 Rated Films in only a few clicks.
Here's a picture of what the webpage looks like:
You must first connect to the webpage. If you are creating a new report, you'll be greeted with this window. Click on 'Get data' on the left-hand side.
Next you must find 'Web'. I found it easier to use the search function. Click on 'Connect' at the bottom right.
An alternative way is to click on 'File' at the top left of Power BI, then 'Get data', and then 'Web' in the resulting windows.
After clicking on 'Connect', you will see this pop-up. In this case, I simply entered the URL for the IMDb page and clicked on OK.
Afterwards, you'll get a 'Navigator' window which shows the tables that have been scraped. It's always good to have a look at all the tables to see which is the best to use. In this case, I used Table 1, so I selected it (as shown by the tick mark next to 'Table 1'). Next, I clicked on 'Transform Data' at the bottom right because the data needed cleaning. Of course, if your scraped table is already clean, you could Load it directly into Power BI as it is.
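To give a rough idea of what the 'Navigator' step is doing, here is a minimal Python sketch that lists every table found in a piece of HTML. This is not how Power BI does it internally, just an illustration of the idea. The `TableCollector` class, the `find_tables` function and the sample HTML are all made up for this example (the HTML is not IMDb's real markup), and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, in page order
        self._rows = None  # rows of the table being read, or None
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None


def find_tables(html):
    """Return all tables in the HTML string, each as a list of rows."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables


# Made-up HTML standing in for a downloaded page (an assumption for the demo).
SAMPLE_HTML = """
<table><tr><td>Menu</td></tr></table>
<table>
  <tr><th>Rank and Title</th><th>Rating</th></tr>
  <tr><td>1. Example Film (1999)</td><td>9.0</td></tr>
</table>
"""

tables = find_tables(SAMPLE_HTML)
for index, rows in enumerate(tables):
    print(f"Table {index}: {len(rows)} row(s), first row = {rows[0]}")
```

Just like in the Navigator, not every table is useful (the first one here is only a menu), which is why it's worth previewing each one before picking.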
This is what the data looked like. There are quite a few problems: there are unnecessary columns, and the years and the ranks need extracting. Note that Power BI is using Power Query to transform and prepare the data.
I won't go into detail on the cleaning steps (that could even be its own blog post) but here's a preview of the Applied Steps and the resulting table.
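For anyone curious what extracting the rank and year amounts to outside of Power Query, here is a minimal Python sketch. It is not the Applied Steps I used. It assumes the scraped cell looks like "rank. title (year)", and both the `split_rank_title_year` function and the sample film are made up for illustration.

```python
import re

# Assumed shape of the combined cell: "<rank>. <title> (<year>)".
PATTERN = re.compile(r"^\s*(\d+)\.\s*(.+?)\s*\((\d{4})\)\s*$")


def split_rank_title_year(cell):
    """Split one scraped cell into rank, title and year.

    Returns None when the cell does not match the assumed shape,
    so unexpected rows can be inspected rather than silently mangled.
    """
    match = PATTERN.match(cell)
    if match is None:
        return None
    rank, title, year = match.groups()
    return {"rank": int(rank), "title": title, "year": int(year)}


# Made-up sample value, not a real row from the scraped table.
print(split_rank_title_year("1. Example Film (1999)"))
```

Anchoring the year to the last set of brackets means a title that contains brackets of its own still comes out intact.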
I was talking to Andy about this and we said that because this was relatively easy, it might even be more efficient to use Power BI for the web scraping and then output the result into other tools, e.g. Tableau and Alteryx. Power BI Desktop is free, so there's nothing to lose!
That said, who knows, it could be a right pain when it comes to scraping more complicated web pages. But we'll see. This has certainly opened new doors.