Today’s data covered content delistings due to copyright infringement, taken from Google’s Transparency Report. I started by looking at a different subset of the data and only later realised that it wasn’t the one specified in Andy’s post, so in the afternoon I had to retrace my steps and analyse the main Transparency Report dataset in the time left. I decided to look at the number of requests Disney Corporation have submitted to Google asking it to remove links published on Pirate Bay, one of the largest torrent websites, to prevent the illegal download of Disney’s content.
Three separate datasets cover individual requests, the domains where the infringing links were hosted, and the list of URLs on which Google took no further action. The datasets are quite large, but don’t require much preparation.
Working in Alteryx, I started by filtering the data to only Disney’s requests and Pirate Bay’s domain, and splitting the original date column to get the date of the request in the correct format. I also used the Unique tool before exporting the data to a .hyper file, as I understood that the entries were duplicated to show each URL for which Google took no further action. As you can see from the screenshot below, the total number of URLs removed and not actioned is repeated on every row in such cases (for example, for Request ID 1244590):
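For readers without Alteryx, the same three steps (filter, split the date, keep one row per request) can be sketched in plain Python. This is only an illustration under assumptions: the field names, the sample rows, the `piratebay.example` domain and the ISO-style timestamp format are all made up for the example and are not the real schema of the Transparency Report files.

```python
from datetime import date

# Hypothetical sample rows: field names and values are assumptions,
# not the real Transparency Report schema.
rows = [
    # Request 101 appears twice, once per URL that was not actioned,
    # and its totals are repeated on both rows.
    {"request_id": 101, "organization": "Disney", "domain": "piratebay.example",
     "date_submitted": "2018-11-20T10:15:00Z", "urls_removed": 5,
     "urls_no_action": 2, "no_action_url": "http://piratebay.example/a"},
    {"request_id": 101, "organization": "Disney", "domain": "piratebay.example",
     "date_submitted": "2018-11-20T10:15:00Z", "urls_removed": 5,
     "urls_no_action": 2, "no_action_url": "http://piratebay.example/b"},
    # Same organisation, different domain: filtered out.
    {"request_id": 102, "organization": "Disney", "domain": "other.example",
     "date_submitted": "2018-11-21T08:00:00Z", "urls_removed": 7,
     "urls_no_action": 0, "no_action_url": ""},
    # Same domain, different organisation: filtered out.
    {"request_id": 103, "organization": "Another Studio", "domain": "piratebay.example",
     "date_submitted": "2018-11-22T09:30:00Z", "urls_removed": 4,
     "urls_no_action": 0, "no_action_url": ""},
    {"request_id": 104, "organization": "Disney", "domain": "piratebay.example",
     "date_submitted": "2018-12-01T12:00:00Z", "urls_removed": 3,
     "urls_no_action": 0, "no_action_url": ""},
]


def prepare(rows, organization, domain):
    """Filter to one organisation and domain, keep one row per request."""
    seen_ids = set()
    prepared = []
    for row in rows:
        # Step 1: keep only the requested organisation and domain.
        if row["organization"] != organization or row["domain"] != domain:
            continue
        # Step 3: skip repeated rows so the totals are counted once per request.
        if row["request_id"] in seen_ids:
            continue
        seen_ids.add(row["request_id"])
        # Step 2: split the timestamp and keep only the date part.
        date_part = row["date_submitted"].split("T")[0]
        prepared.append({
            "request_id": row["request_id"],
            "request_date": date.fromisoformat(date_part),
            "urls_removed": row["urls_removed"],
            "urls_no_action": row["urls_no_action"],
        })
    return prepared


prepared = prepare(rows, "Disney", "piratebay.example")
total_removed = sum(r["urls_removed"] for r in prepared)
print(len(prepared), total_removed)
```

Without the de-duplication step, the repeated row for request 101 would be summed twice and inflate the total of removed URLs, which is the reason for the Unique tool in the workflow.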
Visualisation in Tableau
For my dashboard, I provided an overview of Disney’s requests to remove their content from Pirate Bay, showing the total number of requests and the number of URLs removed as a result. Shortly after Disney announced their new Disney+ streaming service in November 2018, Pirate Bay published a list of over 600 Disney movies and series coming soon to its own service. It was interesting to see from the data that this appears to have prompted Disney to submit more requests to remove content from the torrent site.
Link to the interactive dashboard on my Tableau Public page.
Let me know if you have any questions.