The second day of Dashboard Week was about copyright infringement. The data came from the Google Transparency Report.
The objective was to choose one topic from the database and download it. However, we could not find a way to download by topic, only the whole database, which was about 3.6 GB! We solved the problem by having one person download it and share it with the rest of us on a USB stick.
With the dataset in hand, I started looking for something interesting to talk about. I chose to look at requests made on behalf of FOX, one of the top copyright owners requesting the removal of URLs due to copyright violations.
The folder came with three Excel files and a README text document, which serves as the data documentation and is essential reading before working with the data.
Bringing it into Alteryx was not an easy step. The data was huge and everything took a while to run. I built as much of my workflow as I could before running it, and then cached it even before deciding what I would focus on. I decided to work only with the domains and requests and didn’t use the file that contained information about actions not taken.
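For anyone without Alteryx, the same idea (stream the big file once and keep only the rows for one copyright owner) can be sketched in Python. This is not what I built, just a minimal illustration: it assumes the data is available as CSV, and the column names `request_id`, `copyright_owner` and `date` are made up for the example, so check the README for the real ones.

```python
import csv
import io


def filter_rows(lines, column, needle):
    """Stream CSV rows and yield those whose `column` contains `needle`.

    The match is a case-insensitive substring test, so it can also pick up
    unrelated owners whose name happens to contain the same text.
    """
    reader = csv.DictReader(lines)
    needle = needle.lower()
    for row in reader:
        # Rows are read one at a time, so the whole file never sits in memory.
        if needle in (row.get(column) or "").lower():
            yield row


# Tiny made-up sample standing in for the real multi-gigabyte file.
# Column names here are hypothetical.
sample = io.StringIO(
    "request_id,copyright_owner,date\n"
    "1,Fox,2017-02-03\n"
    "2,Another Owner,2017-03-01\n"
    "3,FOX,2018-09-12\n"
)

fox_rows = list(filter_rows(sample, "copyright_owner", "fox"))
```

With a real file you would pass an open file handle instead of the `io.StringIO` sample, and write the matching rows out to a much smaller file to work from.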
Here is what my final data set looked like.
Working in Tableau, I looked at the number of requests made by month and year. I found the number of copyrights owned by FOX interesting and decided to show it along with the number of reporting organisations working on their behalf.
One interesting observation is that the first half of each year shows a higher density of requests than the second half. 2017 was the year with the highest number of requests, and the number has decreased since then. It would be interesting to know why, but I didn’t have enough time to search Google for a reason.
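The half-year comparison is easy to check outside Tableau too. Here is a rough Python sketch that counts requests per year and half; the dates are invented sample values, not figures from the report, and `half_year_counts` is just a helper name I picked for the example.

```python
from collections import Counter
from datetime import date


def half_year_counts(dates):
    """Count dates per (year, half), where H1 is Jan-Jun and H2 is Jul-Dec."""
    counts = Counter()
    for d in dates:
        half = "H1" if d.month <= 6 else "H2"
        counts[(d.year, half)] += 1
    return counts


# Invented request dates, only to show the shape of the result.
sample_dates = [
    date(2017, 1, 15),
    date(2017, 3, 2),
    date(2017, 9, 30),
    date(2018, 5, 5),
]

counts = half_year_counts(sample_dates)
```

Comparing `counts[(year, "H1")]` against `counts[(year, "H2")]` for each year gives the same first-half versus second-half view as the chart.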
Here is my final outcome: