Today I was under a great deal of time pressure: leaving the DS at 2 in the afternoon for a teaching commitment meant I had to approach today’s challenge differently and, in many ways, rein in the scope of my project on what is a very comprehensive and interesting dataset.
The initial download was very large, consisting of 3 files: one containing copyright requests made to Google, another containing data on the offending domains, and a final one containing information about requests for which no action was taken.
My initial instinct was to join the requests and domains data together (with ideas of creating a Sankey or flow plot of some sort floating around my head... good job reining in the scope, Ross). But that was obviously not going to work, given the enormous size of the files involved and the inevitable multiplication of records the join would produce, so I went ahead and opened the requests file in Tableau.
It was here that I noticed the companies that made the most requests were typically involved in the music industry, and so I went back to Alteryx to limit the requests data source to a handful of music industry companies that I was keen to analyse.
I then joined this smaller dataset to the domains data in order to get an accuser and an accusee for each request. However, this still proved to be a large dataset, so I went ahead and filtered down to just members of the BPI (the British Phonographic Industry), the largest single copyright requester and something very relevant to my interests.
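For anyone who would rather see the filter-then-join idea in code than in an Alteryx workflow, here is a rough pandas sketch. It is only an illustration under assumptions: the column names (`request_id`, `organization`, `domain`, `urls_removed`), the helper `filter_then_join` and the toy rows are all made up for the example and are not taken from the real files.

```python
import pandas as pd

# Tiny stand-in tables. Every column name and value here is hypothetical.
requests = pd.DataFrame({
    "request_id": [1, 2, 3, 4],
    "organization": ["BPI", "BPI", "Other Org", "Film Co"],
})
domains = pd.DataFrame({
    "request_id": [1, 1, 2, 3, 3, 4],
    "domain": ["a.example", "b.example", "a.example",
               "c.example", "d.example", "e.example"],
    "urls_removed": [10, 5, 7, 2, 1, 4],
})


def filter_then_join(requests, domains, organizations):
    """Keep only the chosen requesters, then attach their domain rows."""
    # Filtering first keeps the left side of the join small.
    subset = requests[requests["organization"].isin(organizations)]
    # One request can name many domains, so the join is one-to-many;
    # validate raises if request_id is unexpectedly duplicated on the left.
    return subset.merge(domains, on="request_id", how="inner",
                        validate="one_to_many")


bpi = filter_then_join(requests, domains, ["BPI"])

# Total URLs per domain for the filtered requester, largest first.
worst = bpi.groupby("domain")["urls_removed"].sum().sort_values(ascending=False)
print(worst)
```

Filtering before the join is the point: each request fans out into one row per domain, so trimming the requests table first limits how many rows the join can multiply.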
The results of this join were much more usable. With a few alternative outputs taken at earlier points in the workflow for different aggregations, my data prep was done, and I moved into Tableau.
In the interests of time, I decided to go with a simple long-form dashboard discussing the relationship between the BPI and Google in terms of handling piracy, accompanied by a few charts to illustrate Google’s performance on requests from the BPI and to show the worst offenders when it comes to attempting to defraud the BPI.
The result looked a little like this: