Kriebel crushes DS with every baseball game ever

Yeah. American sports. Not much of a fan. Sorry, Andy and Carl.

I do enjoy sport, though I’ve never really been initiated into the joys of baseball or football or whatever else there is. I guess I’m missing out, right? In any case, this puts a spanner in the works for my dashboard week narratives. I’ve hit a dataset I don’t really feel any personal connection to! Such is life. And, more importantly, such is data. Time to chow down and get it done.

The data

Stats from every baseball game from 1871 to 2017, held at Retrosheet. No API this time. The data were held in text files, one per year, so we used Alteryx to batch-read them in. A little magic was required to get at the column headers, which were held in a separate file. Two camps rose to the challenge. The first went rogue and manually cleaned the index file using Excel. The second fronted it out in Alteryx.
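For anyone who would rather see the header trick outside Alteryx, here is a minimal Python sketch of the same idea. It assumes the header file lists one column name per line and that the yearly files are comma-separated with no header row; the function names and the sample values are made up for illustration and are not taken from the real dataset.

```python
import csv
import io


def read_headers(header_text):
    """Return column names, assuming one name per non-empty line."""
    return [line.strip() for line in header_text.splitlines() if line.strip()]


def attach_headers(headers, data_text):
    """Pair each row of headerless CSV text with the given column names."""
    reader = csv.reader(io.StringIO(data_text))
    return [dict(zip(headers, row)) for row in reader]


# Made-up sample: three column names and a single game row.
headers = read_headers("date\nvisiting_team\nhome_team\n")
rows = attach_headers(headers, '"19000101","AAA","BBB"\n')
```

Each row comes back as a dictionary keyed by column name, which is roughly what the renamed fields look like once the Alteryx workflow has done its thing.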

There’s some wisdom I heard once, and I can’t remember the attribution now (maybe someone can let me know if they recognise it?). It seems prudent to reflect on it here anyway. If you only have to do something really complex once, the manual-to-automation time ratio is heavily skewed in favour of doing it by hand. Writing the code to pick out the logic may take much, much longer than just biting the bullet and rolling up your Excel sleeves. If you will have to repeat the process for a large number of datasets, use code to make things automated, repeatable and verifiable…

Getting the data

Boom.

The big thing for us here is being able to use the prefix of the file name along with an asterisk. This tells Alteryx to bring in all the files in our folder that share this prefix – one file per year. Once the data are in Alteryx, we clean and rename them, then export them in a Tableau-readable format.
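The prefix-plus-asterisk trick is not unique to Alteryx. As a rough sketch only, this is how the same wildcard read might look in Python using the standard library. The function name, folder and prefix are hypothetical, and it assumes every matching file is comma-separated text.

```python
import csv
import glob
import os


def read_all_with_prefix(folder, prefix):
    """Read every file in `folder` whose name starts with `prefix` into one list of rows."""
    # The trailing asterisk matches anything after the prefix, e.g. the year.
    pattern = os.path.join(glob.escape(folder), glob.escape(prefix) + "*")
    all_rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                # Keep the source file name so each row can be traced back to its file.
                all_rows.append([os.path.basename(path)] + row)
    return all_rows
```

Calling something like read_all_with_prefix("data", "GL") would stack every matching file into one list, with the file name in the first position of each row so the year is not lost along the way.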

Telling the story

While I don’t have much of an interest in American sports, I do enjoy a bit of history. So I thought I would have a look at baseball through the Second World War. Perhaps we can pick out some attendance changes or connect the stats in the dataset to the historical timeline of this period? I thought, perhaps, it would also be possible to pick out professional baseball players who subsequently enlisted and lost their lives in the war. It was possible to pick out one of these men – Elmer Gedeon. However, the nature of the data precludes individual player analysis. It’s really about each game as a whole.

What did I land on in the end?

Take a look at the final viz here!

Let me know if I hit a home run or struck out by pitching me your thoughts on Twitter – @AdiBop_.

See y’all soon.