Data Preparation... for Dummies

by Alessandro Costanzo

Last Friday DS13 was assigned a project. In the space of 2 hours (which later became 4), we had to re-create our application Viz. We also had to find a new dataset, prepare it with Alteryx or Tableau Prep, and add it to the original Viz.

Thankfully I already had in mind which dataset I could add, although I was pretty worried about the timing. When I originally designed my Viz I spent hours manually cleaning the data. The mere thought of repeating this process, plus doing it for a new dataset, discouraged me. Despite that, I started a workflow in Alteryx to re-build my dataset from scratch.

Against all my expectations, the cleansing process took me only around 45 minutes. In this post, I would like to describe how I used to clean up data before joining DS, and how this process can be streamlined using a tool such as Alteryx or Tableau Prep.

The original dataset I used for the Viz was the Pokémon database file, retrieved from https://public.tableau.com/en-us/s/resources.

The first change I made to the original dataset was getting rid of all the duplicates (Pokémon with 2 different types appear twice, and I just wanted to keep the original one). Originally, I performed this task manually in Excel, deleting every row containing a duplicate.

In Alteryx this action can be performed with a single tool, called Unique.

Just selecting the field to group the data by allowed me to perform this operation in a matter of seconds, without the risk of missing rows of data.
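For readers without Alteryx at hand, the same idea can be sketched in a few lines of plain Python. This is only an illustration of keeping the first row for each value of a chosen field, not the actual workflow: the field names `Name` and `Type`, the sample rows, and the helper `keep_first_per_key` are all assumptions made up for the example.

```python
def keep_first_per_key(rows, key):
    """Return rows with only the first occurrence of each key value kept."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique_rows.append(row)
    return unique_rows


# Illustrative sample (assumed layout): a Pokémon with 2 types appears twice.
pokemon = [
    {"Name": "Bulbasaur", "Type": "Grass"},
    {"Name": "Bulbasaur", "Type": "Poison"},
    {"Name": "Charmander", "Type": "Fire"},
    {"Name": "Charizard", "Type": "Fire"},
    {"Name": "Charizard", "Type": "Flying"},
]

# Keep one row per Pokémon name, preserving the first type listed.
unique_pokemon = keep_first_per_key(pokemon, "Name")
for row in unique_pokemon:
    print(row["Name"], row["Type"])
```

Because the first row wins, the order of the input matters: the type I wanted to keep has to be listed first.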

The dataset I decided to add contained the Pokémon evolutions. I originally planned to include it in my application Viz, but with my limited knowledge at the time I did not manage to join the 2 datasets properly.

Again, this took only seconds. All I had to do was join the 2 datasets and union the L and J outputs of the Join tool (the unmatched rows from the left input and the matched rows), which amounts to a left join.

This new union allowed me to display Pokémon evolutions in the tooltip, a task that I could not perform before.
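The join-then-union step can also be sketched in plain Python, to show why combining the L and J outputs keeps every Pokémon even when no evolution is listed. This is a rough illustration under stated assumptions: the field names `Name` and `EvolvesTo`, the sample rows, and the helper `left_join` are hypothetical, and it assumes at most one evolution row per Pokémon.

```python
def left_join(left_rows, right_rows, key):
    """Join two lists of dicts on key, keeping every left row (J output plus L output)."""
    # Index the right-hand rows by key (assumes one right row per key value).
    right_by_key = {row[key]: row for row in right_rows}
    # Fields contributed by the right-hand side, excluding the join key itself.
    extra_fields = [f for f in (right_rows[0] if right_rows else {}) if f != key]

    joined, left_only = [], []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            # Matched rows: what the Join tool emits on its J output.
            joined.append({**row, **{f: match[f] for f in extra_fields}})
        else:
            # Unmatched left rows: the L output, padded with empty values.
            left_only.append({**row, **{f: None for f in extra_fields}})
    # The union of both outputs.
    return joined + left_only


# Illustrative sample (assumed layout).
pokemon = [
    {"Name": "Bulbasaur", "Type": "Grass"},
    {"Name": "Mew", "Type": "Psychic"},
    {"Name": "Charmander", "Type": "Fire"},
]
evolutions = [
    {"Name": "Bulbasaur", "EvolvesTo": "Ivysaur"},
    {"Name": "Charmander", "EvolvesTo": "Charmeleon"},
]

combined = left_join(pokemon, evolutions, "Name")
for row in combined:
    print(row["Name"], row["Type"], row["EvolvesTo"])
```

Using only the J output would silently drop every Pokémon without an evolution, which is why the L output has to be unioned back in.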

The most time-consuming task during the original creation of this Viz was adding the Pokémon descriptions. At the time I actually had to copy and paste this text from a webpage (yes, this means that I copied and pasted 151 different strings manually).

This operation could be streamlined using an Alteryx macro. Unfortunately, we haven’t discussed this topic yet, but I look forward to learning it and using it for other projects.

Reflecting on how I used to clean data makes me feel like a dummy, although I know it is all part of the learning process. I look forward to improving my skills and laughing about all the time I spent on simple operations.