Two major contenders in the ring tonight: new but developing at light speed, Tableau Prep and tried and trusted, Alteryx Designer.
Who will win this time after a draw in the last Prep Off? Watch the replay of Round 1 here.
How do these two softwares compare when cleaning an Excel spreadsheet with 182 fields? The chosen data features the table seen in Figure 1. This data was used in Makeover Monday Week 2 2019 (after it was cleaned, of course). Download the Excel file (Freedom of the Press) and follow along with me.
Unlike Round 1 of the Prep Off, I will be cleaning this data in Alteryx first and then I’ll do the same in Tableau Prep. Figure 2 below shows how the Freedom of the Press data is loaded into Alteryx. Note that there are 182 fields and only 211 records. This data is very wide and short, our aim is to get it into a friendlier long and thin format. The method for cleaning this dataset was formulated by DS11’s Ellie Mason. My method for cleaning this data was a little more complicated so it’ll be better to use Ellie’s neater method. Let’s get started.
As per Andy’s instructions, we are only interested in 1993-present, not any years before. So, the first step was to use a Select tool to remove unwanted years. This reduced the number of fields to 161. Next we bring the contents of row 1 up to the field names using Dynamic Rename. Figure 3 shows how the data now looks. We’ve lost the years but we’ll add these back in later.
Next, we want to use a Transpose tool to get all the values in one column and the type of value in another, instead of having 10 columns for each year in the dataset. Figure 4 shows how the data now looks.
According to Andy, we are only interested in the Total Score values and the Status values. So we filter for rows containing Total Score and rows containing Status into two separate streams. Next, we bring the years back in. We know that the years start at 1993 and count up until 2016 so we can use the Multi-Row Formula tool to create a Year field. Joining the two streams back together, we have a column for Total Score and one for Status. Figure 5 shows how the data now looks.
We will next add a column for Edition (which is Year plus 1) and filter out rows where both Total Score and Status don’t contain information. The final result is is shown in Figure 6. Nice and neat!
After cleaning the data in Alteryx, I attempted to clean the data in Prep using the same method. Right off the bat, it was much nicer due to the Data Interpreter.
As you can see in Figure 7, both the years and the value types (Total Score, Status, Print, Broadcast, etc…) were brought into the field names. This will make cleaning the data much easier.
As with Alteryx, the first thing I do is remove the years I don’t want by selecting the columns I don’t want and clicking remove in a Clean step. Next, I use Pivot to get all my values in one column and types of values in another column (like the Transpose tool in Alteryx). Figure 8 shows how the data now looks.
As you can see, the “Pivot1 Names” column has year, edition year and value type information. We can now use a split to split this field into three. Figure 9 shows the result of a split.
Next, I pivot the values and value types field to get separate columns for Total Score and Status. I then filter out nulls as before and rename columns. Figure 10 shows the final result! Note the image is not sorted but rest assured that the result is the same as with Alteryx!
So which was better?
Both Tableau Prep and Alteryx were able to clean the data and get into the same format for Tableau. Figure 11 below compares the full workflows in both Tableau Prep and Alteryx.
The process of cleaning the dataset, I felt was much simpler in Prep than in Alteryx. This was largely due to the Data Interpreter. Without this, I probably would have had to follow the same method as with Alteryx but I likely would have struggled without a Multi-Row Formula tool.
Overall, Tableau Prep wins!
Cleaning this dataset took me longer in Alteryx than it did in Prep. Additionally, the method for cleaning the data in Alteryx in this post was not my own logic, but Ellie’s. My own logic for cleaning the data was a bit more complicated. Whereas putting the data in Prep, I was able to clean it very quickly without any logic issues. This is a true testament to Prep’s functionality and ease of use as I am more familiar with Alteryx but preferred Prep for this data. Bear in mind that this was a small dataset without any super complex ETL needs and so, Prep was better in this particular case. Data Interpreter for the win!
Take your pick of which ever software you’d like. They both work!
Happy data prepping!