Preppin' Data Solutions in Alteryx: 2019 Week 2

by Etienne Soubes-Goldman

The following will be an Alteryx solution to the Preppin’ Data 2019: Week 2 Challenge. If you haven’t already, go check out the challenge and the associated input data here.

Step 1: Remove the extra field names and the empty rows

The first problem, as you can see from the picture below, is that the input data contains extra field headers when the location changes from London to Edinburgh.

Since the row containing these extra field headers will be the only one without a number in the value field, we can create a custom filter in the filter tool that checks if the value field contains only numbers. We can do this by using the REGEX_match function with the value field as our string and ‘\d+’ as our pattern. The \d just means digit, and the addition of the + means we want the pattern to contain one or more digits. This will mean our filter removes the extra field names and the null rows in one filter.

Step 2: Fix the location names

As you may have noticed from looking at the entries in the location field, some of the city names are spelled wrong. We can use a formula tool with the configuration shown bellow to fix this.

This works because each time London was spelt wrong, it still always contained ‘on’, and the only other city in the field is Edinburgh.

Step 3: Create separate columns for each metric

The first step to doing this is to merge the metric and measure column. This can be done simply by using a formula tool with the following setup.

Before using a crosstab tool to generate our new columns from the ‘Metric – Measure’ column, we can use a select tool to get rid of the now useless ‘Metric’ and ‘Measure’ column and change the Value column’s datatype to a double.

Finally, by using a crosstab tool with the following configuration:

We can create a separate column for each of the Metrics – Measures. To finish with a perfectly matching data set, use a select tool to rename each of the Metric – Measure columns before outputting the data.