Alteryx Fundamentals - Data Cleansing Tool

by Emma E. Jones

Tidying up your data before you use it is vital to ensure you do not run into problems at a later stage of analysis.

Using the Results Pane

The results pane is useful for understanding the issues with your data. It can show you which of your data can be used in analysis and which cannot.

The Data shown in the results pane below has several issues which will impact analysis if not corrected.

By hovering over the column headers in the results pane when the workflow is run we can get a better understanding of whether our data needs cleansing.

Hovering over the 'Name' column with your mouse will give you a break down into your data. Here, it is evident that in the 'Name' column only 63.64% of data is formatted correctly and therefore only 63.64% of this data can be used in analysis.

The coloured lines under each header demonstrate how much of your data can be used in analysis. It acts as a visual representation of the percentage of useable data that you have. Red represents 'Not Ok', yellow 'Nulls' and green 'OK'.

If the data is 'Not Ok' a red triangle will appear in the corner of the cell. This highlights which specific aspects of your data need fixing.

Hovering over any of the cells which contain a red triangle will tell you what is wrong with the data in that particular cell. In the case of the cell containing 'Lucy', Alteryx explains that there are leading spaces which are causing the cell to be 'Not Ok'.

All of these 'Not Ok' values can be fixed using the data cleansing tool.

Configurating your Data Cleansing Tool

The configuration for the data cleaning tool will depend on the errors in your data.

Using the previous example we know that we have 'Not Ok' and 'Null' values in both columns of our data. We therefore must ensure that both columns are selected.

All columns are automatically selected so if you did not want to make the same changes to multiple columns only select the columns you do want to make changes to.

Removing 'Nulls'

In our example we have nulls in some rows but not in columns.

We therefore select the 'Remove null rows' box to fix this.

Running the workflow after this point will remove the nulls from your data.

If we had an entire column of nulls we could remove the entire column by selecting remove null columns but in this case it works just to remove null rows.

Our data has now removed all the nulls and we can see this since the yellow bar under the column header has been removed. Hovering over the data will also no longer show any null values.

Correcting 'Not Ok' Values

Our data still needs some cleaning as Alteryx is still showing 'Not Ok' values.

From the data above we can see that there is still whitespace in some of the cells and there is some unwanted punctuation. The 'Remove Unwanted Characters' category will allow you to select any field that you do not wish to appear in your data. In our example we want to remove 'Leading and Trailing Whitespace' and 'Punctuation'.

Running the workflow with these changes will provide this output.

Now our data no longer has any leading spaces or punctuation but there are still some inconsistencies in the data which could cause problems later.

Modify Case

Records 8 and 9 are written entirely in capital letters whilst all other records are Title case. This can cause issues since Alteryx is case sensitive.

This can be fixed in the Data Cleansing tool by selecting the Modify Case field and then choosing 'Title Case'.

This will convert all string data into Title Case format. The modify case field will convert your string data to any case type that you select.

After running our workflow we now have a dataset that is consistent and free of 'Nulls' and 'Not Ok' values.

Avatar

Emma E. Jones

2 mins read

Fri 26 Nov 2021

Thu 25 Nov 2021

Fri 19 Nov 2021