Week 3 at The Data School, and Day 2 of Alteryx 101 training. We learnt a huge amount about the variety of different tools available, as well as how to create and use macros and apps within the Alteryx environment. The star of the show for me today was the Text to Columns tool. I confess that I am already slightly falling in love with Alteryx – it appeals tremendously to the librarian part of my brain – the ease and speed with which it allows you to clean and sort data is quite spectacular. Every workflow clearly shows the series of decisions taken to prep the data into a more usable format, and the whole thing is simply an exercise in applying logic in a number of steps to achieve the outcome you require.
The Text to Columns tool
This does exactly what it says on the tin. Imagine you have a mass of unwieldy text in just one field and you need to split it out into several columns – here’s where this tool comes into play. By dropping it into the workflow it allows concatenated data to be separated out into more a usable form – a set number of new columns containing, for example, one value in each column. Below is an illustration of where you could place a Text to Columns tool into a workflow. This example uses OECD data on teaching hours at primary schools around the world.
Below is a good way to see how the Text to Columns tool can be used in practice. It’s easy to see how messy the data brought in from the original OECD .csv file is. All the values are squashed into the first column and separated by commas. Using Text to Columns we can quickly transform the data into individual columns and make the data usable.
Configuring the tool
Simply drag the tool into the workflow and the configuration panel will appear.
Here you can see that it is simply a case of choosing which field you wish to split from the drop-down menu at the top of the panel.
The next field allows you to indicate the type of delimiter in use. In this case it is a comma (‘,’).
The configuration panel then asks you what sort of split you wish to perform – a Split to Columns or a Split to Rows. Here we want to split the long, unwieldy text into 12 new columns and call these new columns ‘Field1’, ‘Field2’, ‘Field3’ etc.
We also want to put any extra left over data into a spare column at the end.
Running the workflow after configuring this tool results in a clear, easily visible separation of data across the 12 new columns. The location and various values within the original long string in the first column have now be separated. We can now continue with the workflow using these new columns, and start to clean and prep the data further using other tools such as Dynamic Rename to move the Fields names out of Row 1 and into column headers, and Select to remove any columns which are deemed unnecessary.