Regex vs Text to Columns

by Hannah Murphy

Both tools can be used to split a string field into separate, multiple columns or rows with the use of specified delimiters.

Using Text to Columns

  1. Select a column to split
  2. Delimiters: Enter the delimiters to use to split the data. Each character is treated independently, meaning you cannot delimit on a word.
  3. Split to columns: Split a single column of data at each instance of the specified delimiter into multiple columns. Output root name: Enter the name for the newly generated columns. The new columns are named as the root name with a serially increasing integer appended.
  4. Split to rows: split a single column of data at each instance of the specified delimiter into multiple rows. Use a key column in your record so you don't lose track of which value came from which row.
  5. In Advanced Options, select delimiters to ignore:
  • Ignore Delimiters in Quotes
  • Ignore Delimiters in Single Quotes
  • Ignore Delimiters in Parentheses
  • Ignore Delimiters in Brackets
  • Skip Empty Columns

However, the Text to Columns tool is limited as it can only split on a single delimiter at a time. If you want to split on a word for example, then the Regex tool is needed...

Using Regex

There are multiple functions within the Regex tool, to achieve the same effect as the Text to Columns tool, you will need to use Tokenise:

Tokenize: Split the incoming data using a regular expression. This option works similarly to the Text to Columns tool, except instead of matching and removing what you do not want, you match for what you want to keep. You want to match to the whole token, and if you have a marked group, only that part is returned.

In this example, I am creating a new row based on Title, First name and Surname

To configure the Regex tool, you will need to enter a regular expression. This will act as your delimiter. You already use some of these tokens in Text to Columns to split on white space:

Websites such as regex101.com allow you to test your regular expression before entering it into Alteryx. This also has a reference pane which lists common tokens and provides guidance in writing your expression:

Once you have your regular expression, you can configure the Regex tool the same as you would the text to Columns and run the workflow.

Here I have told Alteryx that I want a new column for every word that starts with an uppercase letter

Your string should now be split into multiple columns.

And so, Text to Columns is a great and simple way to split your data. However, it has limitations and it is unlikely that all your data will be in the right format to split as you need.

The Regex tool, whilst intimidating at first, is very flexible and allows for data to be split on multiple variations of custom delimiters.

Remember: Text to Columns will treat each delimiter on an individual basis!

Avatar

Hannah Murphy

Mon 04 Oct 2021

Thu 30 Sep 2021