Regex is incredibly versatile and can be employed to solve various data cleansing issues. This week, I challenged myself to use as much Regex as I could to complete Alteryx Weekly Challenge #35 . Also included are some examples of using string functions in the Formula tool. This challenge is made up of separate ‘mini tasks’ where each task starts off with an input of a single field which must be cleaned.
Task 1: Remove leading zeros.
- Tool: RegEx Tool
- Configuration Pane: In Field1, Replace any pattern matching (^0*) with nothing.
- Regex Explained: ^ at the start of the string if there is 0* any number of zeros then replace it.
Task 2: trim leading zeros AND any text at the end of a numerical code.
- Tool: Regex tool
- I used two Regex tools here, one to remove the leading zeros, and another connected to the first’s output to remove the excess text. The first Regex tool is configured as above, the second looks like this:
- Configuration: In Field1, Replace any pattern matching (\D?) with nothing
- Regex explained: \D for any character that is NOT a digit, replace it, ? non-digit characters at the end of the string are optional, if they are not present, nothing needs to be replaced.
Task 3: Remove ‘ID’ only if it appears at the end of the string.
- Tool: Regex Tool
- Configuration: In ID, Replace any pattern matching ([ID]*$) with nothing
- Regex explained: any character in the capture group [ID], * which can appear any number of times, $ only where they appear at the end of the string, should be replaced.
- Be careful here, the [ID] syntax looks like how we would reference a field name in the formula tool, in Regex, the square brackets signify a group of characters to match.
- Alternative Solution: Formula Tool
- Formula explained: if the string ends with ‘ID’ then remove it, otherwise keep the ID as it is.
Task 4: if the code is more that 8 characters long, trim it to 8 characters, if it is only 6 characters long, add the ‘SC’ prefix.
- Regex is not the only way to alter strings in Alteryx and there are plenty of string functions which would achieve this result, so here I used the Function tool:
- Formula explained: if the string is less than 8 characters long, then append ‘SC’ onto the beginning of it. Otherwise, if the string is more than 8 characters long, then remove ‘: Fire Warning’ from the end of the string. Otherwise, return the string as it is.
The beauty of Alteryx is that a result can be achieved several different ways using various tools. This is only one solution to the challenge and if you wish to have a go yourself, the starter file can be found here: https://community.alteryx.com/t5/Weekly-Challenge/Challenge-35-Data-Cleansing-Practice/td-p/36438