A quick way to extract month names or abbreviations using a single Regular Expression (RegEx) string.
Regular Expressions are used to identify patterns in text strings in order to extract, replace, delete and/or match these strings. Alteryx has a build-in RegEx tool, which can be found in the Parse Tab.
Below I will show you the RegEx syntax to match all the months (full word) and their most commonly used abbreviations, with just 1 line. If you are new to RegEx and are keen to explore more, I would recommend reading the blog post of Naledi Hollbruegge, which briefly outlines RegEx and has links for further reading as well.
I ran into a few data sets where months were embedded in the data, but they were either scrambled throughout the field or even inside a text string. Furthermore, people sometimes used the full word, such as, September or decide to use abbreviations as, Sept or Sep.
1. RegEx to separate Months into their own Fields
I created an example worksheet below containing all months and their most commonly used abbreviations embedded into one string also containing some waste numbers and text.
The following Regex syntax was build to recognise those particular months and abbreviations:
By following the steps in the image below and copying in this syntax, it will pull out the months,
2. Preparing your own RegEx Syntax
A list of standard syntax is provided by Alteryx to get started, by clicking on the drop down menu inside the configuration menu of the RegEx tool.
However, these are the basics and there are many websites out there to help create and test your syntax. If you decide to use them, make sure to use Perl syntax.
I find one website particularly useful to start out with as it allows you to type in a piece of string and gives you some suggestions: text2re.com
For instance, if you type in January, click show matches, it suggests the syntax for the word January or month. You can then copy over the Perl syntax, edit it if needed and use it in Alteryx.