A regular expression (RegEx) is a sequence of characters that defines a search pattern. The example I will show you uses Alteryx, software that makes advanced analytics accessible to any data user.
Why use RegEx?
RegEx allows the user to extract specific parts of a field. In the following example, the email is the part of the field of interest. I have highlighted one of the emails in the example below.
After inputting the data, click, drag and drop the RegEx tool into the view.
A great website for practising RegEx is https://regex101.com/, which shows which part of the text each symbol in a pattern matches. After copying and pasting the above data into regex101, I will explain what each symbol means.
( ) = marked group
[a-z] = any lower case letter
+ = one or more
\ = escape character: changes how the next character is interpreted
\. = a literal ‘.’ (full stop), not the ‘any character’ dot
. = any single character. ALERT – the dot without a ‘\’ means something different
? = preceding token is optional
\w = word character (a letter, a digit or an underscore)
* = zero or more
The ‘@’ has no special meaning in RegEx, so it simply matches a literal ‘@’.
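To make the definitions above concrete, here is a minimal sketch in Python rather than Alteryx (Python's built-in re module uses the same symbols for these basics). The sample strings are made up for illustration and are not part of the original data.

```python
import re

# Hypothetical sample text, not taken from the original data set.
text = "a.b axb"

# An unescaped dot matches any single character (except a newline by default).
any_char = re.findall(r"a.b", text)

# An escaped dot matches only a literal full stop.
literal_dot = re.findall(r"a\.b", text)

# '?' makes the preceding token optional.
optional = re.findall(r"colou?r", "color colour")

# '+' means one or more of the preceding token.
one_or_more = re.findall(r"[a-z]+", "abc 123 de")

# '*' means zero or more, so it can match an empty string too.
zero_or_more = re.fullmatch(r"\w*", "") is not None

print(any_char)      # ['a.b', 'axb']
print(literal_dot)   # ['a.b']
print(optional)      # ['color', 'colour']
print(one_or_more)   # ['abc', 'de']
print(zero_or_more)  # True
```

The first two results show why the ALERT matters: without the ‘\’, the dot also matches the ‘x’ in ‘axb’.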
With these definitions in mind, this expression states that, within the marked group ‘( )’, one or more ‘+’ lower case letters ‘[a-z]’, followed by an optional ‘?’ full stop ‘\.’ and then an optional ‘?’ run of one or more ‘+’ word characters ‘\w’, all before the ‘@’, will give this…
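The description above can be sketched in Python as follows. The pattern is my reading of the prose and not necessarily the exact expression used in Alteryx: I close the group before the ‘@’ so that only the first half is captured, and I write the "optional run of one or more word characters" as ‘\w*’ (zero or more), which is an assumption. The sample values are hypothetical.

```python
import re

# Assumed pattern for the part before the '@'.
# '\w*' stands in for "an optional run of one or more word characters".
LOCAL_PART = re.compile(r"([a-z]+\.?\w*)@")

# Hypothetical sample values, not the original data.
samples = ["Contact: john.smith@example.com", "jsmith2@example.org"]

# group(1) is the text captured by the marked group '( )'.
local_parts = [LOCAL_PART.search(s).group(1) for s in samples]

print(local_parts)  # ['john.smith', 'jsmith2']
```

Note that writing ‘\w+?’ would not mean "optional": placed after ‘+’, the ‘?’ makes the quantifier lazy (match as few characters as possible) instead.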
With this in mind, the rest of the email can be pulled out using similar steps.
After the ‘@’ comes zero or more ‘*’ of any single character ‘.’, then a full stop ‘\.’, followed by one or more ‘+’ word characters ‘\w’ and another optional ‘?’ full stop ‘\.’.
Finally, closing the bracket completes the regular expression. Hopefully the title makes a little more sense now: ‘? means optional’ 🙂 That’s a smiley face. No expression there!
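Putting the steps together, the whole walkthrough can be sketched in Python. Again, this is a reconstruction from the description and not necessarily the exact expression from the Alteryx workflow; the helper name extract_email and the sample fields are hypothetical.

```python
import re
from typing import Optional

# Assumed full pattern, assembled from the steps described above.
EMAIL = re.compile(r"([a-z]+\.?\w*@.*\.\w+\.?)")

def extract_email(field: str) -> Optional[str]:
    """Return the text captured by the marked group, or None if nothing matches."""
    match = EMAIL.search(field)
    return match.group(1) if match else None

# Hypothetical fields with different email styles.
fields = [
    "Jane Doe, jane.doe@example.co.uk, London",
    "bob99@mail.example.com",
    "Sales team <sales@example.org>",
    "no email here",
]

results = [extract_email(f) for f in fields]
print(results)
# ['jane.doe@example.co.uk', 'bob99@mail.example.com', 'sales@example.org', None]
```

One caveat with this sketch: ‘.*’ is greedy, so if the text after the email contained another full stop followed by word characters, the match would stretch to include it. A narrower character class such as ‘[\w.-]+’ is a common alternative when the field holds more than just the email.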
This real-world example is useful for quickly pulling essential contact information out of a field that contains several different email styles.