RegEx, or regular expressions, are sequences of characters that define a search pattern. The example I will show you uses Alteryx, software that makes advanced analytics accessible to any data user.

Why use RegEx?

RegEx allows the user to extract specific parts of a field. In the following example, the email is the part of the field of interest. I have highlighted one of the emails in the example below.

Note how not all of the emails are in the same format.

After inputting the data, drag and drop the RegEx tool onto the canvas.

A great website for practising RegEx is https://regex101.com/, which lets the user see which symbols cause each part of a pattern to be pulled out. After copying and pasting the above data into RegEx101, I will explain what each symbol means.

This expression pulls out all emails in the field

Definitions:

( ) = marked group

[a-z] = any lower case letter

+ = one or more

\ = escape character: changes how the next character is interpreted

\. = interpret the ‘.’ as a specific character

. = any single character. ALERT – the dot without a ‘\’ means something different

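? = preceding token optional (placed directly after ‘+’ or ‘*’, it instead makes that quantifier match as few characters as possible)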

\w = alphanumeric character (a letter, a digit or an underscore)

* = zero or more

The ‘@’ has no special function in RegEx, so in this case it simply matches a literal ‘@’.
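To see each definition in isolation, here is a minimal sketch using Python's built-in re module rather than Alteryx. The test strings are made up for illustration, and Python's syntax for these particular symbols matches what RegEx101 shows.

```python
import re

# [a-z]+ : runs of one or more lower case letters (the capitals are skipped)
lower_runs = re.findall(r"[a-z]+", "Hello World")

# .  : any single character, so 'a.c' accepts 'abc'
any_char = re.fullmatch(r"a.c", "abc") is not None

# \. : a literal full stop only, so 'a\.c' rejects 'abc'
literal_dot = re.fullmatch(r"a\.c", "abc") is not None

# ?  : the preceding token is optional
optional_u = re.fullmatch(r"colou?r", "color") is not None

# \w : letters, digits and the underscore
word_runs = re.findall(r"\w+", "john_doe99 x")

# *  : zero or more of the preceding token
zero_b = re.fullmatch(r"ab*", "a") is not None

# ( ) : a marked group, which can be read back on its own
group_text = re.search(r"(\w+)@", "joe@site").group(1)

print(lower_runs, any_char, literal_dot, optional_u, word_runs, zero_b, group_text)
```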

With these definitions in mind, this expression states that, within the marked group ‘( )’, one or more ‘+’ lower case letters ‘[a-z]’, followed by an optional ‘?’ full stop ‘\.’, then an optional ‘?’ one or more ‘+’ alphanumeric characters ‘\w’ before the ‘@’ will give this…

Half way there!
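The first half can be tried outside Alteryx too. The sketch below uses Python's re module. The expression is not written out in the text, so the pattern here is a reconstruction from the description above and should be treated as an assumption, as should the sample strings and the helper name local_part. Note that in Python a ‘?’ directly after ‘+’ makes the match as short as possible rather than optional, which gives the same result here because the ‘@’ must follow.

```python
import re

# Assumed reconstruction of the first half of the expression:
# lower case letters, an optional full stop, more word characters, then '@'.
FIRST_HALF = re.compile(r"[a-z]+\.?\w+?@")

def local_part(text):
    """Return everything up to and including the '@', or None if absent."""
    match = FIRST_HALF.search(text)
    return match.group(0) if match else None

# Hypothetical sample fields in different styles
for field in ["contact: john.smith@example.com", "jsmith@example.com", "no email here"]:
    print(field, "->", local_part(field))
```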

With this in mind, the rest of the email can be pulled out using similar steps.

After the ‘@’ come zero or more ‘*’ of any single character ‘.’, then a full stop ‘\.’, followed by one or more ‘+’ alphanumeric characters ‘\w’ and another optional ‘?’ full stop ‘\.’.

The other half!
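The second half can be sketched the same way in Python. Again, the pattern is a reconstruction from the description rather than a copy of the original expression, and the sample strings and the helper name domain_part are hypothetical.

```python
import re

# Assumed reconstruction of the second half of the expression:
# '@', any characters, a full stop, word characters, an optional full stop.
SECOND_HALF = re.compile(r"@.*\.\w+\.?")

def domain_part(text):
    """Return the text from the '@' to the end of the domain, or None."""
    match = SECOND_HALF.search(text)
    return match.group(0) if match else None

# Hypothetical sample fields
for field in ["jsmith@example.com", "john.smith@example.co.uk for info"]:
    print(field, "->", domain_part(field))
```

Because ‘.*’ grabs as much as it can, the match runs up to the last full stop on the line that is followed by an alphanumeric character, which is how a longer domain such as example.co.uk is picked up in one go.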

Finally, close the bracket to complete the regular expression. Hopefully the title makes a little bit more sense now: ‘? means optional’ 🙂 That’s a smiley face. No expression there!

The finished product
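Putting both halves inside one marked group gives the complete expression. As a rough Python equivalent of what the walkthrough builds (the pattern is reconstructed from the descriptions above, and the field values and the helper name extract_email are made up for illustration):

```python
import re

# Assumed full expression: both halves wrapped in a single marked group.
EMAIL = re.compile(r"([a-z]+\.?\w+?@.*\.\w+\.?)")

def extract_email(field):
    """Return the contents of the marked group, or None if nothing matches."""
    match = EMAIL.search(field)
    return match.group(1) if match else None

# Hypothetical fields with emails in different styles
fields = [
    "Contact John at john.smith@example.com",
    "jane99@mail.example.org (work)",
    "No address given",
]

for field in fields:
    print(extract_email(field))
```

With these sample fields the first two emails come back on their own and the third field returns None. The pattern only starts on lower case letters, so an address beginning with a capital would lose its leading character.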

This real-world example is extremely useful for quickly pulling the essential contact information out of a field that contains emails in different styles.