Regex in Snowflake: escaping character \

by Ali Agah

This week at The Data School, we've been learning SQL in Snowflake. As someone with a basic understanding of SQL, I was cautiously confident going into the first Friday project but alas SQL in Snowflake has a few quirks.

Regex in Snowflake

A particularly tricky quirk is the way Regex is used. Yes you can use some Regex functions, namely:

1) REGEXP : Returns true if the subject matches the specified pattern.

2) REGEXP_COUNT : Returns the number of times that a pattern occurs in a string.

3) REGEXP_INSTR : Returns the position of the specified occurrence of the regular expression pattern in the string subject.

4) REGEXP_LIKE : Returns true if the subject matches the pattern. The syntax is different to REGEXP & you can add additional parameters.

5) REGEXP_REPLACE : Allows you to replace a subject(s) with the specified       pattern with a replacement string.

6) REGEXP_SUBSTR : Returns the substring that matches a regular expression within a string.

Ideally I would have written this blog on these functions but I'm not quite there with my understanding of them. For more on using Regex on Snowflake visit here.

Escaping character: \

An interesting quirk I did get to understand (thanks to Chris Meardon & Peter Silvester) is that backslash characters (\) in meta characters need to be escaped using another backslash (\).

In Regex, meta-characters (i.e. characters that have a special meaning) are "escaped" using a backslash. This means that Regex will treat that character as its literal version rather than a meta-character (see fig.1 below).

fig.1: Using \ changes the wildcard (.) to a literal character (.)

On first glance this is the same in Snowflake as with Alteryx or other tools I've used.

However, unlike other Regex tools I've used, Snowflake requires escaping backslash sequences (e.g. \d) for them to work. These are really common meta characters so I'd guess most people feel the frustration early on.

fig.2: LEFT - backslash sequences in Regex101.com RIGHT - backslash sequences in Snowflake

Why is this happening?

The reason  is that Snowflake processes the string given (i.e. the Regex pattern) then passes the string to be processed again by the Regex function. This means that Snowflake processes the first backslash before the Regex function has a chance to view it.

So you need to write another backslash (i.e. \\d instead of \d)  for the Regex function to process it as intended.