This week at The Data School, we've been learning SQL in Snowflake. As someone with a basic understanding of SQL, I was cautiously confident going into the first Friday project but alas SQL in Snowflake has a few quirks.
Regex in Snowflake
A particularly tricky quirk is the way Regex is used. Yes you can use some Regex functions, namely:
1) REGEXP : Returns true if the subject matches the specified pattern.
2) REGEXP_COUNT : Returns the number of times that a pattern occurs in a string.
3) REGEXP_INSTR : Returns the position of the specified occurrence of the regular expression pattern in the string subject.
4) REGEXP_LIKE : Returns true if the subject matches the pattern. The syntax is different to REGEXP & you can add additional parameters.
5) REGEXP_REPLACE : Allows you to replace a subject(s) with the specified pattern with a replacement string.
6) REGEXP_SUBSTR : Returns the substring that matches a regular expression within a string.
Ideally I would have written this blog on these functions but I'm not quite there with my understanding of them. For more on using Regex on Snowflake visit here.
Escaping character: \
In Regex, meta-characters (i.e. characters that have a special meaning) are "escaped" using a backslash. This means that Regex will treat that character as its literal version rather than a meta-character (see fig.1 below).
On first glance this is the same in Snowflake as with Alteryx or other tools I've used.
However, unlike other Regex tools I've used, Snowflake requires escaping backslash sequences (e.g. \d) for them to work. These are really common meta characters so I'd guess most people feel the frustration early on.
Why is this happening?
The reason is that Snowflake processes the string given (i.e. the Regex pattern) then passes the string to be processed again by the Regex function. This means that Snowflake processes the first backslash before the Regex function has a chance to view it.
So you need to write another backslash (i.e. \\d instead of \d) for the Regex function to process it as intended.