Yesterday we had a full day of training covering SQL in Snowflake. Safe to say that the team smashed it. Today we've been set a project to take on a Preppin' Data challenge of our choice using only SQL within Snowflake. I chose to tackle 2021 Week 2. This blog post covers: Inputting Tables from Prep, REGEXP_REPLACE, COLLATE(), and Exporting as CSV/TSV.
Below is a screenshot of my queries for the solution:
Here are some of the techniques I used and some oddities:
Inputting the data:
Inputting the table from Tableau Prep was very easy. All you have you do is to install a driver, load up the file and output it straight into Snowflake.
The first thing I messed up was inputting the data. I forgot to put 2021 into the name of the table. Oops.
Here's how to rename a table in Snowflake:
ALTER TABLE "Original table name" RENAME TO "New table name"
Using this format I ran the query
ALTER TABLE "Henry Preppin' Data Week 2 INPUT" RENAME TO "Henry_PD_2021w02_INPUT"
and there we go, nice and easy.
Extracting the Brand from the Model:
For this part of the challenge I had to extract the brand i.e. only the letters within the Model. The model had a mixture of alphanumeric characters including a "/" e.g. "GIA31292/003". To solve this part I had to use REGEXP_REPLACE.
SELECT REGEXP_REPLACE(“Model”, '[^a-zA-Z]', '') FROM "Henry_PD_2021w02_INPUT"
Wait what? This query doesn't work?!
I kept getting this error:
SQL compilation error: error line 1 at position 7 Function REGEXP_REPLACE does not support collation.
After a bit of time trying to work it out, I went to Google and found a solution. I had to use the COLLATE() function.
SELECT REGEXP_REPLACE(COLLATE("Model", ''), '[^a-zA-Z]', '') FROM "Henry_PD_2021w02_INPUT"
But... I wasn't satisfied. Why did this fix the problem? What was the problem in the first place? Sadly Google couldn't help me here, reading all the documentation/forums was like reading a foreign language. So I went to Convo (TIL's communication platform).
And help arrived:
Ahh... so that's why.
Exporting the data:
Exporting the data into csv (comma separated values) and tsv (tab separated values) files is actually super easy (from what I've experienced). I thought it'd be harder and that I'd have to run some sort of special query to direct the outputted file from the data warehouse into my local system. Nope.
All you need yo do is to press this button in the Results pane: