Prepping Data with Snowflake

by Irene Diomi

In this blog post, I will show how  Snowflake can be use to prep your data. The data I used is from the #PreppinData challenge week 1 of 2021. I started with Tableau Prep to connect my data to Snowflake, then to complete the challenge, I used the Split_Part function, and Snowflake syntax to have the desired output for my data. This blog post will focus on the steps I took to prep the data within Snowflake.

1) Snowflake Syntax follow this order:

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY

Snowflake uses SQL ANSI, it is thus important to respect the syntax order to avoid errors.

My output from Tableau Prep looked like this

Tableau Prep view

The Challenge :

Split the 'Store-Bike' field into 'Store' and 'Bike'
Clean up the 'Bike' field to leave just three values in the 'Bike' field (Mountain, Gravel, Road)
Create two different cuts of the date field: 'quarter' and 'day of month'
Remove the first 10 orders as they are test values.

2) In Snowflake, we will see our Database appears in the right side of our view, first, we have to to make sure we are connect to the right Database and Schema.

3)  To prep our data, we will use SQL within Snowflake, we will start by writing

The Select query allows us to get the set of record we want from our database.

Select *

from "database_name" ( you will to insert the name of the database you are using)

This syntax will bring all the data from the Database.

However in our example, we will write our SQL slightly differently as we will omit some columns from our dataset.

To resolve our Challenge the queries looked like this :

Our first task was to Split the 'Store-Bike' field into 'Store' and 'Bike'.

To do so, we have use the Split_part function. Snowflake documentation describes the function syntax as:

SPLIT_PART(<string>, <delimiter>, <partNumber>)

string

Text to be split into parts.

delimiter

Text representing the delimiter to split by.

partNumber

Requested part of the split (1-based).

If the value is negative, the parts are counted backward from the end of the string.

For our example, I used:

,SPLIT_PART("Store - Bike", ' - ', 1) as "Store"
,SPLIT_PART("Store - Bike", ' - ' , 2) as "Bike"

Then our next task was to create two different cuts of the date field: 'quarter' and 'day of month'.

To do so, we used:

, Quarter("Date") as "Year Quarter"
, Day("Date") as "Day of the Month"

And finally we had to remove remove the first 10 orders as they are test values.

To do so, we used:

AND  "Order ID" > 10

The overall SQL looks like this:

SELECT "Customer Age"
,"Bike Value"
,"Existing Customer?" AS "Customer"
,"Order ID"
,Quarter("Date") AS "Year Quarter"
,Day("Date") AS "Day of the Month"
,SPLIT_PART("Store - Bike", ' - ', 1) As "Store"
,SPLIT_PART("Store - Bike", ' - ' , 2) AS "Bike"
,Case
WHEN LEN("Bike") = 4 Then 'Road'
When LEN("Bike") = 6 Then 'Gravel'
When LEN("Bike") = 8 Then 'Mountain'
Else "Store - Bike"
end as "Bike type"
FROM "Prep_Challenge_1"
Where "Order ID" > 10

I hope this was useful feel free to reach out if you need help with this tutorial.