Week 10: Snowflake & SQL Challenges | DS23

by Auguste Navickyte

Last week we started learning about Snowflake which is a "data platform built for the cloud for all your data and all your users". Snowflake goes hand in hand with SQL, and therefore, our first training session was based on writing SQL commands on the platform. We were then challenged with what we learnt and more. This post is going to be all about the two challenges which you can find here and here.

Uploading data to Snowflake using Tableau Prep

To get your data onto Snowflake, Tableau Prep becomes handy. All you need to do is upload your data on Tableau Prep and create and Output from it. The configuration of the output is as follows:

  1. Save output to Database table (as opposed to File or Published data source)
  2. Select your connection to Snowflake
  3. A window will pop up, and you will need to fill in your Snowflake details (server, authentication, etc.)
  4. Once you are connected, you will have to choose the warehouse, database, schema, table name and refresh options of your choice
  5. Run the flow - your table should now be loaded on Snowflake

To begin the work on Snowflake, you first need to request your data table. The simplest way to do it is to request everything (*) from the table (AN_WK1) as shown below.


select *
from "AN_WK1"

This will be your frame which you can build the rest of the work around - the * can be changed into specific (perhaps edited) fields of your choice.

Challenge 1

Splitting fields

To split a field into two, I used SPLIT_PART function (make sure you have your SELECT function before this):


 split_part("Store - Bike",'-',1) as "Store Location"
, split_part("Store - Bike",'-',2) as "Bike Type"

This resulted into two extra columns.

Cleaning up the a field to leave just three values in it

This one was tough! I took two steps to tackle this:

  1. /Looking for misspelled words in bike types/
    select
    split_part("Store - Bike",'-',2) as "Bike Type"
    , count("Bike Type")
    from "AN_WK1"
    group by "Bike Type"
    /Results that I got: Mountaen, Graval, Rowd, Gravle, Rood/

    This step was done on the side (or outside) of my main code just to note down the mistakes that needed to be corrected. In the next step, I got back to my main code.

  2. /Replace these misspelled words/ (again, this should follow the SELECT function):
    , replace(replace(replace(replace(replace("Bike Type",'Mountaen','Mountain'),'Graval','Gravel'),'Rowd','Road'),'Gravle','Gravel'),'Rood','Road') as "Bike Type"

Creating two different cuts of the date field: 'quarter' and 'day of month'

For this one, I used a couple of date functions (after the SELECT function):

  year("Date") as "Year"
, day("Date") as "Day of Month"

Removing the first 10 records as they are test values

A simple WHERE function following FROM "Your Data Source Name" is needed here:

where "Order ID">10

Final code for the challenge 1

/Start!/
select "Customer Age"
, "Order ID"
/Split the 'Store-Bike' field into 'Store' and 'Bike'/
, split_part("Store - Bike",'-',1) as "Store Location"
, split_part("Store - Bike",'-',2) as "Bike Type"
/Clean up the 'Bike' field to leave just three values in the 'Bike' field (Mountain, Gravel, Road)/
, replace(replace(replace(replace(replace("Bike Type",'Mountaen','Mountain'),'Graval','Gravel'),'Rowd','Road'),'Gravle','Gravel'),'Rood','Road') as "Bike Type"
, "Existing Customer?"
, "Bike Value"
, "Date"
/Create two different cuts of the date field: 'quarter' and 'day of month'/
, year("Date") as "Year"
, day("Date") as "Day of Month"
from "AN_WK1"
/Remove the first 10 orders as they are test values/
where "Order ID">10
/Finish!/

/Looking for misspelled words in bike types/
select
split_part("Store - Bike",'-',2) as "Bike Type"
, count("Bike Type")
from "AN_WK1"
group by "Bike Type"
/Results: Mountaen, Graval, Rowd, Gravle, Rood/

Challenge 2, table 1

Cleaning up a field to leave only the letters only

For this part, I needed to delete any symbol that is not a letter from a field "Model". I used a REGEXP_REPLACE function, and had to COLLATE the field - collate("Model",'') - as Snowflake was struggling otherwise. This code comes after SELECT.

regexp_replace(collate("Model",''), '[^a-zA-Z]', '') as "Brand"

Working out the Order Value using Value per Bike and Quantity

After SELECT, we have a calculation:

, "Value per Bike"*"Quantity" as "Order Value"

Aggregating Value per Bike, Order Value and Quantity by Brand and Bike Type to form: Quantity Sold, Order Value and Average Value Sold per Brand, Type

Note that the first line is taken from above and a SUM is added to it.

, sum("Value per Bike"*"Quantity") as "Order Value"

, sum("Quantity") as "Quantity Sold"

, avg("Value per Bike") as "Average Value Sold per Brand, Type"

from "AN_WK2"

group by "Brand"

, "Bike Type"

Final code for the challenge 2, table 1

select
/Clean up the Model field to leave only the letters to represent the Brand of the bike/
regexp_replace(collate("Model",''), '[^a-zA-Z]', '') as "Brand"
, "Bike Type"
/Workout the Order Value using Value per Bike and Quantity/
, sum("Value per Bike"*"Quantity") as "Order Value"
/*Aggregate Value per Bike, Order Value and Quantity by Brand and Bike Type to form:
- Quantity Sold
- Order Value
- Average Value Sold per Brand, Type*/
, sum("Quantity") as "Quantity Sold"
/* Order Value already aggregated*/
/Round any averaged values to one decimal place/
, round((avg("Value per Bike")),1) as "Average Value Sold per Brand, Type"
from "AN_WK2"
group by "Brand"
, "Bike Type"

Challenge 2, table 2

Calculating Days to ship

A simple DATEDIFF function that comes after SELECT was required here:

, datediff('day',"Order Date","Shipping Date") as "Average Days to Ship"

Aggregating Order Value, Quantity and Days to Ship by Brand and Store to form: Total Quantity Sold, Total Order Value and Average Days to Ship

Similar to the first table's calculations:

, sum("Value per Bike""Quantity") as "Order Value"
, sum("Quantity") as "Quantity Sold"
, avg(datediff('day',"Order Date","Shipping Date")) as "Average Days to Ship"
from "AN_WK2"
group by "Brand"
, "Store"

Rounding any averaged values

Let's round our Average Days to Ship:

, round(avg(datediff('day',"Order Date","Shipping Date")),1) as "Average Days to Ship"

Final code for the challenge 2, table 2

select
regexp_replace(collate("Model",''), '[^a-zA-Z]', '') as "Brand"
, "Store"
/*Aggregate Order Value, Quantity and Days to Ship by Brand and Store to form:
- Total Quantity Sold
- Total Order Value
- Average Days to Ship*/
, sum("Value per Bike"*"Quantity") as "Total Order Value"
, sum("Quantity") as "Total Quantity Sold"
/Round any averaged values to one decimal place/
/Calculate Days to ship by measuring the difference between when an order was placed and when it was shipped, aggreagete/
, round(avg(datediff('day',"Order Date","Shipping Date")),1) as "Average Days to Ship"
from "AN_WK2"
group by "Brand"
, "Store"

And that's all for the two challenges!