Using External Stages to load Data from S3 Buckets to Snowflake

by Frederik Egervari

In Snowflake an external stage allows you to pull data from an existing S3 Bucket into your snowflake instance. Let’s go through how this can be done and why and how an external stage is used.

What is a stage?

Stages are there to place files temporarily before moving somewhere different (a short-term temporary location). Stages can be used whenever data needs to be transferred from one system to another system, for temporary storage.

This means that we can request big data packages in one single process and write it from our source system to the staging file in one process. Then in a second process, the data is written to the target system. This is advantageous compared to writing it directly from the source to the target system because we have fewer, but bigger writing and reading processes, with which classic databases perform better.

When loading data from an s3 bucket, a stage stores the file from within the bucket, before we move it to our database (Fig.1).

Figure 1: Process of pulling Data from an S3 Bucket into a Snowflake Table. 

How to use a stage in snowflake?

We also need to specify a file format for our stage, to explain how data should be stored.

So let’s go through this in snowflake:

1.      At first, we specify the file format:

2.      Followed by the creation of a staging file:

3.      The next step is to create a table in the snowflake instance:

4.      And lastly we have to move the data in the staging file to the table:

And then you have all the data from your S3 Bucket, saved in a snowflake table to work with.

 

If you want to do it yourself, check out this example in a FrostyFriday Challenge. For the full solution of the Challenge, check out this blog post.

Sun 10 Sep 2023

Sat 24 Dec 2022

4 mins read

Wed 31 Aug 2022