Today I continued working on my Mailchimp data pipeline, building on the foundation from yesterday’s post, How to Incrementally Extract and Deduplicate Your Mailchimp Data. After extracting the campaign metadata and subscriber activity locally, the next step was to push that data into a data lake such as Amazon S3.
Extract JSON files → Upload to S3 → Auto-ingest into Snowflake via Snowpipe.
Uploading JSON Files to S3:
Using boto3, the steps were:
- Create an S3 client
- Upload each JSON file from the local data/ directory
- Optionally set metadata and content type
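The upload step can be sketched with boto3 along these lines. The bucket name, key prefix, and metadata values are placeholders of mine, not values from the pipeline:

```python
from pathlib import Path

BUCKET = "my-mailchimp-lake"   # hypothetical bucket name
PREFIX = "mailchimp/raw"       # hypothetical key prefix


def build_upload_args(path: Path, prefix: str = PREFIX) -> dict:
    """Build the S3 key and optional upload arguments for one local JSON file."""
    return {
        "Key": f"{prefix}/{path.name}",
        "ExtraArgs": {
            "ContentType": "application/json",
            "Metadata": {"source": "mailchimp-extract"},
        },
    }


def upload_json_files(data_dir: str = "data") -> None:
    """Upload every JSON file in the local data/ directory to S3."""
    import boto3  # deferred so the helper above stays importable without boto3

    s3 = boto3.client("s3")
    for path in sorted(Path(data_dir).glob("*.json")):
        args = build_upload_args(path)
        s3.upload_file(str(path), BUCKET, args["Key"], ExtraArgs=args["ExtraArgs"])
```

Setting ContentType and Metadata through ExtraArgs is optional, but it makes the objects easier to inspect later in the S3 console.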
Setting Up Snowflake Storage Integration:
To securely connect Snowflake to S3, I set up a Storage Integration. This avoids storing long-lived AWS credentials inside Snowflake and instead uses short-lived tokens with an IAM trust policy.
- Define a Snowflake storage integration
- Snowflake creates an external identity (IAM role or user)
- AWS trusts that identity
- Snowpipe uses this identity to read new files from S3
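For reference, the integration definition looks roughly like this; the integration name, role ARN, and bucket path are placeholders:

```sql
CREATE STORAGE INTEGRATION mailchimp_s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-mailchimp-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-mailchimp-lake/mailchimp/raw/');
```

The values Snowflake generates for its external identity (STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID) are what go into the IAM role’s trust policy on the AWS side.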
After creating the integration, I inspected it with:
DESC INTEGRATION <STORAGE_INTEGRATION>;
To automate ingestion, configure S3 Event Notifications: S3 → SQS → Snowpipe
- In the S3 console, open the bucket’s Properties tab
- Scroll to Event Notifications
- Create a new event notification for object-created events
- Target the SQS queue backing Snowpipe, using the notification_channel ARN from the pipe
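Wired together, the stage and pipe look roughly like this. The stage, pipe, and target table names are placeholders, and I’m assuming a raw table with json_data and filename columns:

```sql
CREATE STAGE mailchimp_stage
  URL = 's3://my-mailchimp-lake/mailchimp/raw/'
  STORAGE_INTEGRATION = mailchimp_s3_int
  FILE_FORMAT = (TYPE = 'JSON');

CREATE PIPE mailchimp_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw_campaign_activity (json_data, filename)
  FROM (SELECT $1, metadata$filename FROM @mailchimp_stage);

-- SHOW PIPES exposes the notification_channel column: the SQS queue ARN
-- to paste into the S3 event notification destination.
SHOW PIPES;
```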
The Problem: Snowpipe Won’t Load Duplicate File Names
Here’s where things got interesting.
Snowpipe tracks every file it has loaded.
If a file with the same name appears again, even if the content has changed, Snowpipe ignores it.
This didn’t work with my previous workflow because my extract script always produced the same filenames, so today’s updated campaign activity would never load: Snowpipe thought it had already seen the file.
The Fix: Timestamped Filenames
To keep Snowpipe happy, I updated the extraction step to rename each output file using an extract timestamp: campaign_12345_2025-11-26T14-52-00Z.json
This guarantees every run produces uniquely identifiable files.
Later, inside Snowflake, I can parse that timestamp to determine the most recent version of each campaign’s activity.
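The rename step can be sketched as a pair of helpers. The filename layout matches the example above; the function names are mine:

```python
from datetime import datetime, timezone
from typing import Optional

# S3 keys and most filesystems dislike ':', so the time part uses '-' separators.
STAMP_FMT = "%Y-%m-%dT%H-%M-%SZ"


def timestamped_name(campaign_id: str, now: Optional[datetime] = None) -> str:
    """Name an extract file with a UTC timestamp so every run is unique."""
    now = now or datetime.now(timezone.utc)
    return f"campaign_{campaign_id}_{now.strftime(STAMP_FMT)}.json"


def parse_extract_ts(filename: str) -> datetime:
    """Recover the extract timestamp from a filename produced above."""
    stamp = filename.rsplit("_", 1)[1].removesuffix(".json")
    return datetime.strptime(stamp, STAMP_FMT).replace(tzinfo=timezone.utc)
```

Because the timestamp is zero-padded and fixed-width, filenames for the same campaign also sort chronologically as plain strings.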
Extracting Timestamps in Snowflake
Once files land in the external stage, we can read metadata such as filename and load time. The filename contains the extract timestamp, so we can parse it using Snowflake’s string functions.
SELECT
  metadata$filename AS filename,
  $1 AS json_data
FROM @mailchimp_stage;
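To keep only the latest extract per campaign, the id and timestamp embedded in the filename can be pulled out and ranked. A sketch, assuming the campaign_&lt;id&gt;_&lt;timestamp&gt;.json layout from above:

```sql
SELECT *
FROM (
  SELECT
    metadata$filename AS filename,
    $1 AS json_data
  FROM @mailchimp_stage
) f
QUALIFY ROW_NUMBER() OVER (
  -- group rows by the campaign id embedded in the filename...
  PARTITION BY REGEXP_SUBSTR(f.filename, 'campaign_([^_]+)_', 1, 1, 'e')
  -- ...and keep the greatest timestamp; the fixed-width format
  -- means plain string ordering is chronological
  ORDER BY REGEXP_SUBSTR(f.filename, '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9-]+Z') DESC
) = 1;
```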
Now we are able to continually load fresh data into S3 and, from our Snowflake stage, select the most recent campaign activity. With timestamped filenames and Snowpipe in place, the pipeline is ready to handle updates automatically. We are fully set up to orchestrate ongoing data flows and start combining Mailchimp activity with other sources for deeper analysis.
