Today I continued working on my Mailchimp data pipeline, building on the foundation from yesterday’s post, How to Incrementally Extract and Deduplicate Your Mailchimp Data. After extracting the campaign metadata and subscriber activity locally, the next step was to push that data into a data lake such as Amazon S3.
Extract JSON files → Upload to S3 → Auto-ingest into Snowflake via Snowpipe.
Uploading JSON Files to S3:
Using boto3, the steps were:
- Create an S3 client
- Upload each JSON file from the local data/ directory
- Optionally set metadata and content type
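The upload step can be sketched with boto3 along these lines. The bucket name, key prefix, and metadata values are placeholders of mine, not values from the pipeline:

```python
from pathlib import Path

BUCKET = "my-mailchimp-lake"   # hypothetical bucket name
PREFIX = "mailchimp/raw"       # hypothetical key prefix


def build_upload_args(path: Path, prefix: str = PREFIX) -> dict:
    """Build the S3 key and optional upload arguments for one local JSON file."""
    return {
        "Key": f"{prefix}/{path.name}",
        "ExtraArgs": {
            "ContentType": "application/json",
            "Metadata": {"source": "mailchimp-extract"},
        },
    }


def upload_json_files(data_dir: str = "data") -> None:
    """Upload every JSON file in the local data/ directory to S3."""
    import boto3  # deferred so the helper above stays importable without boto3

    s3 = boto3.client("s3")
    for path in sorted(Path(data_dir).glob("*.json")):
        args = build_upload_args(path)
        s3.upload_file(str(path), BUCKET, args["Key"], ExtraArgs=args["ExtraArgs"])
```

Setting ContentType and Metadata through ExtraArgs is optional, but it makes the objects easier to inspect later in the S3 console.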
Setting Up Snowflake Storage Integration:
To securely connect Snowflake to S3, I set up a Storage Integration. This avoids storing long-lived AWS credentials inside Snowflake and instead uses short-lived tokens with an IAM trust policy.
- Define a Snowflake storage integration
- Snowflake creates an external identity (IAM role or user)
- AWS trusts that identity
- Snowpipe uses this identity to read new files from S3
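For reference, the integration definition looks roughly like this; the integration name, role ARN, and bucket path are placeholders:

```sql
CREATE STORAGE INTEGRATION mailchimp_s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-mailchimp-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-mailchimp-lake/mailchimp/raw/');
```

The values Snowflake generates for its external identity (STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID) are what go into the IAM role’s trust policy on the AWS side.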
After creating the integration, I inspected it with:
DESC INTEGRATION <STORAGE_INTEGRATION>;
To automate ingestion, configure S3 Event Notifications: S3 → SQS → Snowpipe
- In the S3 console, open the bucket’s Properties tab
- Scroll to Event Notifications
- Create a new event notification for object-created events
- Target the SQS queue backing Snowpipe, using the notification_channel ARN from the pipe
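Wired together, the stage and pipe look roughly like this. The stage, pipe, and target table names are placeholders, and I’m assuming a raw table with json_data and filename columns:

```sql
CREATE STAGE mailchimp_stage
  URL = 's3://my-mailchimp-lake/mailchimp/raw/'
  STORAGE_INTEGRATION = mailchimp_s3_int
  FILE_FORMAT = (TYPE = 'JSON');

CREATE PIPE mailchimp_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw_campaign_activity (json_data, filename)
  FROM (SELECT $1, metadata$filename FROM @mailchimp_stage);

-- SHOW PIPES exposes the notification_channel column: the SQS queue ARN
-- to paste into the S3 event notification destination.
SHOW PIPES;
```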
The Problem: Snowpipe Won’t Load Duplicate File Names
Here’s where things got interesting.
Snowpipe tracks every file it has loaded.
If a file with the same name appears again, even if the content has changed, Snowpipe ignores it.
This didn’t work with my previous workflow because my extract script always produced the same filenames, so today’s updated campaign activity would never load: Snowpipe thought it had already seen the file.
The Fix: Timestamped Filenames
To keep Snowpipe happy, I updated the extraction step to rename each output file using an extract timestamp: campaign_12345_2025-11-26T14-52-00Z.json
This guarantees every run produces uniquely identifiable files.
Later, inside Snowflake, I can parse that timestamp to determine the most recent version of each campaign’s activity.
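The rename step can be sketched as a pair of helpers. The filename layout matches the example above; the function names are mine:

```python
from datetime import datetime, timezone
from typing import Optional

# S3 keys and most filesystems dislike ':', so the time part uses '-' separators.
STAMP_FMT = "%Y-%m-%dT%H-%M-%SZ"


def timestamped_name(campaign_id: str, now: Optional[datetime] = None) -> str:
    """Name an extract file with a UTC timestamp so every run is unique."""
    now = now or datetime.now(timezone.utc)
    return f"campaign_{campaign_id}_{now.strftime(STAMP_FMT)}.json"


def parse_extract_ts(filename: str) -> datetime:
    """Recover the extract timestamp from a filename produced above."""
    stamp = filename.rsplit("_", 1)[1].removesuffix(".json")
    return datetime.strptime(stamp, STAMP_FMT).replace(tzinfo=timezone.utc)
```

Because the timestamp is zero-padded and fixed-width, filenames for the same campaign also sort chronologically as plain strings.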
Extracting Timestamps in Snowflake
Once files land in the external stage, we can read metadata such as filename and load time. The filename contains the extract timestamp, so we can parse it using Snowflake’s string functions.
SELECT
  metadata$filename AS filename,
  $1 AS json_data
FROM @mailchimp_stage;
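To keep only the latest extract per campaign, the id and timestamp embedded in the filename can be pulled out and ranked. A sketch, assuming the campaign_&lt;id&gt;_&lt;timestamp&gt;.json layout from above:

```sql
SELECT *
FROM (
  SELECT
    metadata$filename AS filename,
    $1 AS json_data
  FROM @mailchimp_stage
) f
QUALIFY ROW_NUMBER() OVER (
  -- group rows by the campaign id embedded in the filename...
  PARTITION BY REGEXP_SUBSTR(f.filename, 'campaign_([^_]+)_', 1, 1, 'e')
  -- ...and keep the greatest timestamp; the fixed-width format
  -- means plain string ordering is chronological
  ORDER BY REGEXP_SUBSTR(f.filename, '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9-]+Z') DESC
) = 1;
```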
Now we are able to continually load fresh data into S3 and, from our Snowflake stage, select the most recent campaign activity. With timestamped filenames and Snowpipe in place, the pipeline is ready to handle updates automatically. We are fully set up to orchestrate ongoing data flows and start combining Mailchimp activity with other sources for deeper analysis.
