Mastering MERGE in Snowflake: Why, When, and How

Why use MERGE

MERGE lets you insert, update, or delete rows in a target table based on matching conditions from a source. It’s ideal for:

Upserting data from staging tables
Incremental loads in ETL pipelines
Synchronizing data between systems
Handling Slowly Changing Dimensions (SCD)

It’s cleaner than writing separate UPDATE and INSERT statements—and more efficient.

When to Use MERGE

Use MERGE when:

You have new and existing records in your source data.
You need to preserve existing data while updating only changed fields.
You want to avoid duplicates and maintain data integrity.

Avoid using MERGE if:

Your logic is overly complex or hard to debug.
You don’t have a reliable primary key or unique identifier.

How to Use MERGE in Snowflake

Here’s a simple example: imagine you’re syncing a staging_customers table into a customers table.

MERGE INTO customers AS tgt
USING staging_customers AS src
ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
UPDATE SET
tgt.name = src.name,
tgt.email = src.email,
tgt.updated_at = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN
INSERT (customer_id, name, email, created_at)
VALUES (src.customer_id, src.name, src.email, CURRENT_TIMESTAMP);

Tips for Reliable MERGE Logic

Use deduplicated source data: Apply ROW_NUMBER() or QUALIFY to ensure one row per key.
Log affected rows: Add RETURNING or wrap in a stored procedure with logging.
Validate match conditions: Ensure your ON clause uniquely identifies records.
Test with small batches: Use LIMIT to simulate before full runs.

Common Pitfall: Duplicate Rows

If your ON condition matches multiple rows, Snowflake will throw a MERGE failed due to multiple matching rows error. To fix this:

SELECT * FROM staging_customers
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) = 1;

This ensures only the latest record per customer is used.

Author:

Jacob Kilroy

View Profile