Why use MERGE
MERGE
lets you insert, update, or delete rows in a target table based on matching conditions from a source. It’s ideal for:
- Upserting data from staging tables
- Incremental loads in ETL pipelines
- Synchronizing data between systems
- Handling Slowly Changing Dimensions (SCD)
It’s cleaner than writing separate UPDATE
and INSERT
statements—and more efficient.
When to Use MERGE
Use MERGE
when:
- You have new and existing records in your source data.
- You need to preserve existing data while updating only changed fields.
- You want to avoid duplicates and maintain data integrity.
Avoid using MERGE
if:
- Your logic is overly complex or hard to debug.
- You don’t have a reliable primary key or unique identifier.
How to Use MERGE in Snowflake
Here’s a simple example: imagine you’re syncing a staging_customers
table into a customers
table.
MERGE INTO customers AS tgt
USING staging_customers AS src
ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
UPDATE SET
tgt.name = src.name,
tgt.email = src.email,
tgt.updated_at = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN
INSERT (customer_id, name, email, created_at)
VALUES (src.customer_id, src.name, src.email, CURRENT_TIMESTAMP);
Tips for Reliable MERGE Logic
- Use deduplicated source data: Apply
ROW_NUMBER()
orQUALIFY
to ensure one row per key. - Log affected rows: Add
RETURNING
or wrap in a stored procedure with logging. - Validate match conditions: Ensure your
ON
clause uniquely identifies records. - Test with small batches: Use
LIMIT
to simulate before full runs.
Common Pitfall: Duplicate Rows
If your ON
condition matches multiple rows, Snowflake will throw a MERGE failed due to multiple matching rows
error. To fix this:
SELECT * FROM staging_customers
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) = 1;
This ensures only the latest record per customer is used.