One Dataset to Rule Them All: Techniques for Combining Multiple Sources

Day 3 at the Data School focused on a fundamental—but often underestimated—part of data analysis: merging data from multiple sources. Whether you're matching customer data with sales records from a POS system, or combining multiple Excel sheets, understanding how to join datasets is critical.

We explored five core techniques today, each offering a different way to bring data together:

1)Union

2)Inner Join

3)Right Join

4)Left Join

5)Outer Join

LETS BREAK THEM DOWN!

1)Union – The Stack Approach

Unions are the simplest way to combine data: you stack one dataset on top of another, assuming the field names and data structures are aligned. This is especially useful when dealing with data of the same structure collected at different times—like survey results from different months.

Joins – Matching on Common Ground

Joins are a more powerful (and sometimes trickier) way to merge data. Instead of stacking, joins align records horizontally, based on matching fields like customer ID, product code, or date.

Before applying a join, two things matter:

Join condition: The field(s) you're using to match the data (e.g., Customer_ID).

Join type: This determines what stays and what gets filtered out.

2)Inner Join – The Common Ground

This is the most widely used join. It returns only the records where there's a match in both datasets. It's great for precise queries, like seeing which products were sold and also delivered.

🔹 Caution: It excludes all unmatched records, which can be risky if you're looking for a full picture.

3)Left Join – Keep Everything on the Left

A left join returns all records from the left (primary) table, along with matching data from the right table. If there’s no match, you’ll still keep the left table's record, with nulls filling in the gaps.

4)Right Join – The Opposite Perspective

This is essentially the mirror of the left join. You get all records from the right table, plus the matching ones from the left. It's useful when your secondary data is the one you want to preserve in full.

5)Outer Join – The Full Picture

An outer join gives you everything from both tables—matches and non-matches. It's the most inclusive option and ensures that no data is lost, which is perfect for high-level, exploratory analysis.

Final Thoughts

Today’s session really highlighted the strategic thinking required to choose the right type of merge for your analysis goals. As we continue to dive deeper into tools like Tableau and Alteryx, mastering these techniques will be essential in creating impactful dashboards and insights.

Looking forward to more data adventures ahead!

Author:
Zainul Abedin Natha
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2025 The Information Lab