Day 3 at the Data School focused on a fundamental—but often underestimated—part of data analysis: merging data from multiple sources. Whether you're matching customer data with sales records from a POS system, or combining multiple Excel sheets, understanding how to join datasets is critical.
We explored five core techniques today, each offering a different way to bring data together:
1)Union
2)Inner Join
3)Right Join
4)Left Join
5)Outer Join
LETS BREAK THEM DOWN!
1)Union – The Stack Approach
Unions are the simplest way to combine data: you stack one dataset on top of another, assuming the field names and data structures are aligned. This is especially useful when dealing with data of the same structure collected at different times—like survey results from different months.
Joins – Matching on Common Ground
Joins are a more powerful (and sometimes trickier) way to merge data. Instead of stacking, joins align records horizontally, based on matching fields like customer ID, product code, or date.
Before applying a join, two things matter:
Join condition: The field(s) you're using to match the data (e.g., Customer_ID).
Join type: This determines what stays and what gets filtered out.
2)Inner Join – The Common Ground
This is the most widely used join. It returns only the records where there's a match in both datasets. It's great for precise queries, like seeing which products were sold and also delivered.
🔹 Caution: It excludes all unmatched records, which can be risky if you're looking for a full picture.
3)Left Join – Keep Everything on the Left
A left join returns all records from the left (primary) table, along with matching data from the right table. If there’s no match, you’ll still keep the left table's record, with nulls filling in the gaps.
4)Right Join – The Opposite Perspective
This is essentially the mirror of the left join. You get all records from the right table, plus the matching ones from the left. It's useful when your secondary data is the one you want to preserve in full.
5)Outer Join – The Full Picture
An outer join gives you everything from both tables—matches and non-matches. It's the most inclusive option and ensures that no data is lost, which is perfect for high-level, exploratory analysis.
Final Thoughts
Today’s session really highlighted the strategic thinking required to choose the right type of merge for your analysis goals. As we continue to dive deeper into tools like Tableau and Alteryx, mastering these techniques will be essential in creating impactful dashboards and insights.
Looking forward to more data adventures ahead!