When it comes to prepping data for analysis, one of the most important tools to have in your wheelhouse is being able to combine different data sources.
Union
The first method of combining two data sources is a union. This is when you want to stack two similar tables one on top of the other. This can only be done if both tables share the same columns and data types. Examples of when this could be useful include data sets which measure the same things, but are captured at different times or for different areas.
Joins
If you think about unions as a vertical way to combine two tables, joins can be thought of as horizontal. Here, instead of stacking one table on top of another, you are combining rows based on a common field (or fields).
When performing a join there are two things you must consider:
- The join condition/clause: the field(s) that you are matching on (ID columns are usually useful for this).
- The join type.
What do I mean by join type?
Depending on what you want the output of the join to look like, you have some different options. Let’s take a closer look:
Inner join
Usually the default option. This will return rows only where the join condition is met.
Since you are only bringing back rows which satisfy the join clause, you would not expect this to produce any null values.

Left join and right join
These are essentially the same, it’s just a matter of perspective. To make this a bit clearer, let’s consider two tables, table A and table B:
Scenario 1: Left join table B onto table A.
This would return all the rows in table A, but would only return rows in B where the join condition has been met.
Scenario 2: Right join table B onto A.
This would return all of the rows in B and only the matching ones in A.
In either case, any rows which don’t meet the join condition, but are included in the output data set, will have null values for the fields from their ‘non-native’ table.


Full Join
Finally we have our outer joins. Here, all records will be returned, regardless of meeting the join condition or not. It’s worth noting that the use cases for this option are much more limited than the others.
Any rows which are brought back and don’t meet the join clause will contain null values in their non-native columns.

Hopefully this has been a useful overview of your options when combining different data sources!