For a long time I didn’t understand how joins work, and had to wing it when combining data sets. That time is no more and I am writing this blog to share my new knowledge and confirm to myself that I actually do understand data joins. A data join is when two data sets are combined in a side by side manner, therefore at least one column in each data set must be the same. This differs from a union which puts data sets on top of each other, requiring all of the columns to be the same (or have null values in a column which isn’t common across data sets). There are 6 types of join, inner, left inner, left outer, right inner, right outer and outer.
An inner join takes only the rows of data in which both data sets have common fields. Therefore any rows which have no commonality between data sets in the join column will be excluded. In the example below the join takes place on the food column, therefore the curry and chips rows will be excluded from the joined data set as they are not common between the two data sets.
Left Inner Joins
A left inner join takes all the rows that are common between two data sets along with the remaining rows from the data set being joined from the left hand side. For the rows joined from the left which do not have commonality with rows from the right the values for the right hand columns ill be NULL. In the example below Burger and Hot Dog are joined due to commonality and curry is kept from the left filling in the blanks with NULL.
Right Inner Joins
A right inner join behaves the same as a left inner join but keeps rows from the data set being joined from the right as opposed to the left. In the example below Chips is kept with the left hand column values being filled with NULL.
Left Outer Joins
A left outer join excludes rows which are common between data sets and any extra rows from the right data set. This leaves only rows from the left data set filling in the columns from the right data set columns with NULLs. In the example below only Curry will be kept.
Right Outer Joins
A right outer join is the same as a left outer join but keeps only rows from the right data set instead of the left. Therefore in the example below only Chips will be kept with the left data set columns being filled with NULLs.
An outer join will keep all data from both data sets. Rows which are common across data sets will have columns filled from both data sets, whereas rows without commonality will fill the blanks in with NULLs. In the example below Burger and Hot Dog rows will have all the fields filled, but Curry and Chips will be included as well but with NULLs to replace incomplete data.
*When doing the joins the column being joined on will duplicate, this hasn’t been shown in the examples for simplicity.