Combining Data in Tableau and Alteryx

At the end of this first week, we’ve encountered a lot of new information. I thought I would use this post as an opportunity to set some of it straight in my head, and in doing so look at differences between Tableau and Alteryx. I’ll focus on how the two tools combine data sources.

Tableau

Since I’m more familiar with Tableau, I thought I’d start there and look at how it joins two data sources.

I’ve made an excel file looking at a fictional computer hardware company. The first spreadsheet tells us what products a customer bought with their order. The second tells us the payment status for each order.

           

The two sheets share an Order ID column, so if we drag them into the data connection canvas then Tableau automatically creates an Inner Join using the Order ID column. What this means is that Tableau looks at both sheets and gives us all the rows which have the same Order ID and puts them next to each other. 

Types of Join

If we click on the venn diagram connecting the sheets, we can see what type of join Tableau is using, and what it’s using to connect the sheets. Tableau has 4 types of join to choose from: Inner, Left, Right and Full Outer. I’ve said what an Inner Join does, so let’s look at the other three.

A Left Join takes all the data from the first sheet, as well as the rows from the second sheet which match, and puts them next to each other. It ignores anything else in the second sheet.

A Right Join takes all the data from the second sheet, as well as the rows from the first sheet which match, and puts them next to each other. It ignores anything else in the first sheet.

A Full Outer Join takes all the data from both sheets and puts them next to each other.

In our case, using an Inner or Left Join will result in us losing data – since the first sheet’s Order ID runs 1-4 and the second’s runs 1-6. That’s something to bear in mind, but in this case, I think it’s okay to ignore the extra Order IDs.

Duplicate Values

It’s also worth noting that when joining sheets with duplicate values, Tableau will duplicate those values in the other sheet as part of the join. So in our case, some Order IDs contained multiple products (Order 1 was for a mouse, a screen and a keyboard). However, the second sheet only had one payment status for each Order ID. As we can see, Tableau added duplicate rows to match up the second sheet to the first in its join.

Joining on Multiple Fields

I’ve made another excel file which looks at a hardware company which has two stores ‘North’ and ‘South’. Each one of these stores has separate Order IDs. Combining both sheets tells us: which store the information is for, the Order ID, what products were contained in that order, and also the name of the customer.

 

 

 

 

However, if we do an Inner Join on them using Order ID to connect them, we run into a problem.

As you can see, for each sheet 1 Order ID, Tableau has given both the North and South Order ID from sheet 2. This means that half our newly joined table is wrong. Data from the South store should not be being matched with data from the North store.

To get around this problem, we need to tell Tableau to make its join on multiple fields. In this case, we want Tableau to look at both the Order ID, and the store location, and then pull from sheet 2 all the rows which match. Luckily this is really simple to do. You click on the connection venn diagram, click on “add a new join clause” and select the new field you want Tableau to use to make its join.

If we do that, then we see that the new join table no longer has incorrectly matched rows.

Joining on multiple fields is something that can be overlooked, so it’s always a good idea to have a little think about exactly how you want your data combined.

Unions

Thus far I’ve only spoken about joining data – taking matching rows and putting them next to each other. There is another type of data combination which you will probably want to do too though: making a Union.

If we make a Union of two sheets, then instead of taking matching rows and putting them next to each other, we’re taking matching columns and putting them on top of each other.

To do this in Tableau, start with a sheet in the connection canvas, then simply drag your other sheet(s) just below the first. Tableau will prompt you to ‘drag table to union’, drop your sheet and you’re done!

Now you have a new table with all the columns of both sheets, any data that shares a column name is put underneath each other.

So that’s how to Join and Union data in Tableau. Next I’ll show how to do it in Alteryx.

Alteryx

Using the same sheets as before, to create a join in Alteryx, first you have to input your data sources. Having set up a data input for both worksheets, we use the Join Tool to create the join. Once we’ve connected our data sources to the Join Tool we are given a number of options.

We can ‘Join by Record Position’ – whereby Alteryx uses the order of the data to make its join. So cell 1 in the first sheet will be matched with cell 1 in the second sheet and so on. If there are more rows of data in one sheet than the other, then Alteryx will not place them in the join, but keep them in their corresponding Left or Right output.

We can also ‘Join by Specific Fields’ which allows us to tell Alteryx by which rows we are interested in making the join. In this case I made the join using the Order ID columns of both sheets.

Once again, there are situations where we need to Join on Multiple fields, and Alteryx has a pretty similar set up for this as Tableau. In the Join by Specific Fields option, you can add or delete additional joins as I have done below.

Types of Join

Like Tableau, Alteryx offers us a selection of joins from which to choose. However, this selection is different from Tableau’s. When Tableau offers Inner, Left, Right and Full Outer joins; Alteryx gives us Left, Inner and Right.

Importantly, Alteryx offers exclusive Left and Right joins, whereas Tableau offers inclusive Left and Right joins. What that means is that, in Alteryx, if we choose the ‘L’ output (a Left join) Alteryx gives us all the data that is in the ‘L’ input EXCEPT those rows that had a match in the ‘R’ input. In other words, the Left or Right joins used in Alteryx exclude the Inner join; whereas the Left and Right joins in Tableau include the Inner join.

Inclusive Left and Right joins are available in Alteryx, but they require the extra step of using the Union Tool.

Unions in Alteryx

To Union data in Alteryx, surprise surprise, you have to use the Union Tool. Once you have made the connections to the tool, there are three ways Alteryx can make its union:

‘Auto Config by Name’ – where Alteryx looks at column headers and then stacks matching names on top of each other.

‘Auto Config by Position’ – where Alteryx takes column 1 from sheet 1 and puts that on top of column 1 from sheet 2 and so on.

‘Manually Configure Fields’ – whereby you can select the layout of the Union, by moving columns on top of each other.

In this case, since I knew that both sheets had a column called ‘Order ID’, I chose to Auto Config by name. After we run the workflow, Alteryx gives us a Unioned table of our two sheets.

If we look at the image above, you can see that the ‘J’ and ‘R’ outputs of the Join Tool have connections into the Union Tool. This is how you can do an Inclusive Right join in Alteryx. We take the Inner join and the Right join and then Union them together. This extra step might seem like a bit of a faff, but there are many cases when it is helpful to separate out the Inner, Left and Right joins of our data.

Recap

So that’s how you can combine data in Tableau and Alteryx through use of Joins and Unions. These tools are the bread and butter of data analysis, as it is very rare to have all the data you need in one file. Now you know the differences between different types of Joins, you know how these tools work, and you know to stop and think about what kind of Join you want, and whether it should take place on multiple fields.

This blog has got a bit lengthy and is tackling a pretty basic subject, but foundations are very important! Hopefully you found something interesting in it, and if you want to ask me anything about it (or anything else) then you can always catch me on twitter @olliehclarke 

 

Author:
Oliver Clarke
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2024 The Information Lab