Like many millions of us about 3 months ago I was entirely dependant upon Microsoft Excel for data preparation. And like many millions of us I got continually frustrated with how slow it can become when working with large (by excel standards, tiny to others) data sets.
Since moving to The Data School we have been introduced to alteryx, a data preparation tool that is hugely responsive and easy to use. It has changed the way I prepare data hugely. The only problem I have results from the fact that I am so used to doing data preparation in Excel and the syntax it uses that whenever I move through a workflow in alteryx I still think about it with my Excel ‘hat’ on.
So I am then left with the question ‘well this is how I would do this in Excel, how the hell do I do it in alteryx, it must be possible!’ In every case I have encountered so far, it is!
This series of blogs named ‘From Excel to alteryx’ will focus on this transfer and hopefully help others in a similar situtation to myself identify how we can recreate Excel functions in an alteryx workflow.
The focus of this first blog will be on one of the most used functions in Excel, the VLOOKUP function.
In Excel we use the VLOOKUP function or INDEX MATCH formula to bring in information to our data sets from a different source(s) or table(s) (for those unaware of INDEX MATCH, we use this instead of a VLOOKUP formula if the lookup value is not in the left most column and thus we cannot use the VLOOKUP formula. A great article on how this works can be found here for those not sure how it works)
For example in Table 1 (on the left side) we have information in regards to tennis player rankings, in Table 2 (on the right side) we have a list of tennis scores. I would like to bring information of players rankings and their ranking points into Table 2. We have a common field which is player names, so as a result we can run a simple VLOOKUP or INDEX MATCH to bring this information through.
Within alteryx this is what the JOIN tool is used for. To join information from one source (or multiple sources) together.
The first thing we must do is to create two inputs on our alteryx workflow (one for each Table) or sheet. We then bring in the JOIN tool; connecting each of the inputs to either one of the left or right joins.
Now we must configure our join. The common field name for the 1st Table was our player name or ‘PLAYER’, in the 2nd Table our common field is our winners player name or the ‘Winner’ field.
And this is where I shall mention a few key details:
- In alteryx we can join by more than one field. You would probably do this if you did not have a unique ID field but did have an individuals Name and DOBs in both tables. Thus allowing you to make sure that you are indeed bringing the correct information through.
- You can bring through multiple fields from either Table, whereby in Excel you would have to create multiple columns of VLOOKUP formulas.
- If there are duplicate matches your new data set will create a new line for every name that matches that description. If you see your data set expands after your join on what should be a one-to-one join then it is highly likely this has occurred.
- You can select the fields you wish to bring through from either Table. In theory you are creating a brand new Table. You select these fields in the lower end of the configure menu. You can also rename your fields here.
- The ‘J’ output from the JOIN tool represents all records that matched. The ‘L’ represents all records where no match was found from your left Table input. Your ‘R’ output represents the same but for your right Table input. In Excel your formula would error. If you wish to add these records back into your data set it is a simple case of unioning these with your Table (but more about that in a future blog post!).
- A MULTI-JOIN tool is also available, allowing us to bring information in from numerous sources at once (again, expect more about this in a future blog post!).
The below image gives the example of the configuration properties I used to join my two Excel Tables contributing to an identical output.
I really hope the information I have provided helps you understand the way in which the JOIN tool in alteryx can be used in a similar manor to Excels VLOOKUP formula.
Any feedback you have in regards to this blog would be great! Especially in regards to the style of the piece given that it will now become a regular series.