Alteryx - One-hot Encoding

by Liu Zhang

In machine learning, one-hot encoding is a frequently used method to deal with categorical data. Because many machine learning models need their input variables to be numeric, categorical variables need to be transformed in the pre-processing part. (see Wikipedia)

To be more visually intuitive, the aim is to convert the data from the left table to right table format.

Right table is one-hot encoded

Although it is not directly useful for other applications, it is a very good practice to perform in Alteryx.

The workflow

We load the file (link) into Alteryx where it contains three columns, Store is a categorical variable while Assigned and DistaneMiles are numerical variables.

Initial input

First we add a row id to keep track individual records.

Add RecordID

Then we change the data type for the columns as Alteryx reads in V_string by default.

Change numerical variables into correct data type

Next we create a dummy variable column to indicator for each level within the categorical variable Store (for pivoting later), one easy way to do it is performing a self-join.

Join by position as it's self-join
Self-join result

With two columns (Store & Right_Store) we can perform a pivot wide on the above table by using Cross Tab tool.

For each RecordID and numerical values, we pivot the table
Pivot table result

This step effectively created a square table for the pivoted columns, where we don't really care about the fill values, we are only concerned about it contains values or not.

For create a dummy 0 and 1 for each cell, we can use a Formula tool, but given the large number of columns, a better way is to see Multi-Field Formula tool.

Select the text columns from the pivot table
Result

Now we have all the columns with either a 0 or an 1 in the table, where it can be used for a lot more machine learning algorithms.

Note: There are a few issues need to be considered after transformaion, such as zero value columns, multicolinearity, which we are not going to discuss here.


Looking for more guides, tips and tricks in Tableau or Alteryx? Go check out the other blog posts from the Data School.

Tableau Public

LinkedIn

Twitter