In machine learning, one-hot encoding is a frequently used method to deal with categorical data. Because many machine learning models need their input variables to be numeric, categorical variables need to be transformed in the pre-processing part. (see Wikipedia)
To be more visually intuitive, the aim is to convert the data from the left table to right table format.
Although it is not directly useful for other applications, it is a very good practice to perform in Alteryx.
We load the file (link) into Alteryx where it contains three columns, Store is a categorical variable while Assigned and DistaneMiles are numerical variables.
First we add a row id to keep track individual records.
Then we change the data type for the columns as Alteryx reads in V_string by default.
Next we create a dummy variable column to indicator for each level within the categorical variable Store (for pivoting later), one easy way to do it is performing a self-join.
With two columns (Store & Right_Store) we can perform a pivot wide on the above table by using Cross Tab tool.
This step effectively created a square table for the pivoted columns, where we don't really care about the fill values, we are only concerned about it contains values or not.
For create a dummy 0 and 1 for each cell, we can use a Formula tool, but given the large number of columns, a better way is to see Multi-Field Formula tool.
Now we have all the columns with either a 0 or an 1 in the table, where it can be used for a lot more machine learning algorithms.
Note: There are a few issues need to be considered after transformaion, such as zero value columns, multicolinearity, which we are not going to discuss here.
Looking for more guides, tips and tricks in Tableau or Alteryx? Go check out the other blog posts from the Data School.