Clustering in Alteryx

by Tom Stinson

Clustering is an interesting concept. Being able to group similar records is extremely powerful, as with any predictive analytics technique, but you need to be careful when deciding which fields to group by and how many clusters to build, as both choices can massively swing your results.

The first step is to decide how many clusters to produce. To do this, use the K-Centroids Diagnostics tool. If it has run properly, the Browse tool that follows it should show two tables and then two series of boxplots produced by R, one series for each of the two indices described below. This analysis helps you decide how many clusters to produce.

The Adjusted Rand Index (ARI) is a measure of similarity between two clusterings of the same data. In essence it tells us how reproducible the groupings are. An ARI close to 0 means the two clusterings agree no more than you would expect by chance, as if each record had been dropped into a random bucket, so the groupings are not reproducible at all. At the other end, a perfect score of 1 means the two clusterings are identical, so the results are entirely reproducible.
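
To make the ARI less of a black box, here is a minimal Python sketch of the standard pair-counting formula. It is not what Alteryx runs internally (the tool uses R), and the function name `adjusted_rand_index` is my own; it simply compares two lists of cluster labels for the same records.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same records."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same records")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Pairs of records that share a cluster in both labelings, in A, and in B.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        # Degenerate case: both labelings are one big cluster or all singletons.
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


# The cluster names differ, but the groupings are the same, so this scores 1.0.
same_groups = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
```

Note that the index only cares about which records end up together, not what the clusters are called, and that it can dip below 0 when two clusterings agree less than chance would predict.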

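The Calinski–Harabasz index is a measure of how compact and well separated the clusters are: it compares the spread between clusters with the spread within them, so higher values are better. In the boxplots, look for a number of clusters whose box sits high and is tight with few outliers, as this tells you the clusters are well defined and the score does not vary massively.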
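
For the curious, the Calinski–Harabasz index can be sketched in a few lines of Python. This is the textbook definition (between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom), not Alteryx's own code, and the function name `calinski_harabasz` and the toy points are made up for illustration.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index for numeric points and their cluster labels."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    n, k = len(points), len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")

    def centroid(rows):
        return [sum(column) / len(rows) for column in zip(*rows)]

    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    overall = centroid(points)
    between = 0.0  # how far the cluster centres sit from the overall centre
    within = 0.0   # how far records sit from their own cluster centre
    for rows in clusters.values():
        centre = centroid(rows)
        between += len(rows) * squared_distance(centre, overall)
        within += sum(squared_distance(row, centre) for row in rows)

    if within == 0:
        return float("inf")  # every record sits exactly on its cluster centre
    return (between / (k - 1)) / (within / (n - k))


# Two tight groups far apart: a sensible split scores far higher than a bad one.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good_split = calinski_harabasz(points, [0, 0, 1, 1])
bad_split = calinski_harabasz(points, [0, 1, 0, 1])
```

The absolute value means little on its own; it is the comparison between candidate cluster counts on the same data that matters.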

From this you can begin deciding how many clusters to produce. This should be a trade-off between the information provided by the boxplots and the judgement of the user. For example, the charts could suggest that two clusters is by far the best choice, but if you know that you need to split the data into three or more groups, you would choose three clusters.
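
That trade-off can be written down as a simple rule: take the number of clusters with the best typical score, but only among the counts the business will accept. The Python sketch below is just one way to express it; `choose_k`, `min_k` and the scores are hypothetical and not taken from any real diagnostics run.

```python
from statistics import median


def choose_k(index_by_k, min_k=2):
    """Pick the cluster count with the highest median index, subject to a minimum.

    index_by_k maps a number of clusters to the index values (ARI or
    Calinski-Harabasz) observed for it, i.e. the data behind one boxplot.
    """
    medians = {k: median(values) for k, values in index_by_k.items() if k >= min_k}
    if not medians:
        raise ValueError("no candidate cluster count satisfies min_k")
    return max(medians, key=medians.get)


# Made-up scores: two clusters look best on the numbers alone.
scores = {
    2: [0.90, 0.95, 0.92],
    3: [0.70, 0.75, 0.72],
    4: [0.50, 0.60, 0.55],
}
best_on_paper = choose_k(scores)
best_with_three_or_more = choose_k(scores, min_k=3)
```

Using the median rather than the mean mirrors how you read a boxplot, and it keeps a single outlying run from deciding the answer.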

Finally, you can append the cluster name/group onto your main data. To do this, take the output from your K-Centroids Cluster Analysis tool and send it to the bottom input of the Append Cluster tool. Send the data that you want to append your clusters to into the top input, and after you run the workflow you should find the cluster assignment appended as a new column at the end of the table.
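
Conceptually, appending clusters means giving each record the label of the cluster centre it sits closest to. The Python sketch below shows that idea under some loud assumptions: it uses plain squared Euclidean distance on unscaled fields (as in K-Means), and the function name `append_clusters`, the `Cluster` column name and the sample records are all made up. The real tool works from the model object and may standardise fields first.

```python
def append_clusters(records, centroids, fields, label_field="Cluster"):
    """Return copies of the records with the nearest centroid's number appended."""
    labelled = []
    for record in records:
        # Squared Euclidean distance from this record to every centroid.
        distances = [
            sum((record[f] - centre[f]) ** 2 for f in fields) for centre in centroids
        ]
        nearest = distances.index(min(distances))
        # Clusters are numbered from 1; the new column goes at the end of the row.
        labelled.append({**record, label_field: nearest + 1})
    return labelled


# Hypothetical customers and cluster centres.
customers = [
    {"id": "a", "spend": 1.0, "visits": 1.0},
    {"id": "b", "spend": 9.0, "visits": 8.0},
]
centres = [{"spend": 0.0, "visits": 0.0}, {"spend": 10.0, "visits": 10.0}]
with_clusters = append_clusters(customers, centres, fields=["spend", "visits"])
```

Fields that are not part of the clustering, such as `id` here, pass through untouched, which is why you can send the full table rather than just the clustered fields.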

Fri 29 Jul 2022
