Statistics Series // 1.2 Cluster Analysis in Tableau

by Ellie Mason

Cluster analysis can also be performed in Tableau. This post shows you how, using the same beer data as my previous post, which introduces cluster analysis and shows how to perform it in Alteryx. This is part of a series of blogs I’m writing on statistical techniques in Alteryx and Tableau.

Note that clustering is available in Tableau Desktop, but not online, e.g. in Tableau Server or Tableau Online. It was added as a new capability in Tableau version 10. After showing how to cluster in Tableau, I compare the two sets of results: clustering in Tableau vs. Alteryx.

 

Cluster Analysis in Tableau

Step One: Build out your view

First, build the view you’d like to group your data from. Usually this is a scatter plot with an ID on detail; here, the individual beers.

Setting up the view ahead of clustering in Tableau

 

There are some limitations when clustering in Tableau. It can’t be done when other dimensions are in the view, when table calculations have been used, when the data is blended, or when using parameters, groups, sets, bins, etc.

 

Step Two: Add in the clusters

Once the view has been set up, add the clusters using the ‘Cluster’ option in the Analytics pane. Click and drag it onto the view, in the same way as adding a reference line.

Applying the ‘Cluster’ to the view

The Cluster menu

 

Applying clustering to the view brings up the Clusters menu. Tableau assigns the number of clusters automatically (two here). If more are needed, the number needs to be entered manually, e.g. four in the above example. This menu can also be brought up by right-clicking on the Cluster pill and clicking ‘Edit clusters…’.

The Clusters can be saved as groups by dragging the Cluster pill in the view onto the data pane.

 

Step Three: Add more variables to the clustering

As in Alteryx, Tableau can use more than two variables to build clusters, rather than just the two in the view when Cluster was dragged onto it. To do this, simply click and drag extra measures into the Clusters menu box.

Dragging more measures to use for clustering

 

Information about the clusters can be found by right-clicking the Cluster pill and selecting ‘Describe clusters…’. This brings up two tables of cluster metadata: Summary Diagnostics and Analysis of Model Variance.

Description of clusters in Tableau

Like Alteryx, Tableau uses the k-means clustering algorithm, where the centre of each cluster is the mean value of all its members. It uses squared Euclidean distances to assign each point to a cluster. Tableau also automatically scales each variable used in determining the clusters to account for the differing ranges of values, similar to how Alteryx standardises the data using the z-score.
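To make the idea concrete, here is a minimal Python sketch of k-means with scaling and squared Euclidean distance. It is not the actual implementation in Tableau or Alteryx: the beer values, the function names, the choice of min-max scaling and the naive initialisation (taking the first k points as starting centres) are all assumptions made for illustration.

```python
# Minimal k-means sketch: scale each variable, then alternate between
# assigning points to the nearest centre and recomputing centres as means.
# All names and data here are hypothetical, for illustration only.

def min_max_scale(rows):
    """Rescale every column to the range [0, 1] (an assumed scaling choice)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=100):
    """Return (labels, centres) for a plain k-means run."""
    centres = list(points[:k])  # naive initialisation: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centre.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        # Update step: each centre becomes the mean of its members.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:
            break  # converged: centres no longer move
        centres = new_centres
    return labels, centres

# Hypothetical (ABV, IBU) pairs: three lighter beers and three stronger ones.
beers = [(4.5, 20), (8.5, 70), (5.0, 25), (9.0, 80), (4.8, 18), (8.8, 65)]
labels, centres = k_means(min_max_scale(beers), k=2)
print(labels)
```

Without the scaling step, IBU (which ranges over tens of units) would dominate ABV (which ranges over a few units) in the distance calculation, which is why both tools rescale the variables first.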

Visualising the Tableau generated clusters

 

Tableau vs. Alteryx

The clustering method seems very similar (k-means clustering) in both Alteryx and Tableau. However, when using the same variables (ABV, IBU and size) to cluster the data, they do result in slightly different groupings.

Comparison of clusters generated in Alteryx vs. Tableau
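One way to compare two sets of cluster assignments is a simple cross-tabulation, since the cluster numbers or names given by each tool are arbitrary and won't line up directly. The sketch below is only an illustration: the labels are made up, not the real beer results, and `cross_tabulate` is a hypothetical helper name.

```python
from collections import Counter

def cross_tabulate(labels_a, labels_b):
    """Count how often each pair of cluster labels occurs for the same record."""
    return Counter(zip(labels_a, labels_b))

# Hypothetical cluster labels for the same six beers from each tool.
alteryx_labels = [1, 1, 2, 2, 3, 3]
tableau_labels = ["A", "A", "B", "C", "C", "C"]

table = cross_tabulate(alteryx_labels, tableau_labels)
for (alteryx, tableau), count in sorted(table.items()):
    print(f"Alteryx {alteryx} / Tableau {tableau}: {count}")
```

If the two tools agreed perfectly, each Alteryx cluster would pair with exactly one Tableau cluster; any extra pairings show where the groupings differ.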

 

In k-means clustering, the most common approach (and the only approach in Tableau) takes the mean of each cluster, and any new data coming in is assigned to the cluster whose mean is closest to it.
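The assignment rule described above can be sketched in a few lines of Python. The cluster names, their means and the new beer are hypothetical values, and for simplicity the sketch assumes the variables have already been put on comparable scales.

```python
def nearest_cluster(point, cluster_means):
    """Return the name of the cluster whose mean is closest to `point`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(
        cluster_means,
        key=lambda name: squared_distance(point, cluster_means[name]),
    )

# Hypothetical cluster means as (ABV, IBU) pairs.
cluster_means = {"light": (4.8, 21.0), "strong": (8.8, 72.0)}
new_beer = (5.2, 30.0)

print(nearest_cluster(new_beer, cluster_means))
```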

Alteryx doesn’t let you include categorical variables in the cluster analysis; it only works on continuous data. But… Tableau does.

…so how does Tableau take a mean of a categorical variable? It doesn’t make sense to, as it would be like asking for the mean of cats and dogs, which is why Alteryx doesn’t allow it. Tableau uses the mode of the category (its most common value) rather than the mean to represent it. So any new data coming in is assigned to whichever cluster most closely represents that data.
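The difference between the two kinds of summary is easy to see in a small Python sketch. This only illustrates mean versus mode as described above, not how Tableau works internally; the `summarise` helper and the beer values are assumptions.

```python
from statistics import mean, mode

def summarise(values):
    """Use the mean for numeric values and the mode for categorical ones."""
    if all(isinstance(v, (int, float)) for v in values):
        return mean(values)
    return mode(values)

# Hypothetical members of one cluster.
cluster_abv = [4.5, 5.0, 4.8]                    # continuous variable
cluster_style = ["Lager", "Pale Ale", "Lager"]   # categorical variable

print(summarise(cluster_abv))    # average ABV of the cluster
print(summarise(cluster_style))  # most common style in the cluster
```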

Alteryx simply takes the mean or the median and is, therefore, potentially a more relied-upon clustering tool, thanks to trust in its more clear-cut numerical methodology.

 

So which should I use?

The best thing about clustering in Alteryx and Tableau is that it takes a relatively complicated statistical technique and makes it accessible to any user, not just those who understand statistics. That said, it is useful to have a rough understanding of any statistical technique before applying it, because it needs to be carried out correctly for the results to be interpreted accurately.

But when it comes down to which one to use, Alteryx will often win because its clustering is purely numerical, whereas Tableau can take categorical or mixed inputs, although that can be useful at times.

Either way, it’s always good to visualise the clusters in a way that is useful for truly understanding their meaning – whether they were created in Tableau itself or indirectly using Alteryx.

Fri 01 Feb 2019
