In this post I’ll be using the superstore data set to introduce the idea of aggregation works within Tableau, and how it can be used in creating simple charts.
What is Aggregation?
Aggregation in simple terms is a calculation of some kind that is applied to the most granular level of data that helps to describe or summarize it in some way. Within Tableau both dimensions and measures can be aggregated, but it is more commonplace for aggregation to be applied to measures. Whenever a measure is dragged into the shelf in Tableau it is aggregated by default, normally using SUM, as is shown below.
There are other types of aggregation though and these can be accessed by hovering over the sales pill clicking the dropdown and then click the Measure option, this then generates a list the aggregations that can be applied.
The logic behind what is shown in chart below is that for each sub-category within a category all the sales are added up and displayed as a bar. Without aggregation each of the sales within a sub-category would be displayed as an individual point.
An example of aggregation at work.
In the following examples I will compare the profitability of the different sub-categories and look at how the aggregation and the level of granularity of the data will subtly change what is being displayed in the view; one of the chart showing profitability will be use aggregation and the other will not.
The first step in the example is to create the bar chart in the image above, this can be done by dragging the sales measure to the column shelf, and then dragging the category dimension to the row shelf followed by the sub-category dimension.
Profitability: no aggregation
Next create a calculated field by selecting the dropdown arrow at the top of the list of dimensions and selecting Create Calculated Field. In the text box that opens as a result, enter the formula below.
Once the field has been created drag it into the colour shelf and the bar chart this will create a bar chart as can be seen below.
To improve the readability right click on of the colours in the legend and select Edit Alias. Change the values true to Profitable and false to unprofitable respectively.
As this calculation doesn’t use aggregation what is happening behind the scenes is that for each row of data, the value of profit is checked for being either above or below 0 and then coloured accordingly based on whether the answer is true or false, so the calculation is happening at the most granular level of the data.
Visually what is being shown is that the total length of the bar represents the sum of sales for each sub-category, the length of the orange portion is the sum of sales where profit was greater than 0, and the blue is the sum of sales where profit was less than 0.
Profitability: with aggregation.
Once again click the dropdown arrow at the top of the dimensions list and create another calculated column, this time with the following calculation.
Once this field has been created then drag the column onto the colour shelf and the chart should now be coloured as below.
Notice that the length of the bars is the same but the reason the colouring is different is because of the aggregation happening in the calculation. What is happening here is that rather than working at the row level, for each sub-category the total profit is worked out and then checked to see if it is greater or less than 0. Sub-categories with orange bars, have a positive total profit and those with blue bars have a negative total profit.
NOTE – Because aggregation has been used to create this field it is not possible to edit the aliases of the colour. In order to have the colours named differently this must be done within the calculation as show below.
This calculation functions the same but gives a specific name to each outcome rather than simply true or false.
Hopefully this has given a brief introduction to how aggregation works in Tableau and the ways it can be used in analysis. The most important thing to remember when dealing with aggregations is to keep in mind what the lowest level of granularity in the data is and whether that is the level of detail required or if aggregation is necessary. As this is such a large topic, I will be writing further posts on this going into even more detail about how aggregations work.