Suppose we have a car dealership dataset that has a row for each week of the year and a column for each colour of car indicating how many cars of that colour were sold that week.

DATE | Red Cars | Silver Cars | Black Cars | Blue Cars

What if we then wanted to create an aggregated column which held the total number of cars sold that week?

DATE | Red Cars | Silver Cars | Black Cars | Blue Cars | TOTAL

There are two main ways to handle this:

  • Use a calculated field to sum the columns.
  • Pivot the data, aggregate across Date & Dealership, and join the totals back to the original date.

The first option is by far the quickest and retains the individual car colour sales values with no extra work. Within a Clean step, simply create a calculated field called “Total Cars Sold” and enter in [Red Cars] + [Silver Cars] + [Black Cars] + [Blue Cars]. Boom, problem solved. You can see a workflow that uses this solution below.

This works fine for a one-off report or a quick answer, however what if we start adding more categories of cars in the future? Each time you’d have to manually update this function. This is where the pivot & aggregate method comes in handy, albeit at the cost of setup time.

To use the pivot & aggregate method:

1: Create a Pivot step.

2: Make sure the pivot is set to “Columns to Rows”.

3: Click where it says to use a wildcard pivot and type “Cars”. This is a case sensitive search, so the “C” needs to be capitalized! Hit enter, and you should see your car columns pivot into rows and a new column appear called [Cars] which contains the values for each car colour.

4: Add an Aggregation step to the right of your Pivot step.

5: Group by [Date] and [Dealership] and aggregate by the sum of [Cars].

6: Add a Join step and join by [Dealership] and [Date] like below. This is so that we get both the total car sales whilst also keeping our original car colour sales breakdowns.

7: Finally, remove any duplicate fields and output the data!

You can see a full workflow using this method below.

The main danger of this method comes from the ambiguity of the wildcard search. If another field gets added which contains “Cars” but isn’t a type of car being sold you could be in trouble!