Often, instead of having one header per field, some datasets may have multiple rows of headers at the top. This can make it unnecessarily difficult to perform any meaningful analysis when writing formulae. In this blog, I’ll be showing you how to use Alteryx’s transpose and cross tab tools to resolve this common issue.
Our workflow template will be as follows:
- Separate the headers from the rest of the data
- Format and pivot the headers into the desired shape
- Join the two samples back together using a manually created common field.
As an example, I will be walking through Alteryx Weekly Challenges 49 and 63.
Multiple field names for each field - Challenge #49
The desired outcome:
- DATA CLEANSING TOOL: before we do any pivoting, you want to remove any leading/trailing whitespace. As the transpose/crosstab tools look for unique values, any additional whitespace can cause issues later on.
- SAMPLE: create two samples, one of the first three rows (containing our headers) and another of everything else (i.e. skipping the first three rows)
Stream 1 (Formatting the Headers)
- At this point, your headers should look something like this:
- TRANSPOSE: pivot all headers i.e. PRODUCT, MARKET and TYPE into one field (name field), and all values i.e. Product A, Product B into another (value field). Our key column is F1 as this contains what we want our fields to be named in our final output. The reason why we keep this un-pivoted is so that we can rotate it in the next step.
- As you can see, the types of market, product and type are now in a single vertical orientation. Now, we need to split these up into three columns, using the cross tab.
- CROSS TAB: the F1 field we retained earlier will now be pivoted horizontally, shifting our correct field names into place. Our values i.e. Product A, Market A etc. will form the values under each newly-pivoted header. The reason why we Group By Name i.e. the F-numbers is that this provides a ‘handle’ to join on when we merge the headers back with the original data.
Stream 2: the rest of the data
- This is more straightforward as we only need to pivot the data once
- TRANSPOSE: we still need to create a ‘handle’ for the headers to join onto. We retain F1 as a key this time because it contains our dates, which are already in the correct position and needn't be rotated.
The Join: adding the headers back to the rest of the data
- Now we have a ‘Name’ column in each table, we can finally join them together on Name = Name. What we’ve essentially created in the first stream is a lookup table where everything at the original ‘F2’ position was linked to Market A, Product A and Type A.
For example, Value 933 is assigned the F2, meaning that it is the figure for Market, Product and Type A, for 01/01/2005.
- JOIN: join on Name = Name, there should be no rows in your L or R output. The J output should look like this:
- Great! So now we have one field name for each metric. You can also use the embedded select in the Join tool to remove any unwanted fields.
Multiple Headers AND Merged Cells: Challenge #63
Fortunately, our headers in the first example were pretty simple, and nicely formatted. However, often when working with excel data, some people may use merged cells to reference several fields at once. When loaded into Alteryx, such data looks like this:
- Here we have two types of column header, dates, and types of Point of Sale. Ideally the data should look like this:
- The flow:
- SELECT RECORDS: the initial data set had several blank rows at the beginning containing information about the data itself, this wasn’t relevant to the analysis so I filtered them out.
- SAMPLE: create three samples (the select records tool can also be used interchangeably here) one of the dates i.e. the first type of header, one of the types of point of sale i.e. the second type of header, and one of everything else.
Stream 1: Date Headers
- TRANSPOSE: before we fill in the Null values, we need to pivot the data so that we can apply a multi-row formula. You could fill in the nulls without pivoting but it would require something like a long IF statement (IF ISNULL(F1) THEN F2 etc.)
- Key columns: F1 (this acts as a placeholder for our desired field name later on)
- Data columns: everything else i.e. F2-8
- Missing Columns: I’ve set this to Warning (which is the default)
- Now we have our ‘handle’ to join on later.
- MULTI-ROW FORMULA: set this to ‘Update Existing Field’ and select ‘Value’ from the drop-down. We don’t need to group by anything, and we can leave everything else at the default setting.
- Expression: IF Isnull([Value]) THEN [Row-1:Value] ELSE [Value]ENDIF
- This checks whether the value is Null, if it is, it fills it with the value from the row above, if the value is already populated with a date, it remains the same.
- FORMULA TOOL: earlier we left a placeholder for our desired column header, I updated this field with a simple string i.e. F1 = ‘DATE’
- CROSS TAB: as before, we need to shift the header into the correct place.
- Group by: Name (we want our F numbers to stay in the same position)
- Column Headers: F1 (as this is all the same string i.e. one unique value, it will only create one new column)
- Values for Columns: Value (this shifts the dates under our newly created ‘DATE’ header.
Stream 2: Types of Point of Sale
- SELECT RECORDS: I used this to isolate the second row only, this is a more efficient way than using sample. In Ranges, simply put the number 2.
- TRANSPOSE: again, pivot all the values into one field. The configuration remains the same as before with F1 as a key column, and everything else as a Data column.
- FORMULA: you needn’t use a multi-row formula here, as everything that is null represents the Total - Point of Sale. IIF(IsNull([Value]), 'Total-Point of Sale', [Value]) checks if the value is empty, and if it is, replaces it with our desired string, otherwise, it leaves the value unaltered.
- CROSS TAB - the configuration is the same as for the Date header.
Join the headers together first
- Join on Name = Name
- This is our Header lookup table. For example, a value at position F10, represents the total Point of Sale, for the week of April 11 2016.
All other rows need to be transposed, then joined to the header lookup table
- TRANSPOSE: F1 remains as a key column, everything else is a data column.
- Again, join these rows to the header lookup table on Name = Name
- At this point, use the embedded select to remove any fields (e.g. the F numbers we no longer need)
- Finally, I did some cleaning of the dates by parsing out the latter date using Regex and using formulae to trim my strings. F1 contains something called a DMA code, so I altered the field name using a select.
Pivot Point of Sale field to create 3 distinct fields
- TRANSPOSE: group by DMA and Week Of (we want these fields to stay in the same position), column headers should be POS/Total, Values should be Value and I set the method for aggregation to be SUM.