Scaffolding Data in Alteryx

by Molly Hatch

For the purpose of a visualization, you may need data that ‘is missing’ from your dataset. Or if you were looking into the activity of a support desk, it may be useful to understand how many tickets are open on certain days. However your data may only specify the date a ticket was opened and the date it was closed/ resolved. We can see an example of this data structure below:

To analyse this data in more detail, it would be useful to change the data structure so that each row represents a day that each ticket ID was open:

Notice that we now have 1 row for each day the ticket was open rather than 1 row for each ticket ID. From here we can do further analysis such as determine how many tickets were open on each day. We can use scaffolding to fill in these missing dates from our data.

Example

Input Data:

Our aim is to see how many open tickets there are on each day and determine which day has the largest number of open tickets. 

Output Data (our goal):

To achieve this we will need to scaffold our data to ensure that we have a row for each day the tickets are open. I.e if we take the first row as an example. We can see that the date the ticket was opened was 2019-02-22 and the date it was closed was 2019-02-25. It took 4 days for the ticket to be closed. However our data shows these 2 dates in 1 row as our data is structured so that each row represents 1 ticket ID. In order to answer our question we need 1 row to represent each day that a ticket is ‘live’. 

Step 1: Determine the number of days that each ticket is ‘live’

We can calculate this using the datediff function in the formula tool.

This function will give you the number of ‘days’ between the date the ticket was opened to the date the ticket was closed. This value will help us understand how many rows we need for each ticket. As the number of rows should = the number of days a ticket was live.

Step 2: Sort Data

To investigate our output from this tool, let’s sort the data to see the range of values. A quick way to do this is to click on the column header of the field you want to sort and click on the 3 dots that will appear.

After clicking on the dots, a window will appear where you can sort in ascending order.

We can see that some of our values within our DateDiff field are negative. This means that the ticket was closed before it was opened…. Which doesn’t make sense. Therefore filter out this incorrect data.

Step 3: Filter out wrong data

Configure the filter tool to filter out any values that are smaller than 0.

Now our results look like this:

We now have a column which shows the number of days between each ticket being opened and closed.

Step 4: Create 1 row for each day the tickets were open using Generate Rows Tool

We now need to scaffold our data to ensure that each row represents 1 day that each ticket was open. To do this, use the generate rows tool.

There are 4 parts to this tools configuration.

The first aspect of the configuration specifies whether you would like to update an existing field or create a new one. For this example we need a new field which indicates all the dates that each ticket is ‘live’.

Secondly, the initialisation expression. This is the expression that is applied to row 1 of our new column ‘Date’ which would be [Opened At] as this is the date that each ticket is opened.

I then think about the loop expression (at the bottom of this window) which is the expression that is applied to each row in this field. For this example, our aim is to add 1 day to the Opened At Date until we reach the Closed At Date. This would generate a row for each day the ticket was live. The expression would therefore be:

DateTimeAdd([Date],1,’days’)

Finally we want to write the Condition Expression which specifies when the loop expression will stop building rows. It’s a true or false statement that will stop building rows when the condition is false. For this example we want the loop expression to stop building rows when the Date (the new field) is less than or equal to the Closed at date. Therefore our expression would look like: 

[Date]<=[Closed_At]

We have now generated a new column called ‘Date’ which includes all the days that each ticket was open.

Looking at Ticket 1008 for example, it was created on the 2019-05-10 and closed on 2019-05-11 and therefore was open for 2 days. This is now demonstrated in our ‘Date’ column where we can see the 2 dates on individual rows. 

Our full workflow up to this point is 4 tools:

Step 5: Answer our initial question

We can now use our results to determine which day had the largest number of open tickets using the summarise tool. If we group by each date and count the number of ticket IDs we would get the following output:

Here we can clearly see that 2019-08-04 and 2019-08-05 were the dates with the largest number of open tickets: 3.

Our final workflow looks like this:

Photo by Josh Sorenson, StockSnap

Avatar

Molly Hatch

Fri 06 Mar 2020

Tue 03 Mar 2020