In our penultimate week of training at the Data School, DS18 and I had a great session with Dan Farmer on how to test Alteryx workflows, check their performance, and troubleshoot both our workflows and macros. I found the section on speeding up workflows and improving performance particularly useful, so I thought I'd write a blog on the main points. Most of these performance boosts come from only very minor tweaks to the workflow, yet they dramatically improve the run time. Knowing these tips earlier on would definitely have saved a few panic-filled hours waiting for client project workflows to run!
What can reduce the performance of a workflow?
- Using tools that have a slow run time because they have to examine all the data in the data set. Examples of these include:
- Join tools – an obvious culprit, these can be real time consumers. They compare every value in the fields being joined on to find matches. On top of this, if the data being joined has a one-to-many relationship, the join will duplicate rows, which can explode your data and drastically increase the run time of the workflow.
- Sort tools
- Unique tools
- Summarize tools (certain aggregations)
- Reading in data from inefficient file formats – e.g. .xlsx instead of .yxdb.
- The use of Browse tools. These allow us to view and profile all the data, but they have much the same effect on performance as an output tool (slow).
- Spatial tools – spatial fields hold a huge amount of information and take a long time to process.
- Holding data in the workflow that isn’t used – e.g. carrying fields and rows that never appear in the final output.
- Holding data as string values rather than numeric.
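Alteryx joins are configured visually rather than in code, but the one-to-many row explosion mentioned above is easy to see in a small pandas sketch (the DataFrames and column names here are made up for illustration):

```python
import pandas as pd

# One row per customer on the left...
customers = pd.DataFrame({"customer_id": [1, 2],
                          "region": ["North", "South"]})

# ...but many rows per customer on the right.
orders = pd.DataFrame({"customer_id": [1, 1, 1, 2],
                       "amount": [10, 20, 30, 40]})

# A one-to-many join duplicates each left-hand row once per match,
# so 2 left-hand rows become 4 output rows here.
joined = customers.merge(orders, on="customer_id")
print(len(joined))  # 4 rows out, not 2
```

The more matches per key, the bigger the multiplication – which is why joins on duplicated keys can quietly balloon a workflow's run time.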
Use Performance Profiling to Help Troubleshoot Your Workflow
A great place to start when you want to improve the performance of your workflow is to turn on Performance Profiling in the workflow configuration window. This lists the tools in the results window in descending order of run time (slowest first), which helps you pinpoint the troublesome tools. To turn it on, click on the canvas to bring up the configuration window, click the ‘Runtime’ tab, and at the bottom of the page select ‘Enable performance profiling’. See below:
With profiling enabled, the results window then shows output like the example below:
Techniques to Help Improve Performance
Now, for the bit you’ve all been waiting for – what can I actually do to improve my workflow performance? There are a number of things you can do to speed up run time, many of which won’t require much alteration of your workflow:
- Convert your files to yxdb if they are not already in this format. Yxdb files are the most efficient file type when reading and writing data in Alteryx and will dramatically improve your performance.
- Minimise the use of the previously mentioned problematic tools. If your workflow is reliant on these tools, try to reduce the data going into them. An example here would be to remove any Sort tools – the only time these are really necessary is when you are then using a Multi-Row Formula tool. In addition, remove any Browse tools – these are useful during the development stage to check what’s happening to your data, but remove them once they have fulfilled their purpose.
- Drop unnecessary columns at the start of the workflow by using a select tool. This is obviously easier to do with hindsight, but once you have worked out what fields you won’t need, remove these early on from the workflow. In addition, do the same when joining by using the inbuilt select function to drop unnecessary fields.
- Drop unnecessary rows early on by adding in filters.
- Summarize before a join if possible – reducing the data that goes into a join will increase performance.
- Determine optimal field types early on in the workflow. A quick way of doing this is with an Autofield tool; however, once you have ascertained the correct data types, drop the Autofield and implement the changes with a Select tool instead.
- Try to use integers over strings wherever possible. This is particularly important when comparing values in formulas and joins and helps to avoid errors.
- Drop spatial columns as soon as you are finished using them. Spatial fields store a huge amount of data and if they are held throughout the workflow after use, dramatically reduce performance. An example of this may be with the spatial match tool – after the match occurs there are often spatial fields produced in the result that may not be used and can be dropped.
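None of these ideas is Alteryx-specific – “trim the data before you join it” applies in any data tool. Here is a pandas sketch, with made-up data, that combines three of the tweaks above: dropping unused columns early, filtering rows before the join, and fixing the field type so the join compares integers rather than strings:

```python
import pandas as pd

sales = pd.DataFrame({
    "store_id": ["101", "102", "101", "103"],  # held as strings
    "year": [2023, 2024, 2024, 2024],
    "revenue": [500, 750, 620, 410],
    "notes": ["a", "b", "c", "d"],             # never used downstream
})
stores = pd.DataFrame({"store_id": [101, 102, 103],
                       "city": ["Leeds", "York", "Bath"]})

# Drop unused columns and rows first, and fix the field type,
# so the join only touches the data it actually needs.
slim = (sales
        .drop(columns=["notes"])          # drop unnecessary columns early
        .query("year == 2024")            # filter rows before the join
        .astype({"store_id": "int64"}))   # match the other table's key type

result = slim.merge(stores, on="store_id")
print(result[["store_id", "city", "revenue"]])
```

On four rows the saving is invisible, but on millions of rows every column and row you carry into a join is work the join has to do.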
And if You Don’t Believe me – Here’s a Great Example
During this performance session, Dan tasked us with improving a workflow by implementing these changes. I have included the workflow below, which at the time of writing this blog took 1.07 minutes to run:
The flow below is my improved version, which reduced the run time to just 0.7 seconds!
I made the following changes:
- Converted the .xlsx files to .yxdb.
- Removed the Autofield tool and used Select tools to drop unnecessary columns and change data types in both files.
- Moved the filter before the join.
- Dropped unnecessary fields within the join.
- Removed the first unnecessary Sort tool.
- Dropped the unnecessary group-by in the Summarize tool.
- Dropped the final unnecessary Select tool.
So as you can see from this example, changes that seem insignificant on their own can significantly boost the performance of your workflow. I hope this blog helped, and thank you for reading!