Data preparation in Alteryx: where to start?

by Daniel Watt

When you get in a car, you usually have one goal in mind: to get you, your passengers and your cargo to your destination. There are certain things you need to do before you can begin your journey (e.g. start the car, take the handbrake off), and there are certain places you may pass through en route. You wouldn’t just start driving without a clear idea of your destination or route, and the same rule applies when working in Alteryx: don’t start dragging tools onto the workflow until you have a clear idea of what you are trying to achieve and how you are going to get there.

Here are the steps I would use to break down a data preparation problem when we have the input data and the output data, or at least an example of the output. I prefer to start by browsing the output data, as much of the input data may be irrelevant to where we need to get to.

1. Browse the output data

  • What are the field names / column headers?
  • Is the data grouped? Look for strings that are not repeated, or field names with group in the name
  • What is the format of the data?  Are there any data types that require special tools (such as spatial) or any strings which have been formatted?
  • Is the data sorted? If so by which field and in which order?
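In Alteryx you would answer these questions by eye, for example in a Browse tool. Purely as an illustration of what each check looks for, here is a minimal Python sketch that profiles an example output file held as CSV text. The function name `profile_output` and the sample data are hypothetical, and the sort check compares values as strings, so numeric or date fields would need converting first.

```python
import csv
import io


def profile_output(csv_text):
    """Summarise an example output file: field names, repeated values, sort order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    report = {"fields": fields, "distinct": {}, "sorted": {}}
    for field in fields:
        values = [row[field] for row in rows]
        # A field whose values are never repeated may be a grouping key or identifier
        report["distinct"][field] = len(set(values)) == len(values)
        # String comparison only (hypothetical simplification): "95" sorts after "130"
        if values == sorted(values):
            report["sorted"][field] = "ascending"
        elif values == sorted(values, reverse=True):
            report["sorted"][field] = "descending"
        else:
            report["sorted"][field] = None
    return report


sample = "Region,Total Sales\nEast,120\nNorth,95\nWest,130\n"
print(profile_output(sample))
```

Here `Region` has no repeated values and is sorted ascending, which suggests the output was grouped and sorted by region, while `Total Sales` follows no sort order.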

2. Browse the input data

  • Where are the field names? Are they in the first row, or do they appear in a column or in several rows throughout the data?
  • Where does the data start? Does it begin in the second row and continue through every row below, or is the pattern more complex?
  • Does it look as if there are several tables within the data?
  • Which columns contain the data you need?
  • What format is the data in? You need to be aware of the data type (e.g. string) as well as any string formatting that may cause issues, such as unnecessary suffixes and trailing spaces.
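Again outside Alteryx and only as an illustration, the questions about where the field names sit, where the data starts and whether strings carry stray spaces can be sketched in Python. This assumes the raw input is CSV text and uses a simple, hypothetical heuristic: the header is the first row that fills the expected number of columns with non-numeric text. `inspect_input`, `is_number` and the sample report are all made up for the example; real files often need a more careful rule.

```python
import csv
import io


def is_number(cell):
    """Return True if the cell parses as a float (note: 'nan' and 'inf' also do)."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def inspect_input(csv_text, expected_columns):
    """Guess the header row and data start, and list cells with padded whitespace."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header_index = None
    for i, row in enumerate(rows):
        cells = [c for c in row if c.strip()]
        # Heuristic: first row that fills the expected width with text-only cells
        if len(cells) >= expected_columns and not any(is_number(c) for c in cells):
            header_index = i
            break
    # Record (row, column) positions of cells with leading or trailing spaces
    padded = [
        (i, j)
        for i, row in enumerate(rows)
        for j, cell in enumerate(row)
        if cell != cell.strip()
    ]
    return {
        "header_row": header_index,
        "data_starts": None if header_index is None else header_index + 1,
        "padded_cells": padded,
    }


raw = "Monthly sales report,,\n,,\nRegion,Month,Sales\nEast ,Jan,120\nWest,Jan,130\n"
print(inspect_input(raw, 3))
```

For this sample the title and blank rows are skipped, the field names are found in the third row (index 2), the data starts on the row after it, and the trailing space in "East " is flagged as something to clean up.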

Once you have completed these two steps, you may also need to think about any intermediate points you need to reach on the way to your destination: the output data.





Fri 06 Jan 2017
