One of the most common tasks I’ve found so far (with my limited use of Alteryx) is cleaning up Date data. The majority of times Date data comes into Alteryx as a String and in many different formats, depending on the personal preference of the person who has created the original file.
If you need to use the Date data for any kind of analysis then as a minimum you will need to change the field type from string to date, so that the dates are properly recognised in analysis tools like Tableau. Then, depending on how you want your dates formatted you may want to change the format or ensure that all dates are formatted in the same way.
Cleaning up Date data in Alteryx is not difficult but it is a frequent issue, so there are some really useful skills to master. This is how I did it.
- Date Parse
“Transform Date / Time data to and from a variety of formats including both expression-friendly and human readable formats.”
Date parsing ensures that Date data will be in the correct format and type so that I can use it for analysis. In Alteryx there are a couple of different ways I could parse dates.
- Use the Date/Time Parse tool
- Use a Date Parse formula in the Formula tool
The Date/Time Parse tool is probably the easier way of transforming Dates, and within 3 clicks I will have a new Date field in the correct format. However, I can only use it if the format of my existing date is one of the pre-defined Alteryx formats in the list below.
If the format of my date is not within the list in the Date/Time Parse tool, then I can use a quick formula instead. When I drag a formula tool on the canvas and navigate to the configuration box the first thing I would do is create a new field to which the formatted date will be outputted. At this point I would also change the field type by selecting Date from the Type dropdown.
Next, I would insert a template formula that I can add my Date field and Date format into by clicking on Functions > DateTime and then double clicking on DateTimeParse(dt,f).
The template formula will appear in the Expression field below. The only parts to update are within the parenthesis. The ‘dt’ part is where I need to insert my Date field and the ‘f’ is where I need to define the existing format of the Date.
To make this easy, I can find the Date field by clicking on Variables > Existing Fields and a selecting my Date field. The field will be inserted within [ ] square brackets.
To insert the format of the Date field I need to know how Alteryx reads certain date parts; the Alteryx help page has a comprehensive list of different formats and different ‘Specifiers’, but I like to refer to the below for common date parts and how they are recognised by Alteryx:
Alteryx Friendly Format
|2-digit Day||20-01-01 (20th January 2001)|
20-01-01 (20th January 2001)
|2- digit Year||
20-01-01 (20th January 2001)
|4-digit Year||20-01-2001 (20th January 2001)|
The format of the example date below would be e.g. 01-Jan-2001
Using either the Date/Time Parse tool or the Formula tool is all well and good if all my dates are in the same original format. However, if they are not, then I need to do a little bit of work first to get them into the same format.
- Split out the Date parts
If there are multiple Date formats like the example below, the easiest way to convert them to the same is to break the Dates down in to each date part and tackle one thing at a time.
To this I can do a quick Text to Columns to split out the Date field into 3 new fields (1 for each date part) using the hyphen as a delimiter.
Now I have the days, months and years in separate columns. At this point I can see a number of issues to resolve:
A) Some of the Day dates are 2 digits, some are only
B) Some of the Months are capitalised, some are full names not shortened
C) Some of the Years are 4 digits, some are only 2
So…to rectify issue A) for any day that is only 1 digit I need to add a ‘0’ in front of it. To do this I can add a Multi-Field Formula tool to my workflow, and use a simple formula called ‘PadLeft’. This formula adds a / multiple character(s) to a string field, starting from the left. (To add characters to the right I can use the ‘PadRight’ formula.)
By double clicking on the PadLeft formula (in Functions > String) the template formula will appear.
PadLeft(String, len, char)
To use the formula, I need to add my Date field where ‘String’ is, add the length I want the String to be in ‘len’ and add the character I want to add to the String in ‘char’ e.g.
PadLeft([Date1], 2, “0”)
This formula is instructing Alteryx to look at my field called Date1, change any row that is not 2 characters in length to 2 by adding a 0 to the left.
4. Data Cleansing
To solve issue B) is very quick and simple. I used the Data Cleansing tool, and simply made all the characters in my Month field title case, so that just the first letter was capitalised.
Finally, to rectify issue C) there are two steps to complete. One to ensure all the months are the abbreviated names (i.e. 3 letters) and two to make sure they are all in the same sentence case. To do this, firstly I used a Formula tool to just select the first 3 characters (from the left) of each month, e.g. to take the J, A and N from January. There is another small issue here though, as some of the years in my data are in the 1900s and some in the 2000s, so I need to use a bit of common sense here to tell Alteryx whether to use ’19’ or ‘20’ as a prefix.
To do that I used a conditional formula that tells Alteryx to look at my Year field and if it is less than or equal to 16 (i.e. I’m assuming if the last two characters of the date are less than or equal to 16 then the date will be in the 2000s rather than the early 1900s) then to add ’20’ then the rest of the date. Or if the Year is equal to 2 characters in length then to add ‘19’ then the rest of the date. OR if there are any Years which don’t match those criteria (i.e. the ones which are already 4 characters) then to just return that date.
After running my workflow the Years go from this to this…
Now to put all the date parts back into 1 field I can use a Formula to add each field together. I still have the issue of some of the months being the first 3 characters and some (July) being the full name. I can rectify this in the same formula by just telling Alteryx to take the first 3 characters from the left for the Month field, using this formula:
I have also added hyphens in between each date part so the date is easier to understand. Finally, my date field (DateOut1) is ready to be used in any analysis, and the rest of the date fields can be deselected using the Select tool.
If you want to practise Date Parsing you can try out the Alteryx Date Parse Challenge here.