The first week at the DS has been brilliant! We've been welcomed warmly and encouraged to get stuck into Data Preparation tools. This Blog discusses my first DS project - Alteryx Weekly Challenge #87: Losing Daylight, which focuses on Daylight hours throughout September.
Input to prep (up to 30th Sept):
Output Desired (including a total row at the bottom for Day Length Diff):
The first thing that immediately jumps out at me is the row of nulls. Using the FILTER tool, set a custom filter to keep all rows where the columns “Sep”, “Sunrise” and “Sunset” were NOT null.
In this case I could have selected a basic filter and just filtered one of my columns to keep all values that are not null. However, I deemed it better practise going forwards to ensure I filter out rows in which ALL columns have null values.
To split out my sunrise and sunset columns to just show time, I used two TEXT TO COLUMNS tools, one for each column. Split into 2 columns using “/ “ as the delimiter. Note the space after the /. This delimiter splits a column on spaces and splitting to only two columns means that the sunrise and sunset column will be split on the first space it finds, leaving me two columns with my desired times. Use a SELECT tool to remove extra columns and rename the split columns so I’m left with something looking like this:
Time in the format above doesn’t exist in Alteryx. I used two DATETIME tools to convert both my Sunrise and Sunset strings into times using a custom incoming format “hh:mm”, naming my new columns simply “Sunrise2” and “Sunset2”. I rename them like this since I want to keep my original Sunrise and Sunset columns for the final output.
Sunset should be an afternoon/evening time of day. However, I currently have sunset at 7am so I need to find a way of adding 12 hours to this time.
Using the FORMULA tool, I added 12 hours to the time in the Sunset2 column using DateTimeAdd.
The DateTimeAdd function allows you to add a specific interval to a Date-Time value.
The layout of the function is DateTimeAdd(dt, i, u).
- Dt refers to the column we are applying the function to. In this case it’s the Sunset2 column. However, the data type of Sunset2 is Time, not a DateTime, so within the formula we require a dummy date (2000-01-01 - this can be any date) so that the formula recognises the format and knows where to apply the interval.
- i refers to the interval that we want to add, in this case, 12 for 12 hours.
- u represents the unit of DateTime we want to apply this interval to, in this case it is the hour. I found help with DateTimeAdd using this tool!
Using another FORMULA tool, we can easily calculate the difference between our Sunrise2 and Sunset2 columns to find out the number of minutes of daylight we have for each day. This time we use the DateTimeDiff function which is in the format DateTime(dt1, dt2, u).
This tool helped me out with the DateTime function.
This is saying we will calculate the difference in a specific unit (e.g. minutes) between the start time and the end time. Our formula is DateTimeDiff([Sunset2], [Sunrise2], “minutes”).
The last column we require is the Day Length Difference. For this, I used a MULTI-ROW FORMULA tool. Creating a new field called “Date Length Diff” and ensuring that values for Rows that don’t exist are set to NULL, in the expression field I input:
[Day Length Minutes] - [Row-1:Day Length Minutes]
This simply takes the value of the row above away from the current row. Since the top row doesn’t have a row above it, this value will be left as null.
Using the SELECT tool, we can remove columns Sunrise2 and Sunset2.
The output required has a Total Row at the bottom of the Output that sums up the Day Length Diff values. What I’m going to do is branch off from my select tool and summarise these values using the SUMMARIZE tool. I will then UNION this branch back to the previous select tool to obtain my output.
Using the SUMMARIZE tool, Sum the Day Length Diff and ensure it remains named "Day Length Diff". The Total row in the desired output includes the text “September Total” in the Sep Column. To include this text, I add a formula using the FORMULA tool creating a new column called “Sep” and in the text pane, write “September Total”. My branched table now looks like this:
Use the UNION tool to bring the tables together. Since the column names are the same, the values will fit into the correct columns. Use the SELECT tool just to tidy up your order of columns and rename "Sep" to "Date".
Use your OUTPUT tool to save the result in your required format and voila.
Something I have found with Alteryx is that there is no one way to specifically solve a problem but thinking logically will certainly help you on track. You can find Alteryx Weekly Challenge #87 here.