Alteryx Weekly Challenge 182 Ultimate Guide

by Ozlem Sigbeku

You’re probably asking yourself, “Where is Ozlem’s latest blog?” And you would be right to ask. In these two days, I have set an expectation of all my readers that I would be writing every day. However, the reason I missed yesterday is the topic of this very blog.

This story starts with a suggestion Robbin, our coach and Hardstyle enthusiast, made as our Alteryx fundamental training drew to a close: “Alteryx weekly challenge 182 is a good challenge to start with”.

On the train ride home, I downloaded 182 and got to it. Here is the approach I took and my thinking during the task. If you have not done the challenge yet, please hold off on reading any further, until you have solved the challenge yourself. Or skip to the spoiler-free TLDR at the bottom.

<Spoilers>

THE FIRST CHALLENGE

Challenge 182 is a word sleuthing challenge on the Adventures of Sherlock Holmes. Essentially, taking a book or a text apart word by word. Firstly, I want to point out, I hate to do this to a great literary piece but maybe Holmes would be proud of this pursuit of knowledge.

When running the input, the message board in Alteryx will complain that there are truncated fields. I googled the error messages to find out what truncated means and how to get rid of them. That proved rather easy. I put a data cleanse to get rid of all the null rows, and the leading and trailing whitespace. I also thought it would also be wise to remove all punctuation, as we were interested in words only and I titled-cased all words to ensure that when I counted words later, there wouldn’t be a repeat of words due to the case (e.g. ‘the’ and ‘The’).

I then googled how to turn a long string into rows in Alteryx but I wasn’t too sure what I was looking for. I realized I was already using the tool but had split to columns instead of split to rows! Very nooby slip up. The delimiter was ' ' i.e. the space left between words.

Now, for the more complicated part – I knew I needed a join at some point to add a sum word count back into the table and a count of some sort and another sum of count for each word too. But I wasn’t entirely sure of the order. I did a sum of the total word count while actually trying to do a count of the occurrence of each word. I finally got there by googling how to count per variable in fields.

After this step, I noticed I had some concatenated numbers and words included in fields. I decided, since words were the focus here, to go back to the first data cleanse and remove all numbers. Even after I had done that, I was still left with null values.

I tried to correct it with the first data cleanse but was still left with null values. So I put another data cleanse after text to rows tool to filter out those final stubborn nulls.

Now that I had a cleaned-up list of words, a total word count and a count of occurrences of each word, I sorted the list from descending order of occurrence. In the descriptions of join tools, the most appropriate tool turned out to be an append field. At first, I mixed up the target and the source and Alteryx complained once more: ‘Append Fields (47)   There were more than 16 records in the source.’ While reordering the fields, I finally understood why it was complaining. When I switched the inputs, the Alteryx was finally at peace. I had succeeded.

The final step was a formula to calculate a percentage. While I was writing that, I just checked the data types to make sure they were numerical. I capped it off with a nice output that had been prepared for challenge 182.

I learnt, the hard way, that I should have heeded Robbin’s advice to annotate as I went along. I will no longer be making the mistake of annotating after I have finished a workflow. It was as annoying as adding references after writing a university essay; lesson sorely learnt.

It felt good to finish my first Alteryx challenge. It took me almost my whole evening, but I got a workflow and result I am happy with. I have already posted to the Alteryx community challenge page. My answers don’t match the output, but this is because my results do not have duplicated words due to different case. That’s just me being overly concerned with accuracy.

You can view and download my workflow here.

I will spare you the description of my toiling and stumbling through countless Alteryx community posts to find out how to download data from a URL and read it into a field, but I finally got there too. I am planning my next blog on how to use the download tool. I personally didn't find the community posts on the tool very easy to understand. I want to download (get it?) what I learnt about the tool in my almost hour of struggling with it.

TLDR: I created a working solution for Alteryx Weekly Challenge 182: Word Sleuthing. I learned a few things about the Download tool which I will be sharing in my next blog.

Update: I asked for feedback on Convo and Liu helpfully suggested I had one two many cleanse, and to shuffle things around to prevent any duplication of work. I have since updated my reply on the challenge page online.

1 mins read

Wed 31 Aug 2022

2 mins read

Tue 03 Jan 2023

Mon 06 Jun 2022