Dashboard Week - Day 2

by Lemis Tufail

Our task for today was to answer the following question using this data source (surveys from 1992-2019):

What Does the Tobacco Use Supplement Data Say About Tobacco Use?

The file was in DAT format and contained around 1.7 million rows. I struggled to get it into Alteryx because I'd never worked with a file in that format before. After a while, I realised I needed to import it into Alteryx as a fixed-width file. The file looked like this at this point:

I limited the number of rows to 100,000 because it was taking too long to run the tools even with the workflow cached.
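For anyone trying this outside Alteryx, here is a minimal Python sketch of the same two ideas: read each line as one fixed-width record, and cap the number of rows while developing. The function name and the `max_rows` parameter are my own illustration, not part of the workflow described here.

```python
from itertools import islice
from typing import Iterator, Optional


def read_fixed_width(path: str, max_rows: Optional[int] = None) -> Iterator[str]:
    """Yield each record as one raw string, keeping the spaces inside it.

    max_rows caps how many records are read (None reads the whole file),
    which mirrors limiting the input to 100,000 rows during development.
    """
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        for line in islice(handle, max_rows):
            # Strip only the line ending: in a fixed-width file,
            # spaces are data and must stay where they are.
            yield line.rstrip("\r\n")
```

Calling `read_fixed_width("survey.dat", max_rows=100_000)` (with a hypothetical file name) would give a small sample to develop against, and dropping `max_rows` would read every record for the final run.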

I started parsing the data with a Text to Columns tool, using a space (\s) as the delimiter, but the resulting data structure didn't look right.

It took me some time to understand, but the numbers in the file corresponded to answers to the questions in this document, and the spaces in the file were simply null values. In other words, every character in the file represented an answer or part of an answer.
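To illustrate why splitting on spaces breaks a file like this, here is a small Python sketch. The record and its column width are made up for the example and are not taken from the real survey layout.

```python
# Hypothetical 6-character record: three answers, each 2 characters wide.
# The middle answer is blank, meaning "no answer" (null).
record = "01  03"

# Splitting on whitespace drops the blank answer,
# so every later column shifts to the left.
by_whitespace = record.split()

# Slicing by position keeps every answer in its own column.
width = 2
by_position = [record[i:i + width] for i in range(0, len(record), width)]

# Turn blank fields into None so nulls are explicit.
answers = [field.strip() or None for field in by_position]

print(by_whitespace)  # ['01', '03']
print(answers)        # ['01', None, '03']
```

The whitespace split returns two values for a three-question record, which is the kind of misalignment that makes the parsed structure look wrong.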

At this point I knew I had to use RegEx, so I began by parsing the data through the tool as shown below:

However, I soon realised that this was going to take too long, so after a lot of trial and error I tokenized every character of the row like this:

This left me with 700 fields. Some of them held the complete answer to a question from the document on their own, while others had to be joined with neighbouring fields to form a complete answer. An example is shown below:

I then used a Formula tool to combine the fields that made up the answer to each question. I excluded the fields for questions that didn't contribute to my analysis. At this point the workflow looked like this:
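The tokenize-then-combine approach can be sketched in Python as follows. The question names and character positions in `LAYOUT` are hypothetical; the real positions come from the survey's documentation.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: question name -> (start, end) character positions,
# zero-based with an exclusive end. Questions not needed are simply left out.
LAYOUT: Dict[str, Tuple[int, int]] = {
    "age": (0, 2),
    "smoker": (2, 3),
    "state": (3, 5),
}


def tokenize(record: str) -> List[str]:
    """Split a record into one field per character, like a regex tokenize on '.'."""
    return re.findall(r".", record)


def combine(record: str, layout: Dict[str, Tuple[int, int]]) -> Dict[str, Optional[str]]:
    """Join the single-character fields that belong to each question."""
    chars = tokenize(record)
    answers: Dict[str, Optional[str]] = {}
    for name, (start, end) in layout.items():
        value = "".join(chars[start:end]).strip()
        # A blank value means the question was not answered.
        answers[name] = value or None
    return answers


print(combine("34 06", LAYOUT))
# {'age': '34', 'smoker': None, 'state': '06'}
```

In plain Python the tokenize step is not strictly needed, since slicing the string directly gives the same result. It is kept here only to mirror the one-field-per-character structure described above.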

Finally, I had to run all 1.7 million rows through the workflow, since I had limited it to 100,000 earlier on. This took over 35 minutes to run, and I started writing this blog while I waited.

When I tried to start up Tableau, I was prompted with this error:

This meant I had to uninstall and reinstall Tableau. However, my laptop froze at this point, so I had to restart it and begin the reinstall again. I then reopened the Alteryx workflow and made the mistake of running it, which started overwriting the Tableau hyper file I had created on the first run.

So, now I had to wait until the Alteryx workflow finished running again, which took another 20 minutes.

By this time, I had 15 minutes left to build the dashboard, and here's what I came up with.

And that's how my second day of dashboard week went.


Wed 29 Dec 2021
