Dashboard Week Day 1: The Fun Begins

by Hanna Nykowska

If you don’t know what a dashboard week is, check out Andy’s post. Basically, we get a data set and we’re supposed to prepare the data, make a viz and write a blog before 5pm.

“Get a data set” is a bit of an overstatement. Today we were given a link and the task was to get all the data sets from all 391 data sources.

This task was not easy. I decided to look at the data first just by downloading random files manually.

First Red Flag: the files came in different file types.

Second Red Flag: if you open them, you’ll see that the schema differs even between files of the same type.

This is quite a big one since we have to put the files together. Somehow. Some of the files have extra headings above the data, and those headings may span a different number of lines. This means that the data starts at the first, third, fourth or maybe even some other row, depending on the file. Didn’t see any consistency there…

The same fields have different names from file to file, and not all fields are present in all of the files.
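To make the problem concrete, here is a minimal Python sketch of one way to cope with it (not what I built in Alteryx). It skips the extra heading lines by looking for the first row that contains known field names, then maps the name variants onto one canonical name. The alias table and the sample text are made up for illustration; only Invoice Amount, Invoice Number and Expense Area come from the actual data.

```python
import csv
import io

# Hypothetical alias table: header variants mapped to one canonical field name.
FIELD_ALIASES = {
    "invoice amount": "Invoice Amount",
    "amount": "Invoice Amount",
    "invoice number": "Invoice Number",
    "invoice no": "Invoice Number",
    "expense area": "Expense Area",
}


def find_header_row(rows, min_matches=2):
    """Return the index of the first row with at least min_matches known field names."""
    for i, row in enumerate(rows):
        matches = sum(1 for cell in row if cell.strip().lower() in FIELD_ALIASES)
        if matches >= min_matches:
            return i
    raise ValueError("no header row found")


def read_records(text):
    """Parse CSV text, skipping title lines and renaming fields to canonical names."""
    rows = list(csv.reader(io.StringIO(text)))
    start = find_header_row(rows)
    header = [FIELD_ALIASES.get(c.strip().lower(), c.strip()) for c in rows[start]]
    records = []
    for row in rows[start + 1:]:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines between data rows
        records.append(dict(zip(header, row)))
    return records


# Made-up sample: two extra lines before the real header.
sample = "Some Trust spend report\n\nInvoice No,Expense Area,Amount\nA1,Estates,30000\n"
print(read_records(sample))
```

The threshold of two matching names is arbitrary. With 391 sources the alias table would need to grow as new variants turn up.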

Third Red Flag: no description for the fields available.

The data is supposed to be for expenses over £25K, but some of the entries were significantly lower than that. You could think that the Invoice Amount would sum up to over £25K for the same invoice, but in many files the Invoice Number field has plenty of Null values, meaning there is no value at all. So I couldn’t sum up the values for the same invoice… I tried to find some description of the data: what the fields mean and what the requirements are for an entry to be included in this data set. Sadly, I didn’t succeed.
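The check I had in mind looks roughly like this in Python. It is only a sketch with made-up rows: it sums the Invoice Amount per Invoice Number and counts the rows that cannot be assigned to any invoice because the number is missing.

```python
from collections import defaultdict

THRESHOLD = 25_000  # the £25K limit the data is supposed to respect


def invoice_totals(records):
    """Sum amounts per invoice number and count rows without an invoice number."""
    totals = defaultdict(float)
    missing = 0
    for rec in records:
        number = (rec.get("Invoice Number") or "").strip()
        if not number:
            missing += 1  # Null or empty: cannot be grouped with anything
            continue
        totals[number] += float(rec["Invoice Amount"])
    return dict(totals), missing


# Made-up rows for illustration only.
records = [
    {"Invoice Number": "A1", "Invoice Amount": "20000"},
    {"Invoice Number": "A1", "Invoice Amount": "9000"},
    {"Invoice Number": "", "Invoice Amount": "1200"},
    {"Invoice Number": None, "Invoice Amount": "300"},
]

totals, missing = invoice_totals(records)
below = {number: total for number, total in totals.items() if total < THRESHOLD}
print(totals, missing, below)
```

The more rows end up in the missing count, the less the per-invoice totals can tell you, which is exactly where I got stuck.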

Let’s do some work anyway

Aware of the many things going wrong, I decided to plan out what I wanted to create. So I quickly had a look at what data was available and thought I might look at the top spenders across the Trusts and the top Expense Areas by Spend Amount.

This needed a reality check: after a good few hours of work in Alteryx, Andy took pity on us and allowed us to build the dashboard for a single trust only. My plan changed to top Expense Areas.

Thankfully, the file schema was pretty much the same within a single trust (at least for some of them), so it went more smoothly from here.

You can find my dashboard below:

interactive viz

It’s rather simple, although I did squeeze in some set actions and a sheet swap with parameters. There are no extra insights though.

Summary

It sure wasn’t easy, mostly because the data is not nicely structured. As I mentioned above, the file types and fields were inconsistent. We also found out that if you use ‘Encode URL Text’ in the Alteryx Download Tool, it may well mess up the links you’re getting, so I’ll definitely pay more attention to this from now on.
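I can’t say exactly what Alteryx does internally, but one common way a link gets mangled is when a URL that is already percent-encoded gets encoded a second time. The Python sketch below illustrates that general effect with a hypothetical URL; it is an assumption that this is the same thing that hit us.

```python
from urllib.parse import quote, unquote

# Hypothetical link that is already percent-encoded (%20 stands for a space).
url = "https://example.org/files/spend%20over%2025k.csv"

# Encoding it again turns every "%" into "%25", so "%20" becomes "%2520".
encoded_again = quote(url, safe=":/")
print(encoded_again)

# Decoding once gets the original link back.
print(unquote(encoded_again) == url)
```

A server asked for the double-encoded address will usually look for a file with a literal “%20” in its name and fail to find it.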

Having said that, I still quite enjoyed today despite the many frustrations.

Let’s see what tomorrow brings.

Fri 24 May 2019

Thu 23 May 2019