Our first challenge for our (shorter than normal) dashboard week is the (harder than normal) challenge of obtaining NHS Trust data for expenditures over £25,000 and visualising that data in Tableau. The full details can be found in Andy’s post.
The main issue we all faced with today’s challenge is that there is pretty much no consistency in data format. This is in terms of file format, field names, junk headers, number of fields, date formats, and more! It was a huge data preparation challenge. Furthermore file types were labelled incorrectly and even file extensions were incorrect! This resulted in a low success rate in obtaining and parsing data.
Whilst it was a frustrating challenge, I definitely have learnt a lot. In particularly, I was able to achieve a dynamic process in Alteryx that could read and standardise this garbled incoming data. Through the use of batch Macros, dynamic inputs, dynamic selects, and filters to send data down different paths I was able to parse both Excel and CSVs, standardize the fields down to a common five, and output this to a .hyper file that was actually usable! In the end I was able to pull down nearly 1.3 records from 36 Trusts.
Whilst I’m not necessarily pleased with how much there is left to fix-up on my final viz for today, considering that the data preparation took a vast majority of the day I am pleased with what I was able to produce. Using some of the set-actions learnt over the last two weeks I’ve made a dashboard that lets you explore the expenditures over time and top expenditures for each Trust. While I was able to pull down data from 36 trusts, some of the dates seemed to be before 2017. As I was only aiming to download data in the 2017-2018 range and didn’t have the time to investigate why these dates were faulty I decided to filter them out leaving me with the 22 trusts in the final viz.