Day 3 of Alteryx challenges and practice week (somewhat ironically named “Dashboard Week”) continues with downloading 800 .csv files from 800 different (but similar) URLs on the NCD Risk Factor Collaboration website. As a recap, previous challenges this week have included working with an API and building apps and macros on day 1 and prepping an Excel file on day 2.
The star tool of today was therefore Alteryx’s Download tool, and the main learning point was configuring its inputs to account for the changes in URL from file to file. Another key point was figuring out how best to handle the output, both in terms of configuring the Download tool and in terms of choosing the best format in which to make such a large and multi-faceted data set Tableau-friendly. Questions that crossed my mind included: one file per risk factor, or one file for all the risk factors? Long and tall, or wide? .csv or .tde?
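To make the "changing URL" idea concrete, here is a minimal Python sketch of building one URL per combination of risk factor and country. The base URL, the path layout and the example names are all placeholders, since the real links are not reproduced in this post; in Alteryx the equivalent would be a URL field prepared upstream of the Download tool.

```python
from itertools import product

# Hypothetical base URL and path layout; the real download links are not
# given in this post, so treat this pattern as an assumption.
BASE_URL = "https://example.org/downloads"


def build_urls(risk_factors, countries):
    """Return one URL per (risk factor, country) pair."""
    return [
        f"{BASE_URL}/{factor}/{country}.csv"
        for factor, country in product(risk_factors, countries)
    ]


# Placeholder names, chosen only to show the shape of the result.
urls = build_urls(["bmi", "diabetes"], ["Albania", "Brazil"])
```

Two risk factors and two countries give four URLs; the same idea scales to the 800 or so files involved here.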
I ended up with one large, long and tall file (.tde) before deciding to go back to Alteryx and generate a new output of 3 .tde files and 1 .csv file (one file would not show anything in Tableau as a .tde, presumably due to special characters, but outputting it as a .csv seemed to solve the problem). My new output was wide, as that seemed easier to work with in Tableau for this data set, with one file for each of the risk factors.
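For anyone unsure what "long and tall" versus "wide" means in practice, the reshaping can be sketched in a few lines of Python. The column names and values below are invented for illustration and are not taken from the actual files.

```python
from collections import defaultdict

# Made-up rows in long format: one row per (country, year, measure).
long_rows = [
    {"country": "A", "year": 2000, "measure": "prevalence_men", "value": 10.0},
    {"country": "A", "year": 2000, "measure": "prevalence_women", "value": 12.0},
    {"country": "B", "year": 2000, "measure": "prevalence_men", "value": 8.0},
]


def to_wide(rows, keys=("country", "year")):
    """Pivot long rows so that each measure becomes its own column."""
    wide = defaultdict(dict)
    for row in rows:
        key = tuple(row[k] for k in keys)
        wide[key][row["measure"]] = row["value"]
    # Rebuild one dict per key, merging the key fields with the measures.
    return [dict(zip(keys, key), **measures) for key, measures in wide.items()]


wide_rows = to_wide(long_rows)
```

Each wide row then holds one country and year with a separate column per measure, which is roughly the shape I found easier to work with in Tableau.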
With my workflow and the input into the Download tool as above, configuring the Download tool ended up being straightforward: there was no need for the Headers or Payload functionality, and the output went to .csv files in a specified directory.
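Outside Alteryx, the same "plain request, save to a folder" behaviour could be sketched in Python roughly as follows. This is not how the Download tool works internally; the function names are my own, and the `fetch` parameter is only there so the saving logic can be tried without touching the network.

```python
import os
import urllib.request
from urllib.parse import urlparse


def fetch_bytes(url):
    """Default fetcher: a plain GET with no custom headers or payload."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_all(urls, out_dir, fetch=fetch_bytes):
    """Save each URL's body into out_dir, named after the last path segment."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        filename = os.path.basename(urlparse(url).path)
        path = os.path.join(out_dir, filename)
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        saved.append(path)
    return saved
```

One caveat with this sketch: two URLs ending in the same file name would overwrite each other, so a real run would need a naming scheme that keeps them apart.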
Once again, this data set was rich in percentages, largely looking at the prevalence of various conditions in populations. This again raised the limitation (challenge!) of being unable to aggregate the data, whether across populations or across time, thereby testing my creativity in terms of visualization potential.
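A quick Python illustration of why percentages resist aggregation: a simple average of two prevalence figures ignores how many people each figure covers. The numbers below are made up, and the population counts are an assumption, since this sketch only works if such counts are available alongside the percentages.

```python
# Made-up prevalence figures and population sizes, for illustration only.
groups = [
    {"name": "A", "prevalence_pct": 10.0, "population": 1_000_000},
    {"name": "B", "prevalence_pct": 30.0, "population": 9_000_000},
]


def simple_mean(groups):
    """Average the percentages directly, ignoring group size."""
    return sum(g["prevalence_pct"] for g in groups) / len(groups)


def weighted_mean(groups):
    """Average the percentages weighted by each group's population."""
    total = sum(g["population"] for g in groups)
    return sum(g["prevalence_pct"] * g["population"] for g in groups) / total


naive = simple_mean(groups)       # 20.0
combined = weighted_mean(groups)  # 28.0
```

The naive average says 20% while the population-weighted figure is 28%, which is why summing or averaging the raw percentages in Tableau would be misleading.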
Another sleepless night ahead and the hope that tomorrow’s (final) data set does not require the use of percentages.