Today is Thursday, which means we have two more days to go. The challenge for today was to visualize drought conditions in America. The data are available from the US Drought Monitor and go back to 2000.
The first task was to retrieve the data from the REST API. The requests were actually not that clear, and there were several difficulties. First, I tried to extract XML files, but struggled with the XML parsing tool in Alteryx. Ravi helped me with the dynamic download, and helped me get all the API URLs I needed.
But then I had difficulties with the reformatting: the file turned out to be recognized as JSON rather than XML by an online tool, while Alteryx saw it as a sort of hybrid, which made the parsing very complicated.
Brian then gave me another idea: directly request the data as JSON and create a macro to save some time. Which, in theory, is an excellent idea. But by then I was already quite frustrated, and the day felt like a tough one.
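Since the trickiest part for me was getting all the API URLs in the first place, here is roughly how that list could be generated by script instead of by hand. This is only a sketch: the base URL and the `week` query parameter are hypothetical placeholders, not the actual US Drought Monitor endpoint.

```python
from datetime import date, timedelta

def weekly_urls(start, end, base="https://example.com/api/drought"):
    """Build one request URL per week between start and end (inclusive).

    The base URL and the 'week' parameter are made-up placeholders;
    substitute the real endpoint and its query string.
    """
    urls = []
    current = start
    while current <= end:
        urls.append(f"{base}?week={current.isoformat()}")
        current += timedelta(weeks=1)
    return urls

# One URL per weekly drought report, e.g. for January 2000:
urls = weekly_urls(date(2000, 1, 4), date(2000, 2, 1))
print(len(urls))  # 5
```

Generating the URLs this way also makes it trivial to restrict the date range while testing, instead of hammering the API for the full 2000-to-today history on every run.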
I guess this is why I spent so much time waiting for Alteryx to process the workflow I built. The amount of data we had to manipulate was HUGE. My final file was more than 3 million rows, which made every modification of the workflow long and painful. I also messed up with the Appending tool, which multiplied the data even more. The consequence: a 25-minute wait, only to come back to 2% completion. I stopped everything, of course, and changed strategy!
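The row explosion from a misused append is easy to reproduce: appending one stream to another without restricting the pairing behaves like a Cartesian product, so row counts multiply instead of add. A small Python sketch of the effect (the inputs are made-up stand-ins for the drought data):

```python
from itertools import product

# Two modest, hypothetical input streams:
counties = [f"county_{i}" for i in range(1000)]
weeks = [f"week_{i}" for i in range(900)]

# A Cartesian append pairs every row of one with every row of the other:
combined = list(product(counties, weeks))
print(len(combined))  # 900000
```

Two inputs of 1,000 and 900 rows become 900,000 rows, which is exactly how a workflow quietly ends up taking 25 minutes to reach 2%.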
My screen for quite a large part of my afternoon.
Quite frustrating, isn’t it?
Again, Brian had run into this issue earlier (not being able to test the workflow without long minutes of waiting) and came up with a nice solution: simply take a sample of the data and work with that, before running the workflow on every single row. It sounds quite obvious when you write it down, but it almost feels like a Eureka moment when your head has been stuck in a workflow for a few hours…
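Brian's trick translates directly outside Alteryx, too: develop against a small, reproducible sample, then rerun on the full data only once the logic works. A minimal Python sketch of the idea (the 3-million-row input is simulated here with plain integers):

```python
import random

def sample_rows(rows, k=1000, seed=42):
    """Return a reproducible random sample to develop the workflow against.

    The fixed seed means every test run sees the same rows, so bugs
    are reproducible; rerun on the full 'rows' once the logic works.
    """
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))

full_data = list(range(3_000_000))  # stand-in for the 3M-row drought file
dev_data = sample_rows(full_data)
print(len(dev_data))  # 1000
```

Iterating on 1,000 rows instead of 3 million turns a 25-minute feedback loop into a near-instant one.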
This experience proved that every single detail counts when working with a large data set, through the collecting, parsing, and formatting phases. It also showed that getting your API request right from the start saves you a lot of time.
My daily self assessment:
- What to keep doing next time: Not much, except that I took the time to actually look at the data set before rushing in. It was a bit of a forced move, though, since I had to build my API request.
- What to improve: I realized the Pomodoro method doesn’t always work. Well, it works when you don’t have peak and off-peak times in your work. When my Alteryx workflow was running for long, long minutes, I was either trying to fill the gap with reading on the topic, or waiting and chatting with the people around me. Not much to do, really. But during my productive moments, it would have been a shame to take those breaks. At some point, my breaks were useless because I kept picturing myself working. My conclusion on the method today: it would be interesting to use it in a more appropriate situation.
- What to stop doing: I had a lot of difficulty working through the day overall. I think the tiredness of the week is becoming harder and harder to cope with.
That is all for me for today, thanks for reading and I will be back tomorrow, fresh and rested!