So here we go: the infamous dashboard week. For the first day I am dueling with data from the Environmental Protection Agency (EPA) in the US. Let’s see who lives to see another day.
1. There’s a lot of data: a 1.2 GB zipped file for all states, and over 12 GB once extracted.
2. I wanted to change the file types to something smaller and easier to use. I created a very manual workflow outputting hyper files, then realised I didn’t want hyper files. Then I made a macro to process the files more dynamically and reduce how much clicking I had to do. Had issues. Solved them. An hour and a half into today now…
3. Then I was an idiot and got confused about why I couldn’t join facilities to their programs. Both tables contained a registry ID, so it should work, right?! Well yes, but I was taking a sample at the beginning of my flow to reduce the volume of data, and I was losing 99+% of my data in the join. Of course they didn’t join well: I had filtered out most of the facilities.
4. At this point I’m thinking of making something that is an exploratory tool for finding out what permits apply where. It may be a little boring though as it’s just descriptive.
5. We found a data source that is national and includes a selection of the information from the other tables. It’s 290 MB but it’s taking FOREVER to download.
6. Well, suddenly it’s midday and I have very little to show for it. I think I will go for a walk around St Paul’s. There’s training on, so RIP any chance of lunch outside of a training room in the office. There are eating noises, so headphones are on my head now.
7. Found a PDF with the different ‘interest types’ and some seem interesting. I searched the data for the interesting ones and none were present. Fantastic…
8. Wanted to load in the interest type definitions (as in 7.) but the data is in a PDF. Then I went down a rabbit hole on PDF inputs. That was a waste of 20 minutes. I’ll stop going down that path.
9. I cannot find a single interesting thing with this data set and am not enjoying the process of trying to find something to present.
10. My neck hurts from sitting at these desks on a laptop all day.
11. I take it back about the lunch thing. I got some free lunch out of the training!
12. I gave in to the pain and hunted down a keyboard and laptop stand to try and make things more comfortable, but it’s kind of too late.
13. There is some information on radioactive waste which is more interesting (which I missed when I wrote 7.), but I’m struggling to find a way to make this interesting too.
14. It’s 14:38, still nothing interesting made and few technical challenges faced.
15. My internet connection is very, very slow. I can barely do a Google search and this is sad.
16. I am yet again so pleasantly surprised
17. (this was an image about how bad the network was but it literally didn’t upload)
18. I FOUND SOMETHING GOOD BUT IT IS SO LATE IN THE DAY (15:45)
19. I need to restructure my dataset to make this work and that feels like a bad idea.
20. I’ve run out of time and am now panic dashboarding (16:18).
21. I’ve made something but it’s not very good.
It may be nice and green, with a dark map, but that is all it is. I’ve broken up the facilities across the US into different categories. There are the SIC and NAICS codes, so you can explore an area and find your local hazardous waste facilities?
22. So what? What do I get from the dashboard? Well, not much. If you’re from the US it may be a bit more interesting. If I had pulled in the information about violations it would have been way more interesting.
23. Hurting my neck this morning made the day pretty unpleasant. I got a keyboard and laptop stand halfway through, which helped.
But there we go, dashboard week day 1 done. Now I just have to present it to my colleagues and try to say something positive. I’m sure the coaches will have nothing negative to say…
Having short deadlines has taught me not to try to make something great, but just to make something and make sure I finish it. The challenges I faced today mean that I have ended up with something ‘finished’, but something that is not very good at anything.
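
Looking back at point 3, the join problem is easy to reproduce. Below is a minimal Python sketch, not the actual workflow from today: the two tables, the `registry_id` field name, the row counts and the `inner_join` helper are all made up for illustration, and the "sample" is simply the first 1% of facility rows.

```python
# Hypothetical stand-in tables: 100,000 facilities, one program row each.
facilities = [{"registry_id": i, "name": f"Facility {i}"} for i in range(100_000)]
programs = [{"registry_id": i, "program": "PROGRAM_A"} for i in range(100_000)]


def inner_join(left, right, key="registry_id"):
    """Inner join two lists of dicts on a shared key (illustrative helper)."""
    lookup = {}
    for row in right:
        lookup.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in lookup.get(l[key], [])]


# Sample early: keep only the first 1,000 facilities, then join to ALL programs.
fac_sample = facilities[:1_000]
joined = inner_join(fac_sample, programs)

# Share of program rows that found no facility to join to.
lost = 1 - len(joined) / len(programs)
print(f"{len(joined)} rows joined, {lost:.0%} of program rows unmatched")

# Join first, sample afterwards: every sampled row keeps its program.
joined_full = inner_join(facilities, programs)
late_sample = joined_full[:1_000]
print(f"{len(late_sample)} rows in the late sample, all with a program")
```

Nothing is wrong with the join itself here. The program rows drop out because their facilities were sampled away upstream, so either sample after the join or apply the same set of sampled IDs to both tables.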