Jon's DS13 Week 7 Recap

by Jonathan Allenby

In week 7 of DS13 we took a break from client projects and went back to focusing entirely on learning, with a particular emphasis on spatial data and data preparation.

Day 1

We spent Monday covering more Tableau Advanced topics with Dan Farmer. This involved revisiting some basic topics such as chart choices for specific types of data. A particularly interesting section of this was based around visualising time-based data. We covered old favourites such as sparklines and slope charts, but also learnt how to make a few new charts such as bump charts and control charts!

Day 2

Day 2 kicked off with an interesting Data Security & Ethics session with Gwilym Lockwood. I think these were two areas we had really overlooked up until this point. It was shocking how differently you approach data if you simply rename the fields from representing retail stores to representing hospital patients. There’s a lot less pressure in analysing data to determine which stores to close than in analysing data to determine which patients don’t receive treatment.

Following this we dropped back into Tableau Advanced with Dan, where we began to cover spatial visualisations. Highlights were learning how to make hexmaps from scratch using Tableau and Alteryx, and learning how to import Mapbox maps, which provide way more options than Tableau’s built-in maps.

Day 3

Wednesday was pretty much start-to-finish data prep challenges with Dan and Alteryx. The Data School is a purveyor of diabolically messy and poorly formatted data sets, kept for the sole purpose of providing some incredibly challenging and practical data preparation exercises. This was intermingled with bits of web scraping and learning how to parse HTML, which seemed complicated at first but, with a few tools and a bit of RegEx, actually became fairly easy.
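To give a flavour of the "bit of RegEx" approach to HTML, here is a minimal sketch in Python rather than Alteryx. The HTML snippet, the class names and the store figures are all made up for illustration; they are not the data we worked with, and a real page would usually need a more forgiving pattern.

```python
import re

# Hypothetical HTML snippet standing in for a scraped page (not real data).
html = """
<table>
  <tr><td class="name">Store A</td><td class="sales">1,200</td></tr>
  <tr><td class="name">Store B</td><td class="sales">950</td></tr>
</table>
"""

# Capture the text inside each adjacent pair of name/sales cells.
# The non-greedy (.*?) stops at the first closing </td> tag.
row_pattern = re.compile(
    r'<td class="name">(.*?)</td>\s*<td class="sales">(.*?)</td>'
)

# Turn each match into a (name, sales) tuple, removing thousands separators.
rows = [
    (name, int(sales.replace(",", "")))
    for name, sales in row_pattern.findall(html)
]

print(rows)
```

This works for small, regular snippets like the one above. For messier pages a proper HTML parser is generally the safer choice.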

Day 4

We spent day 4 working with spatial data with Andy, both in Alteryx and Tableau. The spatial tools in Alteryx were still a little foreign to many of us, so it was good to go over them. One of the highlights of the day was learning how to make custom map grids in Alteryx and using these to view distributions of individual data points. Another cool thing we got up to was using Alteryx to parse and plot data from Strava. We were able to plot whole running routes as well as visualise pace, heart rate, and many other factors over the course of each run.
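The grid idea can be roughly illustrated outside Alteryx too. The sketch below is not how the Alteryx tools work internally; it simply assumes a square grid measured in degrees (the `cell_size_deg` parameter and the sample coordinates are made up) and counts how many points fall into each cell.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_size_deg=0.01):
    """Return the (row, column) index of the grid cell containing a point.

    Assumes a simple square grid in degrees, which is only a rough
    approximation of equal-sized cells on the ground.
    """
    return (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))

# Hypothetical (latitude, longitude) points, not real Strava data.
points = [
    (51.5007, -0.1246),
    (51.5009, -0.1241),
    (51.5123, -0.1002),
]

# Count the number of points that land in each grid cell.
counts = Counter(grid_cell(lat, lon) for lat, lon in points)

for cell, n in counts.items():
    print(cell, n)
```

The resulting counts per cell are the kind of thing you could then colour on a map to show where data points cluster.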

Day 5

Since we didn’t have a client project this week, we had an Andy-devised project instead. Following the hints provided throughout the week, the project was to take a bunch of Andy’s Strava data and find interesting ways to both analyse and visualise it. Evidently Andy exercises a heck of a lot, so we had a lot of data to play with! One key issue we found was that Strava data gets exported as GPX files, which need to be read into Alteryx as XML files and then parsed to get all the data. The other main point of caution we learnt was to be careful when working with large amounts of spatial data. Spatial data is quite large, so one or two errant joins can result in the size of your data ballooning into the hundreds of gigabytes. We all took very different approaches to this project, from viewing Andy’s progress towards his marathon goals, to visualising the most common shapes of his runs, to seeing how long it’ll take Andy to literally run into space!
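For anyone curious what "GPX is just XML" looks like in practice, here is a minimal sketch in Python rather than Alteryx. The GPX fragment is hand-written for illustration (it is not Andy’s data), and it assumes the standard GPX 1.1 namespace with only elevation and time on each track point. Real exports often carry extra fields such as heart rate in extension elements, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written GPX fragment (not real Strava data).
gpx_text = """
<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <name>Morning Run</name>
    <trkseg>
      <trkpt lat="51.5007" lon="-0.1246">
        <ele>10.0</ele>
        <time>2019-03-01T07:00:00Z</time>
      </trkpt>
      <trkpt lat="51.5010" lon="-0.1240">
        <ele>11.5</ele>
        <time>2019-03-01T07:00:10Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>
"""

# GPX elements live in a namespace, so lookups need a prefix mapping.
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

def parse_track_points(text):
    """Return one dict per track point with lat, lon, elevation and time."""
    root = ET.fromstring(text)
    points = []
    for pt in root.iterfind(".//gpx:trkpt", NS):
        ele = pt.find("gpx:ele", NS)
        time = pt.find("gpx:time", NS)
        points.append({
            "lat": float(pt.get("lat")),
            "lon": float(pt.get("lon")),
            # Elevation and time are optional in GPX, so allow them to be missing.
            "ele": float(ele.text) if ele is not None else None,
            "time": time.text if time is not None else None,
        })
    return points

points = parse_track_points(gpx_text)
for p in points:
    print(p)
```

Each track point becomes one row of latitude, longitude, elevation and timestamp, which is the shape you need before plotting a route or working out pace.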