This is a belated blog post, as I’ve been away in the US, but on day 3 of #DashboardWeek Andy gave each of us a city and tasked us with providing some insight into passenger usage of its transit system.
Fortunately, I got Chicago, which has an impressive collection of data, impeccably organised for this kind of analysis. The data included shape files, which are useful for overlaying on maps to show the network as you would see it on any map.
On the face of it, this challenge looked straightforward. However, it turned out to be a real pain because, despite the data being available, it wasn’t in the format I needed to show what I wanted to show in Tableau. Therefore, most of the morning and some of the afternoon were spent manipulating the data and shape files in Alteryx. Here’s my original sketch of ideas:
So much data!
From the workflow below, you can see I have three main input files:
- A station locations shape file;
- A railway line shape file; and
- Ridership information for each of the stations, i.e. how many people ‘tapped in’ at each station.
First things first, I wanted to overlay the two shape files to create one shape file that showed both stations and rail lines. I did this originally, but had to go back and use the Spatial Match tool so that Tableau could understand which lines the stations belonged to, and therefore colour them accordingly. In theory, this sounds simple, but I hadn’t had much experience of working with shape files, so time was eaten up as I learnt by doing. We got there in the end, though.
Joining all that great data.
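For anyone curious what attaching stations to lines involves outside Alteryx, here is a minimal Python sketch. It is not what the Spatial Match tool does internally; it swaps in a simpler nearest-line rule, treats coordinates as flat x/y values, and uses made-up line and station names and coordinates purely for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, treating coordinates as planar."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: both ends are the same point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_line(station_xy, lines):
    """Return the name of the line whose polyline passes closest to the station."""
    def distance_to(polyline):
        return min(
            point_segment_distance(station_xy, a, b)
            for a, b in zip(polyline, polyline[1:])
        )
    return min(lines, key=lambda name: distance_to(lines[name]))


# Hypothetical data: two lines as lists of (x, y) vertices, two stations as points.
lines = {
    "Red": [(0.0, 0.0), (0.0, 10.0)],
    "Blue": [(0.0, 0.0), (10.0, 0.0)],
}
station_points = {"North Stop": (0.1, 8.0), "East Stop": (7.0, -0.2)}

line_for_station = {
    name: nearest_line(xy, lines) for name, xy in station_points.items()
}
```

A rule like this gives each station exactly one line, so interchange stations served by several lines would need extra handling, for example keeping every line within a small distance threshold.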
Joining the ridership information to the respective stations was much more straightforward, as I did a simple match on station name. I say straightforward, but I first had to spend some time ensuring that stations were named the same in both files: when I originally ran the join, it turned out they weren’t, and about 40 stations were left with missing information. In hindsight, I think this is what the ‘Fuzzy Match’ tool in Alteryx would have helped me with. However, having no experience of this tool and having spent ages completing the first part, I used a trusty VLOOKUP in Excel to show me which station names I had to amend.
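The name check I did with VLOOKUP can also be sketched in a few lines of Python. This is only an illustration of the idea, not the Alteryx Fuzzy Match tool: it uses the standard library's `difflib.get_close_matches`, and the function name, the 0.6 cutoff and the station names are all hypothetical rather than taken from my files.

```python
import difflib


def unmatched_with_suggestions(shape_names, ridership_names, cutoff=0.6):
    """Map each ridership name with no exact match to its closest shape-file name.

    The value is None when nothing is similar enough to suggest.
    """
    known = set(shape_names)
    suggestions = {}
    for name in ridership_names:
        if name in known:
            continue  # exact match, the join will work for this station
        close = difflib.get_close_matches(name, shape_names, n=1, cutoff=cutoff)
        suggestions[name] = close[0] if close else None
    return suggestions


# Hypothetical station names, for illustration only.
shape_names = ["Main Street", "Harbour (Blue Line)", "Northgate"]
ridership_names = ["Main Street", "Harbour-Blue Line", "Northgate", "Zzz"]

to_fix = unmatched_with_suggestions(shape_names, ridership_names)
```

Here `to_fix` lists only the names that would fail the join, each with a best guess at the name it should be changed to, which is a list I would still want to check by eye before amending anything.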
So, I had my data – now to the viz. I had started the day with a clear idea of what I wanted the final design to look like – you can see my rudimentary sketches below. However, as the day wore on, and general fatigue crept in, I couldn’t see the wood for the trees and my plans got scatty.
Therefore, I ended up with a viz that focused on showing each line’s ridership figures and highlighting hotspots on the line, but I felt it didn’t quite capture everything that I needed. It was also markedly different from my original plan. Here is that original viz:
I slept on it for a couple of nights and then, as I had a few spare hours to kill in transit at NY’s JFK airport, I had a go at a redo much more akin to my original idea. This was based around a much simpler design: one network map on which you could hover over or click on stations and lines, plus a more focused picture of the difference in occupancy between weekdays and weekends. The final viz is below: