Diego's journal (DS12) - Week Two

Week Two (10/12/18 – 14/12/18)

Week two was about data visualisation, and we finally got to use Tableau. However, it wasn’t until Wednesday that we actually opened the software, as the first two days were about the theory behind good data visualisation.

What did we do this week?

Monday was very theoretical.

The day started with Carl teaching us about cognitive load. In this class we learnt about the brain, its different types of memory and how it reacts to different data visualisation cues. We also discussed measures and dimensions. As a group we did an exercise in which we had to rank visual properties (colour, size, shape, etc.) by how relevant they were when visualising discrete or continuous data. The conclusion was that ‘it depends’. In the afternoon we spent around an hour doing a makeover of a dashboard and then some of us presented what we did. For the last couple of hours of the day we had the opportunity to see the client project presentations from DS11. This was a great chance to get an idea of what lies ahead for us.

Tuesday was very different.

This was a special day, mainly because for more than 90% of the day we didn’t have to use our computers. For the whole day we had the good fortune of having Caroline Beavon, a visualisation expert, teaching us good practices and techniques. For the first activity she asked us to scribble 12 different ways to present an infographic about our life. The point of the activity was to think about many ways of visualising our information before starting to build a dashboard. After that, we had to choose what we believed was our best idea and develop it further. This process of reflection allowed us to choose the most suitable idea and to have alternatives in case the first one didn’t work. For the next exercise we had to discuss context, audience and message. She would present us with different situations and we would have to discuss the best ways of presenting our information in that specific context. Having discussed our ideas, we had to create an infographic using pen and paper. However, before writing anything on the actual infographic we had to write each piece of data that we would use on a separate sticky note. By using this technique, we could move the different pieces of data around the infographic before deciding on a final version. Later in the day we looked at different infographics and discussed the good and bad visualisation practices within them, and we finished with a review of different tools for building basic infographics.

Wednesday and Thursday were for Tableau fundamentals.

We finally got to use Tableau and we spent two days learning the basics. We started by looking at the most important parts of the interface and quickly moved on to building a variety of basic charts, creating filters and learning table calculations. The classes consisted mainly of Andy showing us different things you could do with Tableau and then asking us to recreate them. We did this until Thursday, when we had the opportunity to explore the data set we would use on Friday for our projects.

Friday and Big Data.

The project was about Big Data and when I say Big Data I mean BIG. I was assigned a data set about New York taxis that had 1.2 billion data points. The database was hosted on Exasol, which made the connection super-fast. However, whilst Exasol had no problems with the database, Tableau constantly struggled to handle this amount of data. When you get this amount of data you feel powerful and you may feel inclined to attempt to use it all. This was my first mistake. I wanted to see how the drop-off and pick-up locations changed during different times of the day using the Pages shelf and density maps. BAD IDEA. If you attempt to put 1.2 billion points on a map in Tableau, it is very likely it will crash. On my computer, anything over 100,000 points presented a risk and therefore I would have to filter A LOT if I wanted to use a map. My first idea was to compare a New Year’s Eve and a 4th of July. For this to work I also had to filter the type of cabs to only Green Cabs and limit my data set to just the selected dates and times. After doing my density map I realised there wasn’t much analysis there, and I was finally ready to let go of the idea of a map. In the end I just did a line chart with some calculations and reference lines which, despite its simplicity, still offered a great analysis and allowed me to deal with much more data than I could have if I had used a map.
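
For anyone curious, the filtering I describe above can be sketched in a few lines of Python. This is only an illustration of the idea, not what I did in Tableau: the `Trip` record, the `points_for_map` helper, the cab type labels and the random sampling step are all made up for the example, and the 100,000 budget is just the rough limit I hit on my own computer. With a real data set of this size you would want the filter to run in the database rather than in Python.

```python
import random
from dataclasses import dataclass
from datetime import date, datetime

# Rough budget taken from the text above; an assumption, adjust for your machine.
MAX_MAP_POINTS = 100_000


@dataclass
class Trip:
    """Hypothetical trip record; field names are illustrative only."""
    pickup_time: datetime
    cab_type: str  # e.g. "green" or "yellow" (assumed labels)
    lat: float
    lon: float


def points_for_map(trips, days, cab_type, max_points=MAX_MAP_POINTS, seed=0):
    """Keep trips on the chosen days for one cab type, then cap the count.

    If the filtered set is still larger than max_points, draw a
    reproducible random sample of exactly max_points trips.
    """
    selected = [
        t for t in trips
        if t.pickup_time.date() in days and t.cab_type == cab_type
    ]
    if len(selected) > max_points:
        selected = random.Random(seed).sample(selected, max_points)
    return selected
```

You would call it with a set of dates, for example `points_for_map(trips, {date(2015, 12, 31), date(2015, 7, 4)}, "green")`, where the year is only a placeholder. Sampling is one way of staying under the limit; filtering to narrower times, as I did, is another.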

If you have any questions or comments, feel free to use the box below or contact me on Twitter @DiegoTParker

Author:

Diego Parker
