Custom SQL in Tableau and Unfamiliar Data | A Friday Project

by Chris Meardon

Introduction

This Friday is a single-day project, which is something we do on Fridays when there aren’t client project presentations. We have from 9am-3pm to do the task and then present from 3pm to our colleagues. The projects vary in content with the Head Coaches setting today’s out as follows.

The Project:

ar @DS18 😈,

One of the things you wanted to work on last week was exploring unfamiliar data. @Carl Allchin (UK) and I have the perfect project for you.

You will be using the Northwind database on Stacked. You are not allowed to use Alteryx nor Prep. You will need to look at the tables using SQL (limit or top N are you friends) to get an idea of what fields are available. From there, you will need to develop an idea, write the sql to create the data you need, use that as custom sql in Tableau, then build your dashboard.

If you approach this correctly, it shouldn’t be too bad. You have until 3pm (you can sit out the DS19 presentations). TIP: Keep your queries simple, think about aggregations, plan ahead.

With all of my love,

Coach Andy ❤

P.S. Here’s a cheat sheet for you to join the tables (http://www.unife.it/ing/informazione/Basi_dati/lucidi/materiali-di-laboratorio/esercizi-sql-base-di-dati-nothwind )


My initial reaction

Some background as to why this is challenging:

  1. I’ve never written SQL in Tableau
  2. I’ve spend a couple hours total writing SQL in my life

So how the Prep am I going to tackle this?

  • Write a blog of my progress because that won’t eat up my time at all will it
  • Try to stay calm within the corona virus bunker
  • Find a simple idea and try to do it well
  • Test my SQL in dBeaver before Tableau so I can experiment
  • Then port that code into Tableau
  • Realise nothing works and present half of the simple idea

Okay… so how do I even access this database?

Much confusion here for 30 minutes. Turns out in dBeaver I needed to change my active catalog to Northwind and then drill down into the schema(?) to find tables and their columns. Cool.

Now what?

SQL writing time. Now I need to write a script in SQL to access the database and do things to it.

It’s time for bugs

SQL Issues

Yeah there are too many issues to write them down. I can get back one table, but I am having issues with joining tables together. I did have it working but now it isn’t… It’s 11am. How am I two hours in?!

I HAD AN EMPTY LINE OF SCRIPT AND THIS WAS THE ISSUE ALL ALONG I HATE IT

So now that escapade is over, let’s actually write some new script and get some things going. I think I’ll try make some kind of report on employees and what orders they’re associated with. Then maybe add some information in about products. I’m basically treating it like it’s the superstore dataset in tableau.

It would probably be wise at this point to try and put some simple SQL in Tableau Desktop and see how that all works.

To write custom SQL in Tableau desktop you need to first connect to your database. Then in the data connection window you can drag the “New Custom SQL” button onto the right.

Why did I expect it to be easy

Back to debugging…

Turns out Tableau doesn’t like field names that are the same, whereas dBeaver doesn’t seem to mind. Easy fix.

Maybe this is actually okay? 11:21

So here is what my query is looking like as of now (11:40):

select 
     o.OrderID,
     d.ProductID,
     d.UnitPrice,
     d.Quantity,
     d.Discount,
     o.EmployeeID,
     e.LastName,
     e.FirstName,
     e.Title,
     e.TitleOfCourtesy,
     e.BirthDate,
     e.HireDate,
     e.Address,
     e.City,
     e.Region,
     e.PostalCode,
     e.Country,
     e.HomePhone,
     p.ProductName,
     p.SupplierID,
     p. CategoryID,
     p.QuantityPerUnit,
     p.UnitPrice as 'pUnitPrice',
     p.Discontinued,
     c.CategoryName,
     c.Description,
     t.TerritoryDescription,
     r.RegionDescription
 FROM Orders o
 left join [Order Details] as d
     on o.OrderID = d.OrderID    
 left join Employees as e 
     on e.EmployeeID = o.EmployeeID
 left join Products as p 
     on p.ProductID = d.ProductID
 left join EmployeeTerritories as et 
      on et.EmployeeID = o.EmployeeID
 left join Categories as c 
     on c.CategoryID = p.CategoryID
 left join Territories as t 
     on t.TerritoryID = et.TerritoryID
 left join Region as r 
     on r.RegionID = t.RegionID

Lunchtime (12:00)

I’m back (12:44)

chris you forgot to write something on that entry…

I’m back again (13:34)

I’m trying to find that story and I haven’t found much of interest yet. I was exploring shipping delay but it was not so fruitful.

My eyes hurt and I’ve not got anything to show (14:08)

I went down the route of making a mobile dashboard that is intended to give employees a monthly report.

I’ve made a thing and it’s presentation time (15:01)

Do I like what I made? Not really no. I like the KPIs, I feel like they can quickly show the information that would be important in a weekly report. Making them is somewhat time intensive as I essentially have to create four calculations for each measure, none of which being that challenging though. Read about them here. They could perhaps be improved by showing how this monthly report generally breaks down by week or day so that the employee could get a little more context for their sales figure that week (i.e. noticing Friday was a massive day and that is why it is so high). This image by Luke Stanke could serve as good inspiration. Another improvement I’d have like to have done is to have the reference numbers (i.e. the previous month could instead by the average month in the last 4 months or similar).

The top 5 charts do not add much to this dashboard in my opinion aside from a little more detail about their sales figures. I struggled to come up with any further useful analytics in the time I had. Something I was planning on trying was adding a reference line to each of the bars to compare the sales in each of the categories to a previous period (perhaps the monthly average of the last 4 months). Because it’s really the difference in sales to a similar period that I would assume to be important to the employee. Perhaps then the ranking (i.e. top 5) could then be the top 5 improved categories and the bottom 5 instead of top 5 total absolute value. This would then be identifying successes and failures in sales of different products across their categories, territories etc.

The SQL was fun and smoother than I expected though, so it was good experience writing it. I think I had a good idea what I was doing from the beginning and roughly knew what I wanted to achieve when writing the SQL. I think I made a mistake though as I was missing some countries in the end. This part took the longest but that is appropriate because it is new to me and I needed the time to experiment, learn and debug.

So what was the problem?
I started making a dashboard about shipping delays, but this turned out to not be a very good direction to take. I expected to find difference across time and categories, perhaps identifying problems with their logistics and to then have an exploratory dashboard to explore this measure. However, I couldn’t get anything worth presenting out of it so I changed track. The final direction I took the dashboard was one that didn’t have to be that interesting and instead had a clear purpose, which made things easier, but I only had an hour once I had got here. If I had made this decision at the beginning, sure I could have spent all the dashboarding time making it and would have gotten further. I had to do this exploration to be able to make the topic decision. In our training, we have been focussing on exploring data early on to focus the preparation to only what you need and then already have an idea of how to design a dashboard around it. This is challenging to do well. Perhaps I could have explored the table in the database to explore any patterns in shipping delay at the start and then made a decision not to pursue it. Something to work on.

The data prep, without using data prep software, took me 4 or so hours of the day, which is by far the lions share, but I knew this was going to be the case. I wrote right at the start of this blog that I was going to try and create something simple and I did try this. Putting this into practice can be quite challenging though.

Take aways?

  • custom SQL is Tableau is a thing that can be done
  • SQL isn’t so scary
  • I can work on my planning to be more efficient with my time
  • working from home makes me miss my colleagues

Overall it was a good project and I’m glad to have written to some custom SQL in Tableau Desktop

Breaking Your Design Habits

5 mins read

Fri 27 Mar 2020

How to create gifs for your blogs

3 mins read

Fri 27 Mar 2020

Blogging - How to get the ball rolling and my experience

5 mins read

Fri 27 Mar 2020

Alteryx Macros vs Python Loops

2 mins read

Tue 24 Mar 2020