Ever even heard of the Tableau data model? I hadn't. However understanding the fundamentals of what it is and how it will influence the way in which your data is queried in Tableau is essential to reaching your desired data set-up prior to analysis. This blog aims to provide a basic understanding of the model.
What is the Tableau Data Model?
It is a model created for every data source connected to in Tableau and consists of each table added to the canvas in the data source page. Sometimes this might be a very simple model due to just one table being dragged in, or conversely this can become quite complex when combining several tables (this might be sheets from Excel) together through either: blending, unions, joins, or relationships (note relationships are only available from Tableau version 2020.2 onwards).
Before heading into our types of data combining, it is important to understand the layers involved in the data model.
- Logical Layer: this is the default view you see in the data source page canvas. Tables can easily be dragged and dropped here however, the only way to combine data is via a relationship.
2. Physical Layer: this layer exists under the logical layer whereby if you double clicked the table 'Order' in the image above it will drill down to the physical layer. Here, unions and joins are possible. It is important to note that each table within the logical layer (known as logical tables), has it's own physical layer (where the physical tables exist).
What are relationships?
- Default method for combining data from version 2020.2 onwards.
- They define how the data source tables relate to one another based on common fields (columns), but remain as separate tables.
- Setting up a relationship requires you to specify the common fields (columns) between the two tables you have dragged and dropped on the Tableau data source page.
- They are commonly termed, "noodles" due to the connection line being curved.
The outcome of a relationship?
- All column headers from each table will appear in the data pane to be used in analysis from one data source.
- No joining occurs and the tables remain separate from one another. This means Tableau only combines the relevant tables at the time of analysis, removing any pre-aggregation required and preserving the right level of detail. Overall, offering an easy and flexible alternative to combining data.
- A union is the act of combining two or more tables by stacking themselves on top of each other (your data should increase in row size, not column size).
- You can form a union by dragging a table into the data source pane and hovering below an already existing table. A union box will appear, as demonstrated in the image below. The images also show two other ways you can create unions.
- Tableau also offers the ability to union data via a wildcard. A wildcard is characterised with an asterisk (*) in which it will match a sequence or pattern of characters in file names from a specific location.
- This location is determined by the selected connection in the 'connections' section of the data source page. However, if you wish to extend the scope of your search the bottom two checkboxes allow for this.
What are the requirements for a union?
- A union requires tables to come from the same data connection, have the same number of fields, and related fields must have matching field names and data types.
Note: if field names do not match, fields in the union will contain null values. To overcome this either make sure column headers have been cleaned prior to the union, or Tableau offers the ability to merge mismatched fields into a single field.
- A join allows you to combine data from multiple places into one table. It uses one or more common fields between the datasets to combine them. This is different to a join because it will result in more column headers.
- Tableau allows for cross database joins, however this might yield lower performance.
- A join will combine the two tables together first, and aggregate the data second. This tends to result in a much faster result than a blend (introduced later) which carries out the opposite.
- However, the downfall to a join is the risk of data loss and duplication if tables are at different levels of detail.
What are the requirements of a join?
- The data source cannot be published.
- There must be a common field to join on at the row level and they must be the same data type i.e. the field name can be different but the information and type of data within that column (the rows) must match.
- The last type is blending, which unlike relationships and joins, never truly combines data. Instead, it queries and aggregates data sources independently and then combines them within the view. Therefore, blending aggregates and then combines, which in comparison to other methods can make performance slower.
- However, a blend is a great method to use when your two different datasets are at different levels of aggregation because it does not require alteration to the aggregation level in either dataset. They remain as separate data sources.
- A blend works on a sheet-by-sheet basis by forming a connection between a common field between the two separate data sources. A common field is shown with a paper clip, of which only appears when the field has been dragged and dropped into the view.
- Whichever data source is used first, becomes the primary data source and is given a blue tick. The second data source used within the view is the secondary data source and is given an orange tick.
- A great positive of a blend is that you can simply break the paper clip (and connection) by clicking on it. Therefore, offering easily cutting off the blend and updating the view, an option no other data combination method provides.
Use blends sparingly... what are the limitations?
- They only offer a left join (primary data source acts as the left table), which may result in missing data from secondary data source.
- The data sources cannot be published as one unit, they remain separate.
- Data from secondary data sources must always be aggregated in calcs.
- They perform slower in comparison to joins.
This was a brief and rapid overview of the Tableau Data Model...in truth, there is a lot more to say. There is an immense amount of information through the Tableau help pages that I recommend accessing. However, I wrote this blog to consolidate my knowledge prior to my Tableau Certification tomorrow...hopefully it helps!