Reducing the size of a Tableau Data Extract

by Vivien Ho

Why take an extract

Extracts allow you to save and work with data locally, this generally leads to better performance and offline access. When an extract is created, this moves the data source into a .hyper format, which compatibility-wise, with tableau is quicker. Meanwhile, if for example, you were connected to a SQL database, this brings in the need for a driver i.e. something translating between the SQL database and Tableau, which takes time.

However, note that you can switch between a live connection and an extract whenever you want, or you can just refresh the extract to bring in new data.

As the extract will be saved locally and could potentially come from a very large dataset, reducing the size can aid in efficiency.

Creating an extract

In the Data Pane, right click on the data source and click extract data. This brings up this configuration window.



Physical or Logical Tables

This applies to when you have joined, or related two or more tables together.

For tables that are joined, Physical or Logical tables relates to whether those tables are stored separately or as one big table. Logical stores the one joined table, physical tables stores the individual component tables.

For tables connected using the relationship model, Physical Tables will be greyed out; as this only creates a logical table.

Generally, physical tables is advantageous when you have data sources that explode when you join them. For example, if you had 10 rows in one table which are to be joined to each row in another 100-row table, the extract if stored logically would be 1000 rows. By storing as a 10, and 100-row table, this reduces the extract size. They will be joined later on when necessary.

Extract Filters

Extract filters are at the top of the Order of Operations.

When an extract is taken, it will only retain e.g. the East region’s data, reducing the size of the extract. Similarly, this is useful if you have loads of historical data and only need the last 6 months. However, once the extract filter is applied, you can’t choose to view another region/time period when you're building your views, but you can switch to live/refresh the extract at any point.

Aggregation

Say you have products and their sales split out across several rows. The Aggregate data for visible dimensions option will aggregate up those lines, reducing the amount of rows. This option also adds a count of rows allowing you to do averages, if needed.

If there are multiple dimensions in the dataset, it looks at all the dimensions to find duplicates, only aggregating where the values in said dimensions are all the same.

In the first table, as one of the dimensions (Product) differs, these will not be aggregated, the second table will as all of the dimensions are the same, only the measures differ.

Rolling up dates


If you had loads of day-level data, but you were only interested in monthly figures, you can aggregate your extract to the month level. These five rows would be aggregated into one.

Hiding unused fields

You can also can hide the fields you don’t need, e.g. if you have only used 10 of the 95 fields in the original dataset. Note that you shouldn’t click this when you’re creating an extract for the first time and haven’t built anything. As you won’t have used any fields, all of them will be hidden. Generally, create an extract, then when you have all your charts built, take a new extract and hide all unused fields at that point.

Fri 30 Apr 2021

Fri 26 Mar 2021

Thu 25 Mar 2021

Wed 24 Mar 2021