Fact & Dimension Tables

A core concept in data preparation is understanding data sources and architecture. A key part of this is understanding fact and dimension tables, which are used to build schemas like the star schema and snowflake schema. These models are designed to make data analysis faster and more intuitive by organising data in a structured way.

So, what are Fact and Dimension Tables?

In reality, data rarely comes from a single, perfectly prepared source. Instead data comes from various sources and has to be connected. Fact and dimension tables provide a framework for these connections, separating the data into two distinct types.

Fact Tables: These tables hold the records of the data set. Typically storing quantitative data, or metrics with keys that link to the dimension table. See example below

Fact Table: Sales

SaleID	DateID	ProductID	CustomerID	Quantity	TotalPrice
101	20230101	501	12	2	25.98
102	20230102	502	13	1	99.99
103	20230103	501	12	3	38.97
...	...	...	...	...	...

Dimension Tables: These tables hold the details about the categorical fields in the data set. The data is usually qualitative, or attributes that provide context to the metrics in the fact table. See example below

Dimension Table: Product

ProductID	ProductName	Category	Price
501	T-Shirt	Apparel	12.99
502	Headphones	Electronics	99.99
...	...	...	...

Why Use Them?

The separation of data into fact and dimension tables is used to build efficient data models. This structure helps to reduce data redundancy, improve query performance, and make it easier for people to understand and analyse the data.

Author:

James Gastaldello

View Profile