Understanding a schema

Out of all the new concepts introduced during my first week at The Data School, schemas were one I found especially confusing. Hopefully this blog helps simplify the idea for anyone in the same position.

In simple terms, a schema is the structure behind your analysis. It maps out how different datasets connect to one another by defining the pieces of data and the relationships between them. You can think of it as the skeleton of your data model, providing the framework that allows your analysis to function properly.

Schemas are often visualised as a fact table surrounded by dimension tables, a layout known as a star schema.

The fact table sits at the centre. This is typically the largest and most detailed dataset, containing the key events you want to analyse. This table normally includes measurable values, such as sales revenue or order quantities.

Around the fact table sit dimension tables, which add descriptive context to the events stored in the fact table. These tables might contain information such as product details or locations of purchases.

To make this clearer, I created a simple example using some fake purchasing data.

In this example, we have three tables:

A Purchases table
A Product table
A Customer table

The Purchases table is the most detailed table, containing individual purchase records. Because it records the actual transactions and their measurable values, it becomes the fact table.

The other two tables provide descriptive information (dimensions):

The Product table contains details about each product.
The Customer table contains information about the customers.

To connect these tables, we use shared fields.

Product ID appears in both the Purchases and Product tables.
Customer ID appears in both the Purchases and Customer tables.
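
To see how these shared fields hold the structure together, here is a minimal sketch of the three tables using Python's built-in sqlite3 module. Only the table names and the two ID fields come from the example above; every other column name (product_name, category, customer_name, city, quantity, revenue) is an assumption made up for illustration.

```python
import sqlite3

# Hypothetical columns: the example only specifies the three tables and
# the two shared fields (Product ID and Customer ID).
DDL = """
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT,
    city          TEXT
);
CREATE TABLE purchases (
    purchase_id INTEGER PRIMARY KEY,
    product_id  INTEGER REFERENCES product(product_id),    -- shared field
    customer_id INTEGER REFERENCES customer(customer_id),  -- shared field
    quantity    INTEGER,
    revenue     REAL
);
"""


def build_schema() -> sqlite3.Connection:
    """Create the fact table and both dimension tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


conn = build_schema()

# Each row describes one link from the fact table to a dimension table.
# Row layout: id, seq, referenced table, from column, to column, ...
links = conn.execute("PRAGMA foreign_key_list(purchases)").fetchall()
for link in links:
    print(f"purchases.{link[3]} -> {link[2]}.{link[4]}")
```

The Purchases table is the only one that refers to the others, which is what puts it at the centre of the schema.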

These shared fields allow us to join the tables together, enhancing the transaction data with additional context. For example, we can connect a purchase record to both the product that was bought and the customer who bought it.
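
The join described above can be sketched in plain Python. The rows and field names below are invented purely for illustration and are not the data from my example. The idea is simply to look up each purchase's Product ID and Customer ID in the matching dimension table.

```python
# Made-up rows for illustration only; all field names are assumptions.
products = [
    {"product_id": 1, "product_name": "Notebook", "category": "Stationery"},
    {"product_id": 2, "product_name": "Desk lamp", "category": "Home"},
]
customers = [
    {"customer_id": 10, "customer_name": "Alex", "city": "Leeds"},
    {"customer_id": 11, "customer_name": "Sam", "city": "London"},
]
purchases = [
    {"purchase_id": 100, "product_id": 1, "customer_id": 10, "quantity": 3, "revenue": 9.0},
    {"purchase_id": 101, "product_id": 2, "customer_id": 11, "quantity": 1, "revenue": 25.0},
    {"purchase_id": 102, "product_id": 1, "customer_id": 11, "quantity": 2, "revenue": 6.0},
]


def enrich(purchases, products, customers):
    """Join each purchase to its product and customer via the shared ID fields."""
    # Index each dimension table by its ID so lookups are direct.
    product_by_id = {p["product_id"]: p for p in products}
    customer_by_id = {c["customer_id"]: c for c in customers}

    enriched = []
    for row in purchases:
        product = product_by_id[row["product_id"]]
        customer = customer_by_id[row["customer_id"]]
        # Merge the fact row with the descriptive fields from both dimensions.
        enriched.append({**row, **product, **customer})
    return enriched


enriched = enrich(purchases, products, customers)
for row in enriched:
    print(row["purchase_id"], row["product_name"], row["customer_name"], row["revenue"])
```

Each purchase record now carries the product and customer details alongside its measurable values, which is the same result a join in a data preparation or BI tool would produce.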

I mapped this out visually using a simple star schema, which shows how the fact table links to the surrounding dimension tables through these shared fields.

While this example is intentionally simple, real schemas can involve many more tables and relationships, which can make them feel overwhelming. However, breaking the structure down into one central fact table and its supporting dimension tables makes the concept much easier to understand.

For me, creating a small example like this helped clarify that schemas are not as complicated as they first appear. At their core, they are simply a structured way of organising data so that different datasets can work together in analysis.

Author:

George Rycroft
