What is Data Vault 2.0 in context?

When learning about Data Vault 2.0, a style of data modelling, it helps to think in terms of an actual example. We used the Star Wars API (SWAPI) to get a hands-on feel for how data vault modelling works, and how naturally it can emerge when working with real-world (or real-galaxy) datasets.

What is Data Vault 2.0?

At its core, Data Vault 2.0 is a way of modelling data that keeps track of what changed, when, and where it came from. It separates business concepts (called hubs), relationships (links), and descriptive or changing information (satellites).

Hubs represent unique business entities. Think: film, person, or species.
Links connect hubs to each other. Think: a person appearing in a film.
Satellites store descriptive information (often slowly changing) about hubs or links, along with metadata like source and timestamp.

The key design idea: separate what something is, how it relates to other things, and what we know about it. That way, data is flexible, historical, and scalable.
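To make the three-way split concrete, here is a minimal sketch of the row shapes for hubs, links, and satellites. The class and column names (`HubRow`, `hub_hk`, `load_ts`, `record_source`, and so on) are hypothetical choices for illustration, and the keys are placeholders rather than real hashes; production vaults usually carry more metadata columns.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Tuple

# Hypothetical row shapes, for illustration only.

@dataclass(frozen=True)
class HubRow:
    """What something is: one row per unique business key."""
    hub_hk: str         # key derived from the business key
    business_key: str   # e.g. the SWAPI id "1" for a film
    load_ts: datetime   # when this key was first loaded
    record_source: str  # where it came from, e.g. "swapi/films"

@dataclass(frozen=True)
class LinkRow:
    """How things relate: one row per unique combination of hub keys."""
    link_hk: str
    hub_hks: Tuple[str, ...]  # the hub keys being connected
    load_ts: datetime
    record_source: str

@dataclass(frozen=True)
class SatelliteRow:
    """What we know: descriptive attributes of a hub or link at a point in time."""
    parent_hk: str  # the hub or link these attributes describe
    load_ts: datetime
    record_source: str
    attributes: Dict[str, Any] = field(default_factory=dict)

# Placeholder keys (not real hashes) to show how the pieces connect.
now = datetime.now(timezone.utc)
film = HubRow("hk-film-1", "1", now, "swapi/films")
person = HubRow("hk-person-1", "1", now, "swapi/people")
appears_in = LinkRow("hk-film-1-person-1", (film.hub_hk, person.hub_hk), now, "swapi/films")
film_details = SatelliteRow(film.hub_hk, now, "swapi/films", {"title": "A New Hope"})
```

Note that the hub holds nothing but the key and load metadata; everything descriptive lives in the satellite, and the link only knows which hubs it joins.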

Where does it sit with other data modelling styles?

Put very simply, Data Vault 2.0 can be thought of as a multi-fact model: hubs play the role of fact tables joined to each other via links, while satellites play the role of dimension tables.
Satellites are expected to be timestamped, source-aligned, and slowly changing (they keep historical data rather than overwriting it).
Hubs are separated along business lines, with one hub per business concept.
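The "slowly changing, keep historic data" behaviour of satellites can be sketched as an insert-only load: a new row is appended only when the descriptive attributes actually change, and old rows are never updated. The change check below uses a "hash diff" (a fingerprint of the attribute values), which is a common Data Vault 2.0 convention assumed here rather than taken from this article; `load_satellite`, `hash_diff`, and the "G. Lucas" edit are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, List

def hash_diff(attributes: Dict[str, Any]) -> str:
    """Fingerprint descriptive attributes; sorted keys make field order irrelevant."""
    payload = json.dumps(attributes, sort_keys=True, default=str)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def load_satellite(history: List[Dict[str, Any]], parent_hk: str,
                   attributes: Dict[str, Any], record_source: str) -> bool:
    """Append a new version only if the attributes changed. Returns True if appended."""
    new_diff = hash_diff(attributes)
    # History is append-only, so the last row for this parent is the current version.
    latest = next((r for r in reversed(history) if r["parent_hk"] == parent_hk), None)
    if latest is not None and latest["hash_diff"] == new_diff:
        return False  # unchanged: no new version needed
    history.append({
        "parent_hk": parent_hk,
        "load_ts": datetime.now(timezone.utc),
        "record_source": record_source,
        "hash_diff": new_diff,
        **attributes,
    })
    return True

src = "swapi/films"
sat: List[Dict[str, Any]] = []
print(load_satellite(sat, "hk-film-1", {"director": "George Lucas"}, src))  # True
print(load_satellite(sat, "hk-film-1", {"director": "George Lucas"}, src))  # False
print(load_satellite(sat, "hk-film-1", {"director": "G. Lucas"}, src))     # True
print(len(sat))  # 2: both versions are kept
```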

Data Vault 2.0 sits closest to the extract layer. You would not split a denormalised table into a data vault architecture the way you might split Sample Superstore into normalised tables; instead, you would extract the data already separated along business lines by the source systems themselves.

Hash Keys (HKs): Making Unique IDs

In Data Vault 2.0, unique IDs are often generated with a hash function. Anyone who hashes the same combination of values with the same function gets the same ID, so the keys are deterministic, portable, and consistent across systems.

For example, instead of using raw IDs like person_id = 3, we might use a hash of "people|3".
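A minimal sketch of that idea in Python, assuming MD5 as the hash function and "|" as the delimiter (SHA-1 or SHA-256 would work the same way). `hash_key` is a hypothetical helper, not part of any library.

```python
import hashlib

def hash_key(*parts: object, sep: str = "|") -> str:
    """Deterministic hash key from business-key parts (hypothetical helper).

    Parts are converted to strings and trimmed, then joined with a delimiter,
    so ("people", 3) always becomes "people|3" before hashing.
    """
    joined = sep.join(str(part).strip() for part in parts)
    # MD5 is an assumption here; any deterministic hash gives the same property.
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

person_hk = hash_key("people", 3)
print(person_hk)                             # a 32-character hex string
print(person_hk == hash_key("people", "3"))  # True: same input, same key
```

The delimiter matters: without it, ("people", 13) and ("people1", 3) would both become "people13" and collide.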

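Many SQL databases have built-in hashing functions (for example, md5() in PostgreSQL or HASHBYTES in SQL Server), so you can generate these keys directly in the database.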

The Star Wars API: A Natural Fit

The Star Wars API returns JSON data for objects like films, people, and vehicles. Each item includes:

Plain values like "title": "A New Hope"
Lists of links to other items, like: "characters": ["https://swapi.info/api/people/1", "https://swapi.info/api/people/2"]

From this, a data vault structure naturally emerges.
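The step that makes links possible is turning each reference URL into a (category, id) pair. The sketch below assumes SWAPI URLs follow the `/api/<category>/<id>` shape seen above and treats any list made entirely of http(s) strings as a reference list; `parse_ref` and `references` are hypothetical helpers, and the `film` dict is an abbreviated sample.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlparse

def parse_ref(url: str) -> Tuple[str, str]:
    """Turn 'https://swapi.info/api/people/1' into ('people', '1')."""
    # Keep only non-empty path segments so a trailing '/' does no harm.
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-2], segments[-1]

def references(item: Dict[str, Any]) -> Dict[str, List[Tuple[str, str]]]:
    """Collect every list-of-URLs field in a SWAPI item as (category, id) pairs."""
    return {
        key: [parse_ref(v) for v in value]
        for key, value in item.items()
        if isinstance(value, list)
        and all(isinstance(v, str) and v.startswith("http") for v in value)
    }

film = {
    "title": "A New Hope",
    "characters": ["https://swapi.info/api/people/1", "https://swapi.info/api/people/2"],
}
print(references(film))  # {'characters': [('people', '1'), ('people', '2')]}
```

Plain values such as "title" are skipped here; they are satellite material rather than link material.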

Example: Films Endpoint

For each film:

A hub called Film holds the film’s identity (like title or episode).
A satellite called Film_Descriptive holds changing data (like director, producer).
A link like Film_People_Link connects films to characters, based on the "characters" array in the JSON.

We can repeat this for People, Species, and more.
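A minimal sketch of the films example, producing rows for Film, Film_Descriptive, and Film_People_Link from one film record. Assumptions: MD5 hash keys, the hub keyed on the SWAPI id at the end of the film's `url` (so references from other endpoints resolve to the same key; the title or episode number could serve as the business key instead), and an abbreviated sample record. `load_film`, `hash_key`, and `swapi_id` are hypothetical helpers.

```python
import hashlib
from datetime import datetime, timezone
from typing import Any, Dict, List

def hash_key(*parts: object) -> str:
    """MD5 of the '|'-joined parts (hash choice is an assumption)."""
    return hashlib.md5("|".join(str(p).strip() for p in parts).encode("utf-8")).hexdigest()

def swapi_id(url: str) -> str:
    """Last path segment of a SWAPI URL, e.g. '.../people/1' -> '1'."""
    return url.rstrip("/").split("/")[-1]

def load_film(film: Dict[str, Any], source: str = "swapi/films") -> Dict[str, List[Dict[str, Any]]]:
    load_ts = datetime.now(timezone.utc)
    film_id = swapi_id(film["url"])
    film_hk = hash_key("films", film_id)

    # Hub: identity only.
    hub = {"film_hk": film_hk, "film_id": film_id, "load_ts": load_ts, "record_source": source}

    # Satellite: descriptive attributes that may change over time.
    sat = {"film_hk": film_hk, "load_ts": load_ts, "record_source": source,
           "title": film.get("title"), "episode_id": film.get("episode_id"),
           "director": film.get("director"), "producer": film.get("producer")}

    # Link: one row per entry in the "characters" array.
    links = []
    for url in film.get("characters", []):
        person_hk = hash_key("people", swapi_id(url))
        links.append({"film_people_hk": hash_key(film_hk, person_hk),
                      "film_hk": film_hk, "person_hk": person_hk,
                      "load_ts": load_ts, "record_source": source})

    return {"Film": [hub], "Film_Descriptive": [sat], "Film_People_Link": links}

a_new_hope = {  # abbreviated sample; real responses carry more fields and characters
    "title": "A New Hope",
    "episode_id": 4,
    "director": "George Lucas",
    "producer": "Gary Kurtz, Rick McCallum",
    "characters": ["https://swapi.info/api/people/1", "https://swapi.info/api/people/2"],
    "url": "https://swapi.info/api/films/1",
}
tables = load_film(a_new_hope)
print(len(tables["Film_People_Link"]))  # 2
```

Because the person keys are computed the same way a People hub would compute them, the link rows join to that hub without any lookup.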

Turning Nested JSON into Tables

When converting from JSON:

Each item becomes a row.
Simple key-value pairs become columns.
Any array field (like a list of vehicles or species) becomes a linking table.
Each API category (films, people, species...) becomes a hub.
You build links between hubs based on the cross-references.
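The conversion rules above can be applied generically across categories. This sketch assumes input shaped as {category: [items]}, MD5 hash keys, and that every list field holds SWAPI URLs; `build_vault` and its table naming (`<category>_<field>` for links) are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple

def hash_key(*parts: object) -> str:
    return hashlib.md5("|".join(str(p) for p in parts).encode("utf-8")).hexdigest()

def ref(url: str) -> Tuple[str, str]:
    """'https://swapi.info/api/people/1' -> ('people', '1')."""
    category, item_id = url.rstrip("/").split("/")[-2:]
    return category, item_id

def build_vault(data: Dict[str, List[Dict[str, Any]]]):
    hubs: Dict[str, Dict[str, str]] = defaultdict(dict)        # category -> {hk: id}
    sats: Dict[str, List[Dict[str, Any]]] = defaultdict(list)  # category -> rows
    links: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)  # link table -> {(hk, hk)}

    for category, items in data.items():        # each API category becomes a hub
        for item in items:                       # each item becomes a row
            _, item_id = ref(item["url"])
            hk = hash_key(category, item_id)
            hubs[category][hk] = item_id
            row: Dict[str, Any] = {"hk": hk}
            for key, value in item.items():
                if isinstance(value, list):      # array field -> link table
                    for url in value:
                        links[f"{category}_{key}"].add((hk, hash_key(*ref(url))))
                elif key != "url":               # simple key-value -> satellite column
                    row[key] = value
            sats[category].append(row)
    return hubs, sats, links

sample = {
    "films": [{"title": "A New Hope", "url": "https://swapi.info/api/films/1",
               "characters": ["https://swapi.info/api/people/1"]}],
    "people": [{"name": "Luke Skywalker", "url": "https://swapi.info/api/people/1",
                "films": ["https://swapi.info/api/films/1"]}],
}
hubs, sats, links = build_vault(sample)
print(sorted(links))  # ['films_characters', 'people_films']
```

Two caveats of this simple version: films_characters and people_films describe the same relationship from both sides, so a fuller version would merge them into one link; and single-URL fields such as a person's homeworld land in the satellite as plain values.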

The result is a multi-fact star structure where facts (films, people) are connected through links. It’s like a highly structured galaxy of data.
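Querying this structure means walking from a hub, through a link, to the satellite that describes the other side. The tiny tables below use hypothetical placeholder keys to show that join path in isolation; in a warehouse, the same path would be two SQL joins.

```python
from typing import Dict, List, Set, Tuple

# Hand-built tables with placeholder keys (hypothetical, for illustration only).
film_hub: Dict[str, str] = {"hk-f1": "1"}
person_sat: Dict[str, Dict[str, str]] = {
    "hk-p1": {"name": "Luke Skywalker"},
    "hk-p2": {"name": "C-3PO"},
}
film_people_link: Set[Tuple[str, str]] = {("hk-f1", "hk-p1"), ("hk-f1", "hk-p2")}

def characters_in(film_hk: str) -> List[str]:
    """Hub -> link -> satellite: return the names of people linked to a film."""
    if film_hk not in film_hub:
        return []
    person_hks = [p for f, p in film_people_link if f == film_hk]
    return sorted(person_sat[p]["name"] for p in person_hks)

print(characters_in("hk-f1"))  # ['C-3PO', 'Luke Skywalker']
```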

Visual Example: Films, People, and Species

This is a basic vault model for just three of the Star Wars API endpoints. Even with just these, you can see the value of hubs and links in keeping things clean and extensible.

Going Even Further

You can add additional normalisation to your satellites or fact-like tables if needed. That might help if you want to remove duplication, prepare for updates, or optimise performance in a warehouse.

But remember: Data Vault is already focused on capturing history and separating concerns. Over-normalising can make analysis harder, so balance clarity with performance and purpose.

Conclusion

Whether you’re a data modeller or a Jedi in training, Data Vault 2.0 offers a clear framework for managing changing data with purpose. And if you want a fun way to practise, the Star Wars API is a great galaxy to explore.

Let the metadata be with you.

Author:

Jeffrey Brian Thompson
