What are they?
CTE stands for Common Table Expression. Common Table Expressions are temporary data objects in SQL which present the result set of a defined query. CTEs can be referenced in other queries and exist only in the context of the query which defines them.
It may be more useful not think of them in technical terms but in metaphors. Think of CTE’s as virtual tables or bookmarks.
Syntax
The exact syntax of a CTE can vary between SQL dialects but the basic structur contains of only 3 elements and is fairly simple:
- the WITH clause
- the query
- an alias
WITH (
SELECT *
FROM *table_name*
WHERE *condition*
)Why use CTEs?
Having pre-defined bookmarks can really help in everyday life. But in SQL it gets better. Not only can CTE’s be defined and renamed they can also be linked. This might seem small but can make for an even more useful combination. Putting names on what code sections alone is a huge improvement to the status quo ante.
But above that using CTEs makes it obsolete to re-write whole code sections again and again. We can write a code sections once and if we need it again it can simply be referenced. Now we can use a CTE just like module and re-use modules where applicable.
This results in way less code lines, less complexity and more readability. But more than that modularity makes a code more structured and also more maintainable. You now have a chance to find bugs and errors because you can limit it to certain code modules compared with a giant monolithic script. Furthermore it clears up depencies in your queries since you declare them more explicit instead of having them implicitly present in your code. It may hence not surprise that CTEs are considered best practice for most scenarios.
Also, the allow you to approach data transformations in steps. Data transformation can containe of many actions, be convoluted and prone to errors. Doing it step by step is therefore immensely helpful. They are therefore the base for further applications and concepts which build on CTE’s for example dbt leverages CTE’s heavily.
Example: Comparison of customers in terms of order amount and lifetime values
(Note: The following examples stem from dbt learning materials and are from one script which is divided over here over multiple snippets for better comprensibility. Hence there is only one WITH clause at the top. You might encounter elements like {{ ref() }} which is simply another way to reference tables in dbt. Queries can be written in upper and lower case.)
This example is a good a demonstration of a stepwise extraction and transformation process. The goal is to compare the lifetime values of different customer values. We assume some kind of normalized database structure. So the information we need to compare customers are stored in different tables.
Our first CTE’s is to extract all customer data.
with customers as (
select *
from {{ ref('stg_jaffle_shop__customers') }}
),We also extract all order data.
orders as (
select *
from {{ ref('fct_orders') }}
),We then transform the order data by:
- selecting the relevant fields
- creating necessary aggreations
- grouping the data by customer id so that each row of the result set should stand for one customer (if data quality checks have been applied before)
customer_orders as (
select
customer_id,
min(order_date) as first_order_date,
max(order_date) as most_recent_order_date,
count(order_id) as number_of_orders,
sum(amount) as lifetime_value
from orders
group by 1
),We continue with a second transformation step where we:
- select and rename the relevant customer fields from our very first query
- check for blanks in numerical fields and replace thes with zeroes if present
- left join our already transformed customer_orders data. (This result in augmenting our customer result set with order data if the latter exist. A customer will continue to be in the list even without orders.)
final as (
select
customers.customer_id,
customers.first_name,
customers.last_name,
customer_orders.first_order_date,
customer_orders.most_recent_order_date,
coalesce(customer_orders.number_of_orders, 0) as number_of_orders,
coalesce(customer_orders.lifetime_value, 0) as lifetime_value
from customers
left join customer_orders using (customer_id)
)Finally we select all fields from your last and fully prepped result set.
select * from final
Funny sidenote: CTEs are recursive which means they can call themselves 😋
