WITH¶
A common table expression (CTE) defines a named query that can referenced multiple times within the scope of a SQL statement. CTEs are visible within the query block in which they are defined, and they can be nested.
Both non-recursive and recursive CTEs are supported.
Syntax¶
Parameters¶
RECURSIVE
Optional keyword to enable recursive functionality. When specified, the CTE can reference itself in its query definition.
TRAVERSAL ORDER
Optional clause for recursive CTEs that specifies the traversal strategy. This clause appears before the final SELECT statement:
pre_order
- Depth-first search traversallevel_order
- Breadth-first (level-order) traversalIf not specified, the optimizer chooses the best traversal strategy (default)
cteName
The name of the common table expression.
columnAlias
The alias of the corresponding column of the CTE query.
cteQuery
The common table expression query. For recursive CTEs, this must be structured as an anchor query UNION ALL recursive query.
query
The query that can optionally reference the CTE query via its given cteName.
Important Notes¶
For all CTEs:
CTEs are visible within the query block in which they are defined
Multiple CTEs can be defined in a single WITH clause, separated by commas
CTEs can be nested within other CTEs
For Recursive CTEs:
Must use
UNION ALL
(notUNION
) between the anchor and recursive queriesThe recursive query must reference the CTE name to establish the recursive relationship
Warning: The system does not automatically prevent infinite recursion. Ensure your recursive queries have proper
termination conditions to avoid infinite loops that may cause the system to crash * Traversal strategy affects the order of results but not the final set of results * Column names and types must be consistent between anchor and recursive queries
Examples¶
Simple CTE¶
Suppose we have the following table representing an employee hierarchy:
CREATE TABLE employees(id BIGINT, name STRING, manager_id BIGINT, dept STRING, PRIMARY KEY(id))
INSERT INTO employees VALUES
(1, 'Alice', NULL, 'Executive'),
(2, 'Bob', 1, 'Engineering'),
(3, 'Carol', 1, 'Marketing'),
(4, 'David', 2, 'Engineering'),
(5, 'Eve', 2, 'Engineering'),
(6, 'Frank', 3, 'Marketing')
Here is a simple CTE query to find all engineering employees:
WITH engineering_team(name) AS (
SELECT name FROM employees WHERE dept = 'Engineering'
)
SELECT * FROM engineering_team
Result:
|
---|
|
|
|
Recursive CTEs¶
Recursive CTEs allow traversing hierarchical relationships. Consider the employee hierarchy as a tree structure:
Alice (CEO, id: 1)
├── Bob (Engineering Manager, id: 2)
│ ├── David (Engineer, id: 4)
│ └── Eve (Engineer, id: 5)
└── Carol (Marketing Manager, id: 3)
└── Frank (Marketing Associate, id: 6)
Finding All Subordinates¶
Find all employees who report to Alice (directly or indirectly):
WITH RECURSIVE subordinates AS (
-- Anchor: Start with Alice
SELECT id, name, manager_id FROM employees WHERE id = 1
UNION ALL
-- Recursive: Find direct reports of current level
SELECT e.id, e.name, e.manager_id
FROM subordinates AS s, employees AS e
WHERE s.id = e.manager_id
)
SELECT name FROM subordinates WHERE id != 1
Result with optimizer-chosen traversal:
|
---|
|
|
|
|
|
Finding Management Chain¶
Find the management chain from an employee up to the CEO:
WITH RECURSIVE management_chain AS (
-- Anchor: Start with a specific employee
SELECT id, name, manager_id FROM employees WHERE name = 'Eve'
UNION ALL
-- Recursive: Find manager of current employee
SELECT e.id, e.name, e.manager_id
FROM management_chain AS mc, employees AS e
WHERE mc.manager_id = e.id
)
SELECT name FROM management_chain
Result:
|
---|
|
|
|
Level-Order Traversal¶
Using level traversal to process the hierarchy breadth-first:
WITH RECURSIVE org_levels AS (
SELECT id, name, manager_id, 0 AS level FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id, ol.level + 1
FROM org_levels AS ol, employees AS e
WHERE ol.id = e.manager_id
)
TRAVERSAL ORDER level_order
SELECT name, level FROM org_levels ORDER BY level, name
This processes all employees at level 0 (CEO), then level 1 (direct reports), then level 2, etc.
Result with level traversal (breadth-first):
|
|
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
Preorder Traversal¶
Using explicit preorder traversal to process the hierarchy depth-first:
WITH RECURSIVE subordinates AS (
SELECT id, name, manager_id FROM employees WHERE id = 1
UNION ALL
SELECT e.id, e.name, e.manager_id
FROM subordinates AS s, employees AS e
WHERE s.id = e.manager_id
)
TRAVERSAL ORDER pre_order
SELECT name FROM subordinates WHERE id != 1
Result with preorder traversal (depth-first):
|
---|
|
|
|
|
|
Notice how preorder explores Bob’s entire subtree (David, Eve) before moving to Carol’s subtree (Frank), while level traversal processes all level 1 employees (Bob, Carol) before any level 2 employees.
Nested Recursive CTEs¶
Complex hierarchical operations using nested recursive CTEs to analyze relationships across multiple organizational divisions:
Consider a more complex scenario with multiple organizational divisions:
CREATE TABLE employees(id BIGINT, name STRING, manager_id BIGINT, dept STRING, PRIMARY KEY(id))
INSERT INTO employees VALUES
-- Tech Division (Alice is CEO)
(1, 'Alice', NULL, 'Executive'),
(2, 'Bob', 1, 'Engineering'),
(3, 'Carol', 1, 'Engineering'),
(4, 'David', 2, 'Engineering'),
(5, 'Eve', 2, 'Engineering'),
(6, 'Frank', 3, 'Engineering'),
(7, 'Grace', 3, 'Engineering'),
-- Sales Division (Helen is independent CEO)
(10, 'Helen', NULL, 'Sales'),
(11, 'Ivan', 10, 'Sales'),
(12, 'Julia', 10, 'Sales'),
(13, 'Kevin', 11, 'Sales'),
(14, 'Lisa', 11, 'Sales'),
-- Marketing Division (Mike is independent CEO)
(20, 'Mike', NULL, 'Marketing'),
(21, 'Nina', 20, 'Marketing'),
(22, 'Oscar', 21, 'Marketing')
This creates a forest of three separate organizational hierarchies:
Tech Division: Sales Division: Marketing Division:
Alice (CEO) Helen (CEO) Mike (CEO)
├── Bob ├── Ivan └── Nina
│ ├── David │ ├── Kevin └── Oscar
│ └── Eve │ └── Lisa
└── Carol └── Julia
├── Frank
└── Grace
Find all employees who report to any of Bob’s managers (colleagues in Bob’s management chain):
WITH RECURSIVE colleagues_of_bob AS (
-- First: Find all managers in Bob's chain of command (ancestors)
WITH RECURSIVE bob_managers AS (
SELECT id, name, manager_id FROM employees WHERE name = 'Bob'
UNION ALL
SELECT e.id, e.name, e.manager_id
FROM bob_managers AS bm, employees AS e
WHERE bm.manager_id = e.id
)
SELECT id, name, manager_id FROM bob_managers
UNION ALL
-- Second: Find all employees who report to any of Bob's managers
SELECT e.id, e.name, e.manager_id
FROM colleagues_of_bob AS c, employees AS e
WHERE c.id = e.manager_id AND NOT EXISTS (SELECT name FROM c WHERE name = e.name)
)
SELECT name FROM colleagues_of_bob WHERE name != 'Bob'
This query demonstrates several advanced concepts. The nested recursive CTEs allow the inner bob_managers
CTE to
find Bob’s management chain (ancestors), while the outer colleagues_of_bob
CTE uses those results to find all
related employees. The NOT EXISTS (SELECT name FROM c WHERE name = e.name)
predicate prevents adding employees
who are already in the result set - without this, employees could be added multiple times through different management
paths. The query uses two-phase processing: first finding Bob’s management chain (Bob → Alice), then finding all
employees who report to anyone in that chain.
Result:
|
---|
|
|
|
|
|
|
The forest structure ensures we only get employees from Bob’s organizational division (Tech), not from Sales or Marketing divisions. The existential predicate is crucial for preventing infinite loops and duplicate results in complex recursive hierarchies.
Performance Characteristics¶
The choice between LEVEL and PREORDER traversal strategies can impact query performance, but the behavior is heavily affected by the specific hierarchy profile and usage patterns. Users should carefully consider these factors when selecting a traversal strategy:
Key Considerations:
Hierarchy Profile: The structure of your data (wide vs. deep hierarchies) significantly influences which strategy performs better
Continuation Usage: If you plan to use query continuations for pagination, the continuation size differences are substantial
Dataset Size: Performance characteristics may vary significantly based on the total number of nodes in your hierarchy
Continuation Size Behavior:
One clear difference between the strategies is continuation size:
PREORDER: Generates very small continuation sizes (typically <50 bytes)
LEVEL: Produces much larger continuation sizes (often several KB to hundreds of KB)
This difference can be critical if you’re implementing pagination or need to serialize query state.
Execution Time Observations:
Initial benchmarking suggests performance differences exist between the strategies, but results vary significantly based on hierarchy characteristics. Some observations from testing:
Dataset Size |
Tree Structure |
LEVEL Time |
PREORDER Time |
Notes |
---|---|---|---|---|
1,000 nodes |
10 levels, wide branching |
405ms |
898ms |
LEVEL faster for wide trees |
10,000 nodes |
10 levels, wide branching |
3,097ms |
7,620ms |
Pattern continues |
Deep hierarchies |
100+ levels, narrow branching |
Variable |
Variable |
Results depend on specific structure |
Recommendations:
Test with your data: Performance characteristics are highly dependent on your specific hierarchy structure
Consider continuation needs: If you need small continuation sizes for pagination, PREORDER has a significant advantage
Monitor memory usage: Different strategies may have varying memory requirements based on your dataset
Benchmark your use case: Execution time performance is still under investigation and may vary significantly based on your specific queries and data patterns
The performance behavior of recursive CTEs is an active area of development and optimization. Always test with representative data before making final decisions about traversal strategies.