Working with CTEs (Common Table Expressions)¶
In this Topic:
What is a CTE?¶
A CTE (common table expression) is a named subquery defined in a WITH
clause. You can think of the CTE as a
temporary view for use in the statement that defines the CTE. The CTE defines
the temporary view’s name, an optional list of column names, and a query expression (i.e. a SELECT statement). The
result of the query expression is effectively a table. Each column of that table corresponds to a column in the
(optional) list of column names.
Here is an example of a query that uses a CTE:
with
my_cte (cte_col_1, cte_col_2) AS (
select col_1, col_2
from ...
)
select ... from my_cte;
In this example, the CTE is the following portion:
my_cte (cte_col_1, cte_col_2) AS (
select col_1, col_2
from ...
)
A CTE can be recursive or non-recursive. A recursive CTE is a CTE that references itself. A recursive CTE can join a table to itself as many times as necessary to process hierarchical data in the table.
CTEs can help make queries more modular and thus easier to maintain.
Recursive CTEs and Hierarchical Data¶
Recursive CTEs enable you to process hierarchical data, such as a parts explosion (component, sub-components) or a management hierarchy (manager, employees). For more information about hierarchical data, and other ways to query hierarchical data, see Querying Hierarchical Data.
A recursive CTE allows you to join all the levels of a hierarchy without knowing in advance how many levels there are.
Overview of Recursive CTE Syntax¶
This section provides only an overview of the syntax and how the syntax relates to the way that the recursion works. The syntax is described in more detail in the documentation of the WITH clause.
Note that, despite the name “Recursive CTE”, recursive CTEs use iteration, not recursion.
WITH [ RECURSIVE ] <cte_name> AS
(
<anchor_clause>
UNION ALL
<recursive_clause>
)
SELECT ... FROM ...;
The anchor_clause
selects an initial row or set of rows that represent the top of the hierarchy. For
example, if you are trying to display all the employees in a company, the anchor clause would select the President
of the company. The anchor clause is a SELECT statement and can contain any SQL construct allowed in a
SELECT
clause. The anchor clause cannot reference the cte_name
.
The recursive_clause
selects the next layer of the hierarchy based on the previous layer. In the first
iteration, the previous layer is the result set from the anchor clause. In subsequent iterations, the previous
layer is the most recent completed iteration.
The recursive_clause
is a SELECT statement. The recursive_clause
is restricted to projections,
joins, and filters. It must not contain aggregate functions, window functions, GROUP BY
, ORDER BY
, LIMIT
,
or DISTINCT
. The recursive clause can reference the cte_name
like a regular table or view.
Logically, the recursive CTE is evaluated as follows:
The
anchor_clause
is evaluated and its result is written to both the final result set and to a working table. Thecte_name
is effectively an alias to that working table; in other words, a query referencing thatcte_name
reads from that working table.While the working table is not empty:
The
recursive_clause
is evaluated, using the current contents of the working table wherevercte_name
is referenced.The result of
recursive_clause
is written to both the final result set and a temp table.The working table is overwritten by the content of the temp table.
Effectively, the output of the previous iteration is stored in a working
table named cte_name
, and that table is then one of the inputs to
the next iteration. The working table contains only the result of the most
recent iteration. The accumulated results from all iterations so far are
stored elsewhere.
After the final iteration, the accumulated results are available to the main
SELECT
statement by referencing the cte_name
.
Recursive CTE Considerations¶
Potential for Infinite Loops¶
In theory, constructing a recursive CTE incorrectly can cause an infinite loop. In practice, Snowflake prevents this by limiting the number of iterations that the recursive clause will perform in a single query. The MAX_RECURSIONS parameter limits the number of iterations. To change MAX_RECURSIONS, contact customer support.
Non-Contiguous Hierarchies¶
This topic described hierarchies and how parent-child relationships can be used by recursive CTEs. In all of this topic’s examples, the hierarchies were contiguous.
For information about non-contiguous hierarchies, see Non-Contiguous Hierarchies.
Examples¶
This section shows both non-recursive CTEs and recursive CTEs to contrast the two.
Using a Non-Recursive Two-Level Self-join¶
This example uses a table of employees and managers:
CREATE OR REPLACE TABLE employees (title VARCHAR, employee_ID INTEGER, manager_ID INTEGER);INSERT INTO employees (title, employee_ID, manager_ID) VALUES ('President', 1, NULL), -- The President has no manager. ('Vice President Engineering', 10, 1), ('Programmer', 100, 10), ('QA Engineer', 101, 10), ('Vice President HR', 20, 1), ('Health Insurance Analyst', 200, 20);
A two-level self-join of this employee table looks like:
SELECT emps.title, emps.employee_ID, mgrs.employee_ID AS MANAGER_ID, mgrs.title AS "MANAGER TITLE" FROM employees AS emps LEFT OUTER JOIN employees AS mgrs ON emps.manager_ID = mgrs.employee_ID ORDER BY mgrs.employee_ID NULLS FIRST, emps.employee_ID; +----------------------------+-------------+------------+----------------------------+ | TITLE | EMPLOYEE_ID | MANAGER_ID | MANAGER TITLE | |----------------------------+-------------+------------+----------------------------| | President | 1 | NULL | NULL | | Vice President Engineering | 10 | 1 | President | | Vice President HR | 20 | 1 | President | | Programmer | 100 | 10 | Vice President Engineering | | QA Engineer | 101 | 10 | Vice President Engineering | | Health Insurance Analyst | 200 | 20 | Vice President HR | +----------------------------+-------------+------------+----------------------------+
The query above shows all the employees. Each manager’s employees appear near that manager in the report. However, the report doesn’t visually show the hierarchy as a hierarchy. Without looking carefully at the data, you don’t know how many levels there are in the organization, and you need to read each row in order to see which employees are associated with a specific manager.
A recursive CTE can display this hierarchical data as a sideways tree, as shown in the next section.
Using a Recursive CTE to Indent Output¶
Below are two examples of using a recursive CTE. The first uses indentation to show the different levels of the hierarchy. To simplify this example, the code does not produce the rows in a particular order. The second example uses indentation and shows each manager’s employees immediately below that manager.
Unordered Output¶
Here is the first example:
-- This is the WITH clause, of course. WITH RECURSIVE managers -- Column names for the "view"/CTE (indent, employee_ID, manager_ID, employee_title) AS -- Common Table Expression ( -- Anchor Clause SELECT '' AS indent, employee_ID, manager_ID, title AS employee_title FROM employees WHERE title = 'President' UNION ALL -- Recursive Clause SELECT indent || '--- ', employees.employee_ID, employees.manager_ID, employees.title FROM employees JOIN managers ON employees.manager_ID = managers.employee_ID ) -- This is the "main select". SELECT indent || employee_title AS Title, employee_ID, manager_ID FROM managers ; +----------------------------------+-------------+------------+ | TITLE | EMPLOYEE_ID | MANAGER_ID | |----------------------------------+-------------+------------| | President | 1 | NULL | | --- Vice President Engineering | 10 | 1 | | --- Vice President HR | 20 | 1 | | --- --- Programmer | 100 | 10 | | --- --- QA Engineer | 101 | 10 | | --- --- Health Insurance Analyst | 200 | 20 | +----------------------------------+-------------+------------+
The CTE contains two SELECT statements.
The first of those SELECT statements is the anchor clause. This is executed once and provides the set of rows from the first (top) level of the hierarchy.
The second SELECT in the CTE is the recursive clause. This clause can reference the CTE. You can think of the query as iterating, with each iteration building on the previous iterations’s query results.
In the manager/employee example, the anchor clause emits the first row, which is the row that describes the company president.
In the next iteration of the recursive clause, the recursive clause finds all the rows whose manager is the company president (i.e. it finds all of the vice presidents). The 3rd iteration finds all the employees whose manager is one of the vice presidents. Iteration continues until there is an iteration in which all of the rows retrieved are rows of leaf-level employees who do not manage anyone. The statement does one more iteration, looking for (but not finding) any employees whose managers are leaf-level employees. That iteration produces 0 rows, and the iteration halts.
Throughout these iterations, the UNION ALL accumulates the results.
The results of each iteration are added to the results of the previous iterations.
After the last iteration completes, the accumulated rows (like any rows
produced in a WITH
clause) are made available to the main SELECT
clause in the query. That main SELECT
can then query those rows.
This particular example query uses indentation to show the hierarchical nature of the data. If you look at the output, you’ll see that the lower the level of the employee, the further that employee’s data is indented.
The indentation is controlled by the column named “indent”. The indentation starts at 0 characters (an empty string in the anchor clause), and increases by 4 characters (“— “) for each iteration, i.e. for each level in the hierarchy.
Not surprisingly, it’s very important to construct the join(s) correctly, and to select the correct columns in the recursive clause. The columns in the SELECT of the recursive clause must correspond correctly to the columns in the anchor clause. Remember that the query starts with the President, then selects the Vice Presidents, and then selects the people who report directly to the Vice Presidents, etc. Each iteration looks for employees whose manager_ID field corresponds to one of the managers.employee_ID values produced in the previous iteration. Expressed another way, the employee_ID in the managers “view” is the manager ID for the next level of employees. The employee_IDs must progress downward through the hierarchy (President, Vice President, senior manager, junior manager…) during each iteration. If the employee_IDs don’t progress, then the query can infinite loop (if the same manager_ID keeps appearing in the managers.employee_ID column in different iterations), or skip a level, or fail in other ways.
Ordered Output¶
The previous example had no ORDER BY clause, so even though each employee’s record is indented properly, each employee did not necessarily appear directly underneath her manager. The example below generates output with correct indentation, and with each manager’s employees directly underneath that manager.
The query’s ORDER BY clause uses an additional column, named “sort_key”. The sort key accumulates as the recursive clause iterates; you can think of the sort key as a string that contains the entire chain of command above you (your manager, your manager’s manager, etc.). The most senior person in that chain of command (the President) is at the beginning of the sort key string. Although you normally wouldn’t display the sort key, the query below includes the sort key in the output so that it’s easier to understand the output.
Each iteration should increase the length of the sort key by the same amount
(same number of characters), so the query uses a UDF (user-defined function)
named SKEY
to generate consistent-length segments of the sort key. The
definition of SKEY
is:
CREATE OR REPLACE FUNCTION skey(ID VARCHAR) RETURNS VARCHAR AS $$ SUBSTRING('0000' || ID::VARCHAR, -4) || ' ' $$ ;Here is an example of output from the
SKEY
function:SELECT skey(12); +----------+ | SKEY(12) | |----------| | 0012 | +----------+
Here is the final version of the query. This puts each manager’s employees immediately underneath that manager, and indents based on the “level” of the employee:
WITH RECURSIVE managers -- Column list of the "view" (indent, employee_ID, manager_ID, employee_title, sort_key) AS -- Common Table Expression ( -- Anchor Clause SELECT '' AS indent, employee_ID, manager_ID, title AS employee_title, skey(employee_ID) FROM employees WHERE title = 'President' UNION ALL -- Recursive Clause SELECT indent || '--- ', employees.employee_ID, employees.manager_ID, employees.title, sort_key || skey(employees.employee_ID) FROM employees JOIN managers ON employees.manager_ID = managers.employee_ID ) -- This is the "main select". SELECT indent || employee_title AS Title, employee_ID, manager_ID, sort_key FROM managers ORDER BY sort_key ; +----------------------------------+-------------+------------+-----------------+ | TITLE | EMPLOYEE_ID | MANAGER_ID | SORT_KEY | |----------------------------------+-------------+------------+-----------------| | President | 1 | NULL | 0001 | | --- Vice President Engineering | 10 | 1 | 0001 0010 | | --- --- Programmer | 100 | 10 | 0001 0010 0100 | | --- --- QA Engineer | 101 | 10 | 0001 0010 0101 | | --- Vice President HR | 20 | 1 | 0001 0020 | | --- --- Health Insurance Analyst | 200 | 20 | 0001 0020 0200 | +----------------------------------+-------------+------------+-----------------+
The next query shows how to reference a field from the previous (higher) level in the hierarchy; pay particular attention to the mgr_title column:
WITH RECURSIVE managers -- Column names for the "view"/CTE (employee_ID, manager_ID, employee_title, mgr_title) AS -- Common Table Expression ( -- Anchor Clause SELECT employee_ID, manager_ID, title AS employee_title, NULL AS mgr_title FROM employees WHERE title = 'President' UNION ALL -- Recursive Clause SELECT employees.employee_ID, employees.manager_ID, employees.title, managers.employee_title AS mgr_title FROM employees JOIN managers ON employees.manager_ID = managers.employee_ID ) -- This is the "main select". SELECT employee_title AS Title, employee_ID, manager_ID, mgr_title FROM managers ORDER BY manager_id NULLS FIRST, employee_ID ; +----------------------------+-------------+------------+----------------------------+ | TITLE | EMPLOYEE_ID | MANAGER_ID | MGR_TITLE | |----------------------------+-------------+------------+----------------------------| | President | 1 | NULL | NULL | | Vice President Engineering | 10 | 1 | President | | Vice President HR | 20 | 1 | President | | Programmer | 100 | 10 | Vice President Engineering | | QA Engineer | 101 | 10 | Vice President Engineering | | Health Insurance Analyst | 200 | 20 | Vice President HR | +----------------------------+-------------+------------+----------------------------+
Parts Explosion¶
Manager/employee hierarchies are not the only type of variable-depth hierarchies that you can store in a single table and process with a recursive CTE. Another common example of hierarchical data is a “parts explosion”, in which each component can be listed with its sub-components, each of which can be listed with its sub-sub-components.
For example, suppose that your table contains hierarchical data, such as the components of a car. Your car probably contains components such as an engine, wheels, etc. Many of those components contain sub-components (e.g. an engine might contain a fuel pump). The fuel pump might contain a motor, tubing, etc. You could list all the components and their sub-components using a recursive CTE.
For an example of a query that produces a parts explosion, see WITH.
Troubleshooting a Recursive CTE¶
Recursive CTE Query Runs Until it Times Out or Reaches the Maximum Iteration Limit¶
- Cause 1
Your data hierarchy might have a cycle (i.e. it might not be a true tree).
- Solution 1
Possible solutions include:
Correct the data, if it is not supposed to contain a cycle.
Limit the query in some way; e.g. limit the number of rows of output. For example:
WITH RECURSIVE t(n) AS ( SELECT 1 UNION ALL SELECT N + 1 FROM t ) SELECT n FROM t LIMIT 10;
Do not use a query, such as a recursive CTE, that expects hierarchical data.
- Cause 2
You might have created an infinite loop. This can happen if:
The projection clause in the
recursive_clause
outputs a value from the “parent” (the previous iteration) instead of the “child” (the current iteration) and then the next iteration uses that value in a join when it should use the current iteration’s value in the join.The following pseudo-code shows an approximate example of this:
CREATE TABLE employees (employee_ID INT, manager_ID INT, ...); INSERT INTO employees (employee_ID, manager_ID) VALUES (1, NULL), (2, 1); WITH cte_name (employee_ID, manager_ID, ...) AS ( -- Anchor Clause SELECT employee_ID, manager_ID FROM table1 UNION ALL SELECT manager_ID, employee_ID -- <<<<< WRONG! FROM table1 JOIN cte_name ON table1.manager_ID = cte_name.employee_ID ) SELECT ...
In this example, the recursive clause passes its parent value (the manager_ID) in the column that should have the current/child value (the employee_ID). The parent will show up as the “current” value in the next iteration, and will be passed again as the “current” value to the following generation, so the query never progresses down through the levels; it keeps processing the same level each time.
Step 1:
Suppose that the anchor clause selects the values
employee_ID = 1
andmanager_ID = NULL
.CTE:
employee_ID manager_ID ----------- --------- 1 NULL
Step 2:
During the first iteration of the recursive clause, in
table1
,employee_ID = 2
andmanager_ID = 1
.START:
CTE:
employee_ID manager_ID ----------- ---------- 1 NULL
table1:
employee_ID manager_ID ----------- ---------- ... 2 1 ...
Result of join in recursive clause:
table1.employee_ID table1.manager_ID cte.employee_ID cte.manager_ID ----------------- ----------------- --------------- -------------- ... 2 1 1 NULL ...
Projection:
employee_ID manager_ID ----------- ---------- ... 2 1 ...
But because the
employee_ID
andmanager_ID
columns are reversed in the projection, the actual output of the query (and thus the content of the CTE at the start of the next iteration) is:employee_ID manager_ID ----------- ---------- ... 1 2 -- Because manager and employee IDs reversed ...
Step 3:
During the second iteration of the recursive clause:
START:
CTE:
employee_ID manager_ID ----------- ---------- 1 2
table1:
employee_ID manager_ID ----------- ---------- ... 2 1 ...
Result of join in recursive clause:
table1.employee_ID table1.manager_ID cte.employee_ID cte.manager_ID ----------------- ----------------- --------------- -------------- ... 2 1 1 2 ...
Projection:
employee_ID manager_ID ----------- ---------- ... 2 1 ...
Result of query (contents of CTE at start of next iteration):
employee_ID manager_ID ----------- ---------- ... 1 2 -- Because manager and employee IDs reversed ...
As you can see, at the end of the second iteration, the row in the CTE is the same as it was at the start of the iteration:
employee_ID
is1
.manager_ID
is2
.
Thus, the result of the join during the next iteration will be the same as the result of the join during the current iteration, and the query infinite loops.
- Solution 2
Make sure that the recursive clause passes the correct variables in the correct order.
Also make sure that the
JOIN
condition in the recursive clause is correct. In a typical case, the parent of the “current” row should be joined to the child/current value of the parent row.