Working with CTEs (Common Table Expressions)

In this Topic:

What is a CTE?

A CTE (common table expression) is a named subquery defined in a WITH clause. You can think of the CTE as a temporary view for use in the statement that defines the CTE. The CTE defines the temporary view’s name, an optional list of column names, and a query expression (i.e. a SELECT statement). The result of the query expression is effectively a table. Each column of that table corresponds to a column in the (optional) list of column names.

Here is an example of a query that uses a CTE:

with
    my_cte (cte_col_1, cte_col_2) AS (
        select col_1, col_2
            from ...
    )
select ... from my_cte;

In this example, the CTE is the following portion:

my_cte (cte_col_1, cte_col_2) AS (
    select col_1, col_2
        from ...
)

A CTE can be recursive or non-recursive. A recursive CTE is a CTE that references itself. A recursive CTE can join a table to itself as many times as necessary to process hierarchical data in the table.

CTEs can help make queries more modular and thus easier to maintain.

Recursive CTEs and Hierarchical Data

Recursive CTEs enable you to process hierarchical data, such as a parts explosion (component, sub-components) or a management hierarchy (manager, employees). For more information about hierarchical data, and other ways to query hierarchical data, see Querying Hierarchical Data.

A recursive CTE allows you to join all the levels of a hierarchy without knowing in advance how many levels there are.

Overview of Recursive CTE Syntax

This section provides only an overview of the syntax and how the syntax relates to the way that the recursion works. The syntax is described in more detail in the documentation of the WITH clause.

Note that, despite the name “Recursive CTE”, recursive CTEs use iteration, not recursion.

WITH [ RECURSIVE ] <cte_name> AS
(
  <anchor_clause>
  UNION ALL
  <recursive_clause>
)
SELECT ... FROM ...;

The anchor_clause selects an initial row or set of rows that represent the top of the hierarchy. For example, if you are trying to display all the employees in a company, the anchor clause would select the President of the company. The anchor clause is a SELECT statement and can contain any SQL construct allowed in a SELECT clause. The anchor clause cannot reference the cte_name.

The recursive_clause selects the next layer of the hierarchy based on the previous layer. In the first iteration, the previous layer is the result set from the anchor clause. In subsequent iterations, the previous layer is the most recent completed iteration.

The recursive_clause is a SELECT statement. The recursive_clause is restricted to projections, joins, and filters. It must not contain aggregate functions, window functions, GROUP BY, ORDER BY, LIMIT, or DISTINCT. The recursive clause can reference the cte_name like a regular table or view.

Logically, the recursive CTE is evaluated as follows:

  • The anchor_clause is evaluated and its result is written to both the final result set and to a working table. The cte_name is effectively an alias to that working table; in other words, a query referencing that cte_name reads from that working table.

  • While the working table is not empty:

    • The recursive_clause is evaluated, using the current contents of the working table wherever cte_name is referenced.

    • The result of recursive_clause is written to both the final result set and a temp table.

    • The working table is overwritten by the content of the temp table.

Effectively, the output of the previous iteration is stored in a working table named cte_name, and that table is then one of the inputs to the next iteration. The working table contains only the result of the most recent iteration. The accumulated results from all iterations so far are stored elsewhere.

After the final iteration, the accumulated results are available to the main SELECT statement by referencing the cte_name.

Recursive CTE Considerations

Potential for Infinite Loops

In theory, constructing a recursive CTE incorrectly can cause an infinite loop. In practice, Snowflake prevents this by limiting the number of iterations that the recursive clause will perform in a single query. The MAX_RECURSIONS parameter limits the number of iterations. To change MAX_RECURSIONS, contact customer support.

Non-Contiguous Hierarchies

This topic described hierarchies and how parent-child relationships can be used by recursive CTEs. In all of this topic’s examples, the hierarchies were contiguous.

For information about non-contiguous hierarchies, see Non-Contiguous Hierarchies.

Examples

This section shows both non-recursive CTEs and recursive CTEs to contrast the two.

Using a Non-Recursive Two-Level Self-join

This example uses a table of employees and managers:

CREATE OR REPLACE TABLE employees (title VARCHAR, employee_ID INTEGER, manager_ID INTEGER);
INSERT INTO employees (title, employee_ID, manager_ID) VALUES
    ('President', 1, NULL),  -- The President has no manager.
        ('Vice President Engineering', 10, 1),
            ('Programmer', 100, 10),
            ('QA Engineer', 101, 10),
        ('Vice President HR', 20, 1),
            ('Health Insurance Analyst', 200, 20);

A two-level self-join of this employee table looks like:

SELECT
     emps.title,
     emps.employee_ID,
     mgrs.employee_ID AS MANAGER_ID, 
     mgrs.title AS "MANAGER TITLE"
  FROM employees AS emps LEFT OUTER JOIN employees AS mgrs
    ON emps.manager_ID = mgrs.employee_ID
  ORDER BY mgrs.employee_ID NULLS FIRST, emps.employee_ID;
+----------------------------+-------------+------------+----------------------------+
| TITLE                      | EMPLOYEE_ID | MANAGER_ID | MANAGER TITLE              |
|----------------------------+-------------+------------+----------------------------|
| President                  |           1 |       NULL | NULL                       |
| Vice President Engineering |          10 |          1 | President                  |
| Vice President HR          |          20 |          1 | President                  |
| Programmer                 |         100 |         10 | Vice President Engineering |
| QA Engineer                |         101 |         10 | Vice President Engineering |
| Health Insurance Analyst   |         200 |         20 | Vice President HR          |
+----------------------------+-------------+------------+----------------------------+

The query above shows all the employees. Each manager’s employees appear near that manager in the report. However, the report doesn’t visually show the hierarchy as a hierarchy. Without looking carefully at the data, you don’t know how many levels there are in the organization, and you need to read each row in order to see which employees are associated with a specific manager.

A recursive CTE can display this hierarchical data as a sideways tree, as shown in the next section.

Using a Recursive CTE to Indent Output

Below are two examples of using a recursive CTE. The first uses indentation to show the different levels of the hierarchy. To simplify this example, the code does not produce the rows in a particular order. The second example uses indentation and shows each manager’s employees immediately below that manager.

Unordered Output

Here is the first example:

-- This is the WITH clause, of course.
WITH RECURSIVE managers 
      -- Column names for the "view"/CTE
      (indent, employee_ID, manager_ID, employee_title) 
    AS
      -- Common Table Expression
      (

        -- Anchor Clause
        SELECT '' AS indent, employee_ID, manager_ID, title AS employee_title
          FROM employees
          WHERE title = 'President'

        UNION ALL

        -- Recursive Clause
        SELECT indent || '--- ',
            employees.employee_ID, employees.manager_ID, employees.title
          FROM employees JOIN managers 
            ON employees.manager_ID = managers.employee_ID
      )

  -- This is the "main select".
  SELECT indent || employee_title AS Title, employee_ID, manager_ID
    FROM managers
  ;
+----------------------------------+-------------+------------+
| TITLE                            | EMPLOYEE_ID | MANAGER_ID |
|----------------------------------+-------------+------------|
| President                        |           1 |       NULL |
| --- Vice President Engineering   |          10 |          1 |
| --- Vice President HR            |          20 |          1 |
| --- --- Programmer               |         100 |         10 |
| --- --- QA Engineer              |         101 |         10 |
| --- --- Health Insurance Analyst |         200 |         20 |
+----------------------------------+-------------+------------+

The CTE contains two SELECT statements.

The first of those SELECT statements is the anchor clause. This is executed once and provides the set of rows from the first (top) level of the hierarchy.

The second SELECT in the CTE is the recursive clause. This clause can reference the CTE. You can think of the query as iterating, with each iteration building on the previous iterations’s query results.

In the manager/employee example, the anchor clause emits the first row, which is the row that describes the company president.

In the next iteration of the recursive clause, the recursive clause finds all the rows whose manager is the company president (i.e. it finds all of the vice presidents). The 3rd iteration finds all the employees whose manager is one of the vice presidents. Iteration continues until there is an iteration in which all of the rows retrieved are rows of leaf-level employees who do not manage anyone. The statement does one more iteration, looking for (but not finding) any employees whose managers are leaf-level employees. That iteration produces 0 rows, and the iteration halts.

Throughout these iterations, the UNION ALL accumulates the results. The results of each iteration are added to the results of the previous iterations. After the last iteration completes, the accumulated rows (like any rows produced in a WITH clause) are made available to the main SELECT clause in the query. That main SELECT can then query those rows.

This particular example query uses indentation to show the hierarchical nature of the data. If you look at the output, you’ll see that the lower the level of the employee, the further that employee’s data is indented.

The indentation is controlled by the column named “indent”. The indentation starts at 0 characters (an empty string in the anchor clause), and increases by 4 characters (“— “) for each iteration, i.e. for each level in the hierarchy.

Not surprisingly, it’s very important to construct the join(s) correctly, and to select the correct columns in the recursive clause. The columns in the SELECT of the recursive clause must correspond correctly to the columns in the anchor clause. Remember that the query starts with the President, then selects the Vice Presidents, and then selects the people who report directly to the Vice Presidents, etc. Each iteration looks for employees whose manager_ID field corresponds to one of the managers.employee_ID values produced in the previous iteration. Expressed another way, the employee_ID in the managers “view” is the manager ID for the next level of employees. The employee_IDs must progress downward through the hierarchy (President, Vice President, senior manager, junior manager…) during each iteration. If the employee_IDs don’t progress, then the query can infinite loop (if the same manager_ID keeps appearing in the managers.employee_ID column in different iterations), or skip a level, or fail in other ways.

Ordered Output

The previous example had no ORDER BY clause, so even though each employee’s record is indented properly, each employee did not necessarily appear directly underneath her manager. The example below generates output with correct indentation, and with each manager’s employees directly underneath that manager.

The query’s ORDER BY clause uses an additional column, named “sort_key”. The sort key accumulates as the recursive clause iterates; you can think of the sort key as a string that contains the entire chain of command above you (your manager, your manager’s manager, etc.). The most senior person in that chain of command (the President) is at the beginning of the sort key string. Although you normally wouldn’t display the sort key, the query below includes the sort key in the output so that it’s easier to understand the output.

Each iteration should increase the length of the sort key by the same amount (same number of characters), so the query uses a UDF (user-defined function) named SKEY to generate consistent-length segments of the sort key. The definition of SKEY is:

CREATE OR REPLACE FUNCTION skey(ID VARCHAR)
  RETURNS VARCHAR
  AS
  $$
    SUBSTRING('0000' || ID::VARCHAR, -4) || ' '
  $$
  ;

Here is an example of output from the SKEY function:

SELECT skey(12);
+----------+
| SKEY(12) |
|----------|
| 0012     |
+----------+

Here is the final version of the query. This puts each manager’s employees immediately underneath that manager, and indents based on the “level” of the employee:

WITH RECURSIVE managers 
      -- Column list of the "view"
      (indent, employee_ID, manager_ID, employee_title, sort_key) 
    AS 
      -- Common Table Expression
      (
        -- Anchor Clause
        SELECT '' AS indent, 
            employee_ID, manager_ID, title AS employee_title, skey(employee_ID)
          FROM employees
          WHERE title = 'President'

        UNION ALL

        -- Recursive Clause
        SELECT indent || '--- ',
            employees.employee_ID, employees.manager_ID, employees.title, 
            sort_key || skey(employees.employee_ID)
          FROM employees JOIN managers 
            ON employees.manager_ID = managers.employee_ID
      )

  -- This is the "main select".
  SELECT 
         indent || employee_title AS Title, employee_ID, 
         manager_ID, 
         sort_key
    FROM managers
    ORDER BY sort_key
  ;
+----------------------------------+-------------+------------+-----------------+
| TITLE                            | EMPLOYEE_ID | MANAGER_ID | SORT_KEY        |
|----------------------------------+-------------+------------+-----------------|
| President                        |           1 |       NULL | 0001            |
| --- Vice President Engineering   |          10 |          1 | 0001 0010       |
| --- --- Programmer               |         100 |         10 | 0001 0010 0100  |
| --- --- QA Engineer              |         101 |         10 | 0001 0010 0101  |
| --- Vice President HR            |          20 |          1 | 0001 0020       |
| --- --- Health Insurance Analyst |         200 |         20 | 0001 0020 0200  |
+----------------------------------+-------------+------------+-----------------+

The next query shows how to reference a field from the previous (higher) level in the hierarchy; pay particular attention to the mgr_title column:

WITH RECURSIVE managers 
      -- Column names for the "view"/CTE
      (employee_ID, manager_ID, employee_title, mgr_title) 
    AS
      -- Common Table Expression
      (

        -- Anchor Clause
        SELECT employee_ID, manager_ID, title AS employee_title, NULL AS mgr_title
          FROM employees
          WHERE title = 'President'

        UNION ALL

        -- Recursive Clause
        SELECT 
            employees.employee_ID, employees.manager_ID, employees.title, managers.employee_title AS mgr_title
          FROM employees JOIN managers 
            ON employees.manager_ID = managers.employee_ID
      )

  -- This is the "main select".
  SELECT employee_title AS Title, employee_ID, manager_ID, mgr_title
    FROM managers
    ORDER BY manager_id NULLS FIRST, employee_ID
  ;
+----------------------------+-------------+------------+----------------------------+
| TITLE                      | EMPLOYEE_ID | MANAGER_ID | MGR_TITLE                  |
|----------------------------+-------------+------------+----------------------------|
| President                  |           1 |       NULL | NULL                       |
| Vice President Engineering |          10 |          1 | President                  |
| Vice President HR          |          20 |          1 | President                  |
| Programmer                 |         100 |         10 | Vice President Engineering |
| QA Engineer                |         101 |         10 | Vice President Engineering |
| Health Insurance Analyst   |         200 |         20 | Vice President HR          |
+----------------------------+-------------+------------+----------------------------+

Parts Explosion

Manager/employee hierarchies are not the only type of variable-depth hierarchies that you can store in a single table and process with a recursive CTE. Another common example of hierarchical data is a “parts explosion”, in which each component can be listed with its sub-components, each of which can be listed with its sub-sub-components.

For example, suppose that your table contains hierarchical data, such as the components of a car. Your car probably contains components such as an engine, wheels, etc. Many of those components contain sub-components (e.g. an engine might contain a fuel pump). The fuel pump might contain a motor, tubing, etc. You could list all the components and their sub-components using a recursive CTE.

For an example of a query that produces a parts explosion, see WITH.

Troubleshooting a Recursive CTE

Recursive CTE Query Runs Until it Times Out or Reaches the Maximum Iteration Limit

Cause 1

Your data hierarchy might have a cycle (i.e. it might not be a true tree).

Solution 1

Possible solutions include:

  • Correct the data, if it is not supposed to contain a cycle.

  • Limit the query in some way; e.g. limit the number of rows of output. For example:

    WITH RECURSIVE t(n) AS
        (
        SELECT 1
        UNION ALL
        SELECT N + 1 FROM t
       )
     SELECT n FROM t LIMIT 10;
    
  • Do not use a query, such as a recursive CTE, that expects hierarchical data.

Cause 2

You might have created an infinite loop. This can happen if:

  • The projection clause in the recursive_clause outputs a value from the “parent” (the previous iteration) instead of the “child” (the current iteration) and then the next iteration uses that value in a join when it should use the current iteration’s value in the join.

    The following pseudo-code shows an approximate example of this:

    CREATE TABLE employees (employee_ID INT, manager_ID INT, ...);
    INSERT INTO employees (employee_ID, manager_ID) VALUES
            (1, NULL),
            (2, 1);
    
    WITH cte_name (employee_ID, manager_ID, ...) AS
      (
         -- Anchor Clause
         SELECT employee_ID, manager_ID FROM table1
         UNION ALL
         SELECT manager_ID, employee_ID   -- <<<<< WRONG!
             FROM table1 JOIN cte_name
               ON table1.manager_ID = cte_name.employee_ID
      )
    SELECT ...
    

In this example, the recursive clause passes its parent value (the manager_ID) in the column that should have the current/child value (the employee_ID). The parent will show up as the “current” value in the next iteration, and will be passed again as the “current” value to the following generation, so the query never progresses down through the levels; it keeps processing the same level each time.

Step 1:

Suppose that the anchor clause selects the values employee_ID = 1 and manager_ID = NULL.

CTE:

employee_ID  manager_ID
-----------  ---------
      1         NULL

Step 2:

During the first iteration of the recursive clause, in table1, employee_ID = 2 and manager_ID = 1.

START:

CTE:

employee_ID  manager_ID
-----------  ----------
       1         NULL

table1:

employee_ID  manager_ID
-----------  ----------
 ...
       2         1
 ...

Result of join in recursive clause:

table1.employee_ID  table1.manager_ID  cte.employee_ID  cte.manager_ID
-----------------   -----------------  ---------------  --------------
 ...
       2                   1                 1                NULL
 ...

Projection:

employee_ID  manager_ID
-----------  ----------
 ...
       2         1
 ...

But because the employee_ID and manager_ID columns are reversed in the projection, the actual output of the query (and thus the content of the CTE at the start of the next iteration) is:

employee_ID  manager_ID
-----------  ----------
 ...
       1         2        -- Because manager and employee IDs reversed
 ...

Step 3:

During the second iteration of the recursive clause:

START:

CTE:

employee_ID  manager_ID
-----------  ----------
       1         2

table1:

employee_ID  manager_ID
-----------  ----------
 ...
       2         1
 ...

Result of join in recursive clause:

table1.employee_ID  table1.manager_ID  cte.employee_ID  cte.manager_ID
-----------------   -----------------  ---------------  --------------
 ...
       2                   1                 1                2
 ...

Projection:

employee_ID  manager_ID
-----------  ----------
 ...
       2         1
 ...

Result of query (contents of CTE at start of next iteration):

employee_ID  manager_ID
-----------  ----------
 ...
       1         2        -- Because manager and employee IDs reversed
 ...

As you can see, at the end of the second iteration, the row in the CTE is the same as it was at the start of the iteration:

  • employee_ID is 1.

  • manager_ID is 2.

Thus, the result of the join during the next iteration will be the same as the result of the join during the current iteration, and the query infinite loops.

Solution 2

Make sure that the recursive clause passes the correct variables in the correct order.

Also make sure that the JOIN condition in the recursive clause is correct. In a typical case, the parent of the “current” row should be joined to the child/current value of the parent row.