Using Sequences in Queries

Sequences are used to generate unique numbers across sessions and statements, including concurrent statements. They can be used to generate values for a primary key or any column that requires a unique value.

In this Topic:

Referencing Sequences in Queries

Gaps in Sequences

Snowflake sequences give no guarantees about gaps in sequence numbers. Users requiring a contiguous set of numbers must implement this functionality on their own.

Sequence Semantics

Snowflake sequences currently provide the following guarantees:

  • All values provided by a sequence are globally unique provided the sign of the sequence interval does not change. Concurrent queries never observe the same value, and values within a single query are always distinct. Changing the sequence interval from positive to negative, or negative to positive (e.g., from 1 to -1) may result in duplicates (first query fetches sequence values 1, 2, 3, interval is changed from 1 to -1, next statement returns values 3, 2, 1).
  • Each generated sequence value additionally reserves values depending on the sequence interval. The reserved values span from the sequence to value + (sign(step) * abs(step)). Thus, if the value 100 is generated: given a step of 2, the value 101 is reserved in addition to 100; a step of 10, values 100-109 are reserved; a step of -5, values 96-100 are reserved. A reserved value is never generated by the sequence provided the internal is never modified.
  • Values generated by a sequence are guaranteed to be greater than the maximum value produced by a previous statement if the previous statement completed, and an acknowledgment was received, prior to submitting the current statement. This guarantee does not hold if the sign of the interval is changed (positive to negative or negative to positive).

There is no guarantee that values from a sequence are contiguous (gap-free) nor that the sequence values are assigned in a particular order. There is, in fact, no way to assign values from a sequence to rows in a specified order other than to use single-row statements (this still provides no guarantee about gaps).

A sequence value may represent a 64-bit two’s compliment integer (-2^63 to 2^63 - 1). If the internal representation of a sequence’s next value exceeds this range (in either direction) an error will result and the query will fail (these sequence values may be lost!). In this solution you must either use a smaller (in magnitude) increment value or create a new sequence with a smaller start value. As gaps may occur the internal representation of the next value may exceed the allowable range even if the returned sequence values are all within the allowable range. We do not provide an explicit guarantee on how to avoid this error, but commit to supporting sequence objects that correctly provide unique values. A sequence object created with a start value of 1 and an increment value of 1 should never exhaust the allowable range of sequence values.

Currval

While many databases provide a “currval” sequence reference, Snowflake does not. Currval is typically used to create primary-foreign key relationships between tables – a first statement inserts a single row into the fact table using a sequence to create a key. Subsequent statements insert rows into the dimension tables using currval to refer to the fact table’s key. This pattern is contrary to Snowflake best practices – Bulk queries should be preferred over small, single-row queries. The same task can be better accomplished using multi-table insert and sequence references in nested subqueries. See below for examples.

Sequences as Expressions

Sequences may be accessed in queries as expressions of the form sequence_name.nextval. Each occurrence of a sequence generates a set of distinct values. This is contrary to what many other databases provide, where multiple references to nextval of a sequence return the same value for each row.

For example:

CREATE SEQUENCE seq;
SELECT seq.nextval a, seq.nextval b FROM DUAL;

Return distinct values for columns a and b. If you want to return two columns with the same generated sequence value use nested subqueries and views:

CREATE SEQUENCE seq;
SELECT seqRef.a a, seqRef.a b FROM (SELECT seq.nextval a FROM DUAL) seqRef;

Nested subqueries generate as many distinct sequence values as rows returned by the subquery (so a sequence reference in a query block with several joins refers not to any of the joined objects, but the output of the query block). These generated values may not be observed if the associated rows are later filtered out, or the values may be observed twice (as in the above example) if the sequence column or the inline view are referred to multiple times.

Note

For multi-table insert, insert values may be provided both in the VALUES clauses and in the SELECT input (Oracle restricts sequence references to VALUES clauses):

  • VALUES clauses referring to a sequence value aliased from the input SELECT receive the same value.
  • VALUES clauses containing a direct reference to a sequence nextval receive distinct values.

Sequences as Table Functions

Nested queries with sequence references are often difficult to understand and verbose — any shared reference (where two columns of a row should receive the same sequence value) requires an additional level of query nesting. To simplify nested-query syntax Snowflake provides an additional method to generate sequences using the table function GETNEXTVAL, as in the following example:

CREATE SEQUENCE seq;
CREATE TABLE foo (n NUMBER);
INSERT INTO foo VALUES (100), (101), (102);
SELECT n, s.nextval FROM foo, TABLE(GETNEXTVAL(seq)) s;

GETNEXTVAL is a 1-row table function that generates a unique value (and joins this value) to other objects in the SELECT statement. A call to GETNEXTVAL must be aliased, otherwise the generated values cannot be referenced. Multiple columns may refer to a generated value by accessing this alias. The GETNEXTVAL alias contains an attribute also named nextval.

The GETNEXTVAL table function additionally allows precise control over sequence generation when many tables are joined together. The order of objects in the FROM clause determines where values are generated. Sequence values are generated over the result of joins between all objects listed prior to GETNEXTVAL in the FROM clause. The resulting rows are then joined to the objects to the right. There is an implicit lateral dependence between GETNEXTVAL and all other objects in the FROM clause. Joins may not reorder around GETNEXTVAL. This is an exception in SQL, as typically the order of objects does not affect the query semantics.

Consider the following example with tables t1, t2, t3, and t4:

CREATE SEQUENCE seq;

SELECT t1.*, t2.*, t3.*, t4.*, s.nextval FROM t1, t2, TABLE(GETNEXTVAL(seq)) s, t3, t4;

This query will join t1 to t2, generate a unique value of the result, and then join the resulting relation against t3 and t4. The order of joins between the post-sequence relation, t3, and t4 is not specified since inner joins are associative.

These semantics can be tricky. It is recommended that GETNEXTVAL be used at the end of the FROM clause, when possible and appropriate, to avoid confusion.

Using Sequences to Create Default Column Values

Sequences are intended to be used to generate primary keys for columns of a table. The following tools provide a simple way to do this.

Column Default Expressions

The column default expression can be a sequence reference. Omitting the column in an insert statement or setting the value to DEFAULT in an insert or update statement will generate a new sequence value for the row.

For example:

CREATE SEQUENCE seq;
CREATE TABLE foo (k NUMBER DEFAULT seq.nextval, v NUMBER);

-- insert (unique key, 100)
INSERT INTO foo (v) VALUES (100);

-- insert (unique key, 101)
INSERT INTO foo VALUES (DEFAULT, 101);

-- insert (unique key, 100), (unique key, 101) -- new unique keys are distinct from pre-existing
INSERT INTO foo (v)
SELECT v FROM foo;

-- set v to unique values that are distinct from values in column k
UPDATE foo SET v = DEFAULT;

-- insert row, value for column k is specified
INSERT INTO foo VALUES (1000, 1001);

The advantage of using sequences as a column default value is that the sequence can be referenced in other locations, and even be the default value for multiple columns and in multiple tables. If a sequence is named as the default expression of a column and then subsequently dropped any attempt to insert/update the table using the default value will result in an error saying the identifier cannot be found.

Examples

Ingesting and Normalizing Denormalized Data

Consider a schema with two tables, people and contact:

  • The people table contains:
    • A primary key unique identifier: id
    • Two string columns: firstName and lastName
  • The contact table contains:
    • A primary key unique identifier: id
    • A foreign key linking this contact entry to a person: p_id
    • Two string columns: c_type (the type of contact, e.g. ‘email’ and ‘phone’) and data (the actual contact information).

Data in this format frequently is denormalized for ingestion or while processing semi-structured data.

This example illustrates ingesting JSON data, denormalizing it to extract the desired data, and normalizing the data as it is inserted into tables. At the same time, it is important to create unique identifiers on rows while maintaining the intended relations across rows of tables. We accomplish this with sequences.

  1. First, we set up the tables and sequences used in the example:

    -- primary data tables
    
    CREATE OR REPLACE TABLE people (id number, firstName string, lastName string);
    CREATE OR REPLACE TABLE contact (id number, p_id number, c_type string, data string);
    
    -- sequences to produce primary keys on our data tables
    
    CREATE OR REPLACE SEQUENCE people_seq;
    CREATE OR REPLACE SEQUENCE contact_seq;
    
    -- staging table for json
    
    CREATE OR REPLACE TABLE input (json variant);
    
  2. Next, we insert data from table json:

    INSERT INTO input SELECT parse_json(
    '[
     {
       firstName : \'John\',
       lastName : \'Doe\',
       contacts : [
         {
           contactType : \'phone\',
           contactData : \'1234567890\',
         }
         ,
         {
           contactType : \'email\',
           contactData : \'jdoe@acme.com\',
         }
        ]
       }
    ,
      {
       firstName : \'Mister\',
       lastName : \'Smith\',
       contacts : [
         {
           contactType : \'phone\',
           contactData : \'0987654321\',
         }
         ,
         {
           contactType : \'email\',
           contactData : \'msmith@acme.com\',
         }
         ]
       }
     ,
       {
       firstName : \'George\',
       lastName : \'Washington\',
       contacts : [
         {
           contactType : \'phone\',
           contactData : \'1231231234\',
         }
         ,
         {
           contactType : \'email\',
           contactData : \'gwashington@acme.com\',
         }
       ]
     }
    ]'
    );
    
  3. Then, we parse and flatten the JSON, generate unique identifiers for each person and contact entry, and insert the data while preserving relations between people and contact entries:

    INSERT ALL
      WHEN 1=1 THEN
        INTO contact VALUES (c_next, p_next, contact_value:contactType, contact_value:contactData)
      WHEN contact_index = 0 THEN
        INTO people VALUES (p_next, person_value:firstName, person_value:lastName)
    
    SELECT * FROM
    (
      SELECT f1.value person_value, f2.value contact_value, f2.index contact_index, p_seq.nextval p_next, c_seq.nextval c_next
      FROM input, LATERAL FLATTEN(input.json) f1, TABLE(GETNEXTVAL(people_seq)) p_seq,
        LATERAL FLATTEN(f1.value:contacts) f2, table(GETNEXTVAL(contact_seq)) c_seq
    );
    
  4. This produces the following data (unique IDs may change):

    SELECT * FROM people;
    
    ----+-----------+------------+
     ID | FIRSTNAME |  LASTNAME  |
    ----+-----------+------------+
     1  | John      | Doe        |
     2  | Mister    | Smith      |
     3  | George    | Washington |
    ----+-----------+------------+
    
    SELECT * FROM contact;
    
    ----+------+--------+-----------------------------------------+
     ID | P_ID | C_TYPE |                  DATA                   |
    ----+------+--------+-----------------------------------------+
     1  | 1    | phone  | 1234567890                              |
     2  | 1    | email  | jdoe@acme.com                           |
     3  | 2    | phone  | 0987654321                              |
     4  | 2    | email  | msmith@acme.com                         |
     5  | 3    | phone  | 1231231234                              |
     6  | 3    | email  | gwashington@acme.com                    |
    ----+------+--------+-----------------------------------------+
    

As you can see, rows are linked, and can be joined, between people.id and contact.p_id. If additional data is added, new rows continue to receive unique IDs:

 TRUNCATE TABLE input;

 INSERT INTO input SELECT PARSE_JSON(
 '[
  {
    firstName : \'Genghis\',
    lastName : \'Khan\',
    contacts : [
      {
        contactType : \'phone\',
        contactData : \'1111111111\',
      }
      ,
      {
        contactType : \'email\',
        contactData : \'gkahn@acme.com\',
      }
   ]
 }
,
 {
    firstName : \'Julius\',
    lastName : \'Caesar\',
    contacts : [
      {
        contactType : \'phone\',
        contactData : \'2222222222\',
      }
      ,
      {
        contactType : \'email\',
        contactData : \'gcaesar@acme.com\',
      }
    ]
  }
 ]'
 );

 INSERT ALL
   WHEN 1=1 THEN
     INTO contact VALUES (c_next, p_next, contact_value:contactType, contact_value:contactData)
   WHEN contact_index = 0 THEN
     INTO people VALUES (p_next, person_value:firstName, person_value:lastName)
 SELECT * FROM
 (
   SELECT f1.value person_value, f2.value contact_value, f2.index contact_index, p_seq.nextval p_next, c_seq.nextval c_next
   FROM input, LATERAL FLATTEN(input.json) f1, table(GETNEXTVAL(people_seq)) p_seq,
     LATERAL FLATTEN(f1.value:contacts) f2, table(GETNEXTVAL(contact_seq)) c_seq
 );

 SELECT * FROM people;

 ----+-----------+------------+
  ID | FIRSTNAME |  LASTNAME  |
 ----+-----------+------------+
  4  | Genghis   | Kahn       |
  5  | Julius    | Caesar     |
  1  | John      | Doe        |
  2  | Mister    | Smith      |
  3  | George    | Washington |
 ----+-----------+------------+

 SELECT * FROM contact;

 ----+------+--------+-----------------------------------------+
  ID | P_ID | C_TYPE |                  DATA                   |
 ----+------+--------+-----------------------------------------+
  7  | 4    | phone  | 1111111111                              |
  8  | 4    | email  | gkhan@acme.com                          |
  9  | 5    | phone  | 2222222222                              |
  10 | 5    | email  | jcaesar@acme.com                        |
  1  | 1    | phone  | 1234567890                              |
  2  | 1    | email  | jdoe@acme.com                           |
  3  | 2    | phone  | 0987654321                              |
  4  | 2    | email  | msmith@acme.com                         |
  5  | 3    | phone  | 1231231234                              |
  6  | 3    | email  | gwashington@acme.com                    |
 ----+------+--------+-----------------------------------------+