Using Sequences

Sequences are used to generate unique numbers across sessions and statements, including concurrent statements. They can be used to generate values for a primary key or any column that requires a unique value.

Important

Snowflake does not guarantee generating sequence numbers with no gaps.

In this Topic:

Sequence Semantics

Snowflake sequences currently utilize the following semantics:

  • All values generated by a sequence are globally unique as long as the sign of the sequence interval does not change. Concurrent queries never observe the same value, and values within a single query are always distinct. Changing the sequence interval from positive to negative (e.g. from 1 to -1), or vice versa may result in duplicates (i.e. if the first query returns sequence values 1, 2, 3, then the interval is changed from 1 to -1, the next statement returns values 3, 2, 1).

  • Each generated sequence value additionally reserves values depending on the sequence interval, also referenced to as “step”. The reserved values span from the sequence to <value> + (sign(<step>) * abs(<step>)). Thus, if the value 100 is generated:

    • With a step of 2, values 100 and 101 are reserved.
    • With a step of 10, values 100 to 109 are reserved.
    • With a step of -5, values 96 to 100 are reserved.

    A reserved value is never generated by the sequence as long as the step/interval is never modified.

  • Values generated by a sequence are guaranteed to be greater than the maximum value produced by a previous statement if the previous statement completed, and an acknowledgment was received, prior to submitting the current statement. This guarantee does not hold if the sign of the interval is changed (positive to negative or negative to positive).

There is no guarantee that values from a sequence are contiguous (gap-free) or that the sequence values are assigned in a particular order. There is, in fact, no way to assign values from a sequence to rows in a specified order other than to use single-row statements (this still provides no guarantee about gaps).

A sequence value can represent a 64-bit two’s complement integer (-2^63 to 2^63-1). If the internal representation of a sequence’s next value exceeds this range (in either direction) an error will result and the query will fail. Note that this may result in losing these sequence values.

In this situation, you must either use a smaller (in magnitude) increment value or create a new sequence with a smaller start value. As gaps may occur, the internal representation of the next value may exceed the allowable range even if the returned sequence values are all within the allowable range. We do not provide an explicit guarantee regarding how to avoid this error, but we commit to supporting sequence objects that correctly provide unique values. A sequence object created with a start value of 1 and an increment value of 1 should never exhaust the allowable range of sequence values.

Referencing Sequences

Currval

Many databases provide a currval sequence reference; however, Snowflake does not. currval in other systems is typically used to create primary-foreign key relationships between tables — a first statement inserts a single row into the fact table using a sequence to create a key. Subsequent statements insert rows into the dimension tables using currval to refer to the fact table’s key.

This pattern is contrary to Snowflake best practices — bulk queries should be preferred over small, single-row queries. The same task can be better accomplished using multi-table insert and sequence references in nested subqueries. See Ingesting and Normalizing Denormalized Data (in this topic) for a detailed example.

Sequences as Expressions

Sequences may be accessed in queries as expressions of the form <sequence_name>.nextval. Each occurrence of a sequence generates a set of distinct values. This is different from what many other databases provide, where multiple references to nextval of a sequence return the same value for each row.

For example, the following query returns distinct values for columns a and b:

CREATE OR REPLACE SEQUENCE seq;

SELECT seq.nextval a, seq.nextval b FROM DUAL;

If you want to return two columns with the same generated sequence value, use nested subqueries and views:

CREATE OR REPLACE SEQUENCE seq;

SELECT seqRef.a a, seqRef.a b FROM (SELECT seq.nextval a FROM DUAL) seqRef;

Nested subqueries generate as many distinct sequence values as rows returned by the subquery (so a sequence reference in a query block with several joins refers not to any of the joined objects, but the output of the query block). These generated values may not be observed if the associated rows are later filtered out, or the values may be observed twice (as in the above example) if the sequence column or the inline view are referred to multiple times.

Note

For multi-table insert, insert values may be provided both in the VALUES clauses and in the SELECT input:

  • VALUES clauses referring to a sequence value aliased from the input SELECT receive the same value.
  • VALUES clauses containing a direct reference to a sequence nextval receive distinct values.

In contrast, Oracle restricts sequence references to VALUES clauses only.

Sequences as Table Functions

Nested queries with sequence references are often difficult to understand and verbose — any shared reference (where two columns of a row should receive the same sequence value) requires an additional level of query nesting. To simplify nested-query syntax, Snowflake provides an additional method to generate sequences using the table function GETNEXTVAL, as in the following example:

CREATE OR REPLACE SEQUENCE seq;

CREATE OR REPLACE TABLE foo (n NUMBER);

INSERT INTO foo VALUES (100), (101), (102);

SELECT n, s.nextval FROM foo, TABLE(GETNEXTVAL(seq)) s;

GETNEXTVAL is a special 1-row table function that generates a unique value (and joins this value) to other objects in the SELECT statement. A call to GETNEXTVAL must be aliased, otherwise the generated values cannot be referenced. Multiple columns may refer to a generated value by accessing this alias. The GETNEXTVAL alias contains an attribute also named nextval.

The GETNEXTVAL table function additionally allows precise control over sequence generation when many tables are joined together. The order of objects in the FROM clause determines where values are generated. Sequence values are generated over the result of joins between all objects listed prior to GETNEXTVAL in the FROM clause. The resulting rows are then joined to the objects to the right. There is an implicit lateral dependence between GETNEXTVAL and all other objects in the FROM clause. Joins may not reorder around GETNEXTVAL. This is an exception in SQL, as typically the order of objects does not affect the query semantics.

Consider the following example with tables t1, t2, t3, and t4:

CREATE OR REPLACE SEQUENCE seq;

SELECT t1.*, t2.*, t3.*, t4.*, s.nextval FROM t1, t2, TABLE(GETNEXTVAL(seq)) s, t3, t4;

This query will join t1 to t2, generate a unique value of the result, and then join the resulting relation against t3 and t4. The order of joins between the post-sequence relation, t3, and t4 is not specified because inner joins are associative.

Note

These semantics can be tricky. We recommend using GETNEXTVAL at the end of the FROM clause, when possible and appropriate, to avoid confusion.

Using Sequences to Create Default Column Values

Sequences can be used in tables to generate primary keys for table columns. The following tools provide a simple way to do this.

Column Default Expressions

The column default expression can be a sequence reference. Omitting the column in an insert statement or setting the value to DEFAULT in an insert or update statement will generate a new sequence value for the row.

For example:

CREATE OR REPLACE SEQUENCE seq;

CREATE OR REPLACE TABLE foo (k NUMBER DEFAULT seq.nextval, v NUMBER);

-- insert rows with unique keys (generated by seq) and explicit values
INSERT INTO foo (v) VALUES (100);
INSERT INTO foo VALUES (DEFAULT, 101);

-- insert rows with unique keys (generated by seq) and reused values
-- new keys are distinct from preexisting keys
INSERT INTO foo (v) SELECT v FROM foo;

-- insert row with explicit values for both columns
INSERT INTO foo VALUES (1000, 1001);

SELECT * FROM foo;

+------+------+
|    K |    V |
|------+------|
|    1 |  100 |
|    2 |  101 |
|    3 |  100 |
|    4 |  101 |
| 1000 | 1001 |
+------+------+

The advantage of using sequences as a column default value is that the sequence can be referenced in other locations, and even be the default value for multiple columns and in multiple tables. If a sequence is named as the default expression of a column and then subsequently dropped any attempt to insert/update the table using the default value will result in an error saying the identifier cannot be found.

Ingesting and Normalizing Denormalized Data

Consider a schema with two tables, people and contact:

  • The people table contains:
    • A primary key unique identifier: id
    • Two string columns: firstName and lastName
  • The contact table contains:
    • A primary key unique identifier: id
    • A foreign key linking this contact entry to a person: p_id
    • Two string columns: c_type (the type of contact, e.g. ‘email’ or ‘phone’) and data (the actual contact information).

Data in this format frequently is denormalized for ingestion or while processing semi-structured data.

This example illustrates ingesting JSON data, denormalizing it to extract the desired data, and normalizing the data as it is inserted into tables. At the same time, it is important to create unique identifiers on rows while maintaining the intended relationships across rows of tables. We accomplish this with sequences.

  1. First, we set up the tables and sequences used in the example:

    -- primary data tables
    
    CREATE OR REPLACE TABLE people (id number, firstName string, lastName string);
    CREATE OR REPLACE TABLE contact (id number, p_id number, c_type string, data string);
    
    -- sequences to produce primary keys on our data tables
    
    CREATE OR REPLACE SEQUENCE people_seq;
    CREATE OR REPLACE SEQUENCE contact_seq;
    
    -- staging table for json
    
    CREATE OR REPLACE TABLE input (json variant);
    
  2. Next, we insert data from table json:

    INSERT INTO input SELECT parse_json(
    '[
     {
       firstName : \'John\',
       lastName : \'Doe\',
       contacts : [
         {
           contactType : \'phone\',
           contactData : \'1234567890\',
         }
         ,
         {
           contactType : \'email\',
           contactData : \'jdoe@acme.com\',
         }
        ]
       }
    ,
      {
       firstName : \'Mister\',
       lastName : \'Smith\',
       contacts : [
         {
           contactType : \'phone\',
           contactData : \'0987654321\',
         }
         ,
         {
           contactType : \'email\',
           contactData : \'msmith@acme.com\',
         }
         ]
       }
     ,
       {
       firstName : \'George\',
       lastName : \'Washington\',
       contacts : [
         {
           contactType : \'phone\',
           contactData : \'1231231234\',
         }
         ,
         {
           contactType : \'email\',
           contactData : \'gwashington@acme.com\',
         }
       ]
     }
    ]'
    );
    
  3. Then, we parse and flatten the JSON, generate unique identifiers for each person and contact entry, and insert the data while preserving relationships between people and contact entries:

    INSERT ALL
      WHEN 1=1 THEN
        INTO contact VALUES (c_next, p_next, contact_value:contactType, contact_value:contactData)
      WHEN contact_index = 0 THEN
        INTO people VALUES (p_next, person_value:firstName, person_value:lastName)
    
    SELECT * FROM
    (
      SELECT f1.value person_value, f2.value contact_value, f2.index contact_index, p_seq.nextval p_next, c_seq.nextval c_next
      FROM input, LATERAL FLATTEN(input.json) f1, TABLE(GETNEXTVAL(people_seq)) p_seq,
        LATERAL FLATTEN(f1.value:contacts) f2, table(GETNEXTVAL(contact_seq)) c_seq
    );
    
  4. This produces the following data (unique IDs may change):

    SELECT * FROM people;
    
    +----+-----------+------------+
    | ID | FIRSTNAME | LASTNAME   |
    |----+-----------+------------|
    |  1 | John      | Doe        |
    |  2 | Mister    | Smith      |
    |  3 | George    | Washington |
    +----+-----------+------------+
    
    SELECT * FROM contact;
    
    +----+------+--------+----------------------+
    | ID | P_ID | C_TYPE | DATA                 |
    |----+------+--------+----------------------|
    |  1 |    1 | phone  | 1234567890           |
    |  2 |    1 | email  | jdoe@acme.com        |
    |  3 |    2 | phone  | 0987654321           |
    |  4 |    2 | email  | msmith@acme.com      |
    |  5 |    3 | phone  | 1231231234           |
    |  6 |    3 | email  | gwashington@acme.com |
    +----+------+--------+----------------------+
    

As you can see, rows are linked, and can be joined, between people.id and contact.p_id.

If additional data is added, new rows continue to receive unique IDs. For example:

 TRUNCATE TABLE input;

 INSERT INTO input SELECT PARSE_JSON(
 '[
  {
    firstName : \'Genghis\',
    lastName : \'Khan\',
    contacts : [
      {
        contactType : \'phone\',
        contactData : \'1111111111\',
      }
      ,
      {
        contactType : \'email\',
        contactData : \'gkahn@acme.com\',
      }
   ]
 }
,
 {
    firstName : \'Julius\',
    lastName : \'Caesar\',
    contacts : [
      {
        contactType : \'phone\',
        contactData : \'2222222222\',
      }
      ,
      {
        contactType : \'email\',
        contactData : \'gcaesar@acme.com\',
      }
    ]
  }
 ]'
 );

 INSERT ALL
   WHEN 1=1 THEN
     INTO contact VALUES (c_next, p_next, contact_value:contactType, contact_value:contactData)
   WHEN contact_index = 0 THEN
     INTO people VALUES (p_next, person_value:firstName, person_value:lastName)
 SELECT * FROM
 (
   SELECT f1.value person_value, f2.value contact_value, f2.index contact_index, p_seq.nextval p_next, c_seq.nextval c_next
   FROM input, LATERAL FLATTEN(input.json) f1, table(GETNEXTVAL(people_seq)) p_seq,
     LATERAL FLATTEN(f1.value:contacts) f2, table(GETNEXTVAL(contact_seq)) c_seq
 );

 SELECT * FROM people;

 +----+-----------+------------+
 | ID | FIRSTNAME | LASTNAME   |
 |----+-----------+------------|
 |  4 | Genghis   | Khan       |
 |  5 | Julius    | Caesar     |
 |  1 | John      | Doe        |
 |  2 | Mister    | Smith      |
 |  3 | George    | Washington |
 +----+-----------+------------+

 SELECT * FROM contact;

 +----+------+--------+----------------------+
 | ID | P_ID | C_TYPE | DATA                 |
 |----+------+--------+----------------------|
 |  1 |    1 | phone  | 1234567890           |
 |  2 |    1 | email  | jdoe@acme.com        |
 |  3 |    2 | phone  | 0987654321           |
 |  4 |    2 | email  | msmith@acme.com      |
 |  5 |    3 | phone  | 1231231234           |
 |  6 |    3 | email  | gwashington@acme.com |
 |  7 |    4 | phone  | 1111111111           |
 |  8 |    4 | email  | gkahn@acme.com       |
 |  9 |    5 | phone  | 2222222222           |
 | 10 |    5 | email  | gcaesar@acme.com     |
 +----+------+--------+----------------------+