Categories:

Utility & Hash Functions

HASH

Returns a signed 64-bit hash value. Note that HASH never returns NULL, even for NULL inputs.

Possible uses for the HASH function include:

  • Convert skewed data values to values that are likely to be more randomly or more evenly distributed.

    For example, you can hash a group of highly skewed values and generate a set of values that are more likely to be randomly distributed or evenly distributed.

  • Put data in buckets. Because hashing can convert skewed data values to closer-to-evenly distributed values, you can use hashing to help take skewed values and create approximately evenly-sized buckets.

    If hashing alone is not sufficient to get the number of distinct buckets that you want, you can combine hashing with the ROUND or WIDTH_BUCKET functions.

Note

HASH is a proprietary function that accepts a variable number of input expressions of arbitrary types and returns a signed value. It is not a cryptographic hash function and should not be used as such.

Cryptographic hash functions have a few properties which this function does not, for example:

  • The cryptographic hashing of a value cannot be inverted to find the original value.

  • Given a value, it is infeasible to find another value with the same cryptographic hash.

For cryptographic purposes, use the SHA families of functions (in String & Binary Functions).

See also

HASH_AGG

Syntax

HASH( <expr> [ , <expr2> ... ] )

HASH(*)

Arguments

exprN

The expression may be a general expression of any Snowflake data type.

Returns

Returns a signed 64-bit value as NUMBER(19,0).

HASH never returns NULL, even for NULL inputs.

Usage Notes

  • HASH is stable in the sense that any two values of type NUMBER that compare equally are guaranteed to hash to the same hash value, even if the respective types have different precision and/or scale. This guarantee extends to values of type FLOAT which can be converted to NUMBER(38, 0) without loss of precision. For example, the following all return the same hash value:

    HASH(10::NUMBER(38,0))
    HASH(10::NUMBER(5,3))
    HASH(10::FLOAT)
  • Furthermore, TIMESTAMP_TZ values that compare equally are guaranteed to hash to the same hash value, even if the timestamps are from different time zones.

  • These guarantees also apply to NUMBER, FLOAT, and TIMESTAMP_TZ values within a VARIANT column.

  • This guarantee does not apply to other combinations of types, even if implicit conversions exist between the types. For example, with overwhelming probability, the following will not return the same hash values even though 10 = '10' after implicit conversion:

    HASH(10)
    HASH('10')
  • HASH(*) means to create a single hashed value based on all columns in the row.

  • Do not use HASH() to create unique keys. HASH() has a finite resolution of 64 bits, and is guaranteed to return non-unique values if more than 2^64 values are entered, e.g. for a table with more than 2^64 rows. In practice, if the input is on the order of 2^32 rows (approximately 4 billion rows) or more, the function is reasonably likely to return at least one duplicate value.

Collation Details

No impact.

  • Two strings that are identical but have different collation specifications have the same hash value. In other words, only the string, not the collation specification, affects the hash value.

  • Two strings that are different, but compare equal according to a collation, might have a different hash value. For example, two strings that are identical using punctuation-insensitive collation will normally have different hash values because only the string, not the collation specification, affects the hash value.

Examples

SELECT HASH(SEQ8()) FROM TABLE(GENERATOR(rowCount=>10));

----------------------+
     HASH(SEQ8())     |
----------------------+
 -6076851061503311999 |
 -4730168494964875235 |
 -3690131753453205264 |
 -7287585996956442977 |
 -1285360004004520191 |
 4801857165282451853  |
 -2112898194861233169 |
 1885958945512144850  |
 -3994946021335987898 |
 -3559031545629922466 |
----------------------+
SELECT HASH(10), HASH(10::number(38,0)), HASH(10::number(5,3)), HASH(10::float);

---------------------+------------------------+-----------------------+---------------------+
      HASH(10)       | HASH(10::NUMBER(38,0)) | HASH(10::NUMBER(5,3)) |   HASH(10::FLOAT)   |
---------------------+------------------------+-----------------------+---------------------+
 1599627706822963068 | 1599627706822963068    | 1599627706822963068   | 1599627706822963068 |
---------------------+------------------------+-----------------------+---------------------+
SELECT HASH(10), HASH('10');

---------------------+---------------------+
      HASH(10)       |     HASH('10')      |
---------------------+---------------------+
 1599627706822963068 | 3622494980440108984 |
---------------------+---------------------+
SELECT HASH(null), HASH(null, null), HASH(null, null, null);

---------------------+--------------------+------------------------+
     HASH(NULL)      |  HASH(NULL, NULL)  | HASH(NULL, NULL, NULL) |
---------------------+--------------------+------------------------+
 8817975702393619368 | 953963258351104160 | 2941948363845684412    |
---------------------+--------------------+------------------------+

The example below shows that even if the table contains multiple columns, HASH(*) returns a single value per row.

CREATE TABLE orders (order_ID INTEGER, customer_ID INTEGER, order_date ...);

...

SELECT HASH(*) FROM orders LIMIT 10;

----------------------+
       HASH(*)        |
----------------------+
 -3527903796973745449 |
 6296330861892871310  |
 6918165900200317484  |
 -2762842444336053314 |
 -2340602249668223387 |
 5248970923485160358  |
 -5807737826218607124 |
 428973568495579456   |
 2583438210124219420  |
 4041917286051184231  |
----------------------+