Categories:

Aggregate Functions , Window Functions , Utility & Hash Functions

HASH_AGG

Returns an aggregate signed 64-bit hash value over the (unordered) set of input rows. HASH_AGG never returns NULL, even if no input is provided. Empty input “hashes” to 0.

One use for aggregate hash functions is to detect changes to a set of values without comparing the individual old and new values. HASH_AGG can compute a single hash value based on many inputs; almost any change to one of the inputs is likely to result in a change to the output of the HASH_AGG function. Comparing two lists of values typically requires sorting both lists, but HASH_AGG produces the same value regardless of the order of the inputs. Because the values don’t need to be sorted for HASH_AGG, performance is typically much faster.

Note

HASH_AGG is not a cryptographic hash function and should not be used as such.

For cryptographic purposes, use the SHA families of functions (in String & Binary Functions).

See also

HASH

Syntax

Aggregate function

HASH_AGG( [ DISTINCT ] <expr> [ , <expr2> ... ] )

HASH_AGG(*)

Window function

HASH_AGG( [ DISTINCT ] <expr> [ , <expr2> ... ] ) OVER ( [ PARTITION BY <expr3> ] )

HASH_AGG(*) OVER ( [ PARTITION BY <expr3> ] )

Usage Notes

  • The form HASH_AGG(*) calculates over all columns. This works for both the aggregate function and the window function.

  • HASH_AGG computes a “fingerprint” over an entire table or query result or window. Any change to the input will influence the result of HASH_AGG with overwhelming probability. This can be used to quickly detect changes to table contents or query results.

    Note that it is possible, though very unlikely, that two different input tables will produce the same result for HASH_AGG. If you need to make sure that two tables or query results which produce the same HASH_AGG result really contain the same data, you must still compare the data for equality (e.g. by using the MINUS operator). For more details, see Set Operators.

  • HASH_AGG is not order-sensitive (i.e. the order of rows in an input table or query result does not influence the result of HASH_AGG). However, changing the order of input columns does change the result.

  • HASH_AGG hashes individual input rows using the HASH function. The salient features of this function carry over to HASH_AGG. In particular, HASH_AGG is stable in the sense that any two rows which compare equal and which have compatible types are guaranteed to hash to the same value (i.e. they influence the result of HASH_AGG in the same way).

    For example, changing the scale and precision of a column that is part of some table, does not change the result of HASH_AGG over that table. See HASH for details.

  • In contrast to most other aggregate functions, HASH_AGG does not ignore NULL inputs (i.e. NULL inputs influence the result of HASH_AGG).

  • For both the aggregate function and the window functions, duplicate rows, including duplicate all-NULL rows, influence the result. The DISTINCT keyword can be used to suppress the effect of duplicate rows.

  • When used as a window function:

    • This function does not support:

      • ORDER BY sub-clause in the OVER() clause.

      • Window frames.

Collation Details

No impact.

  • Two strings that are identical but have different collation specifications have the same hash value. In other words, only the string, not the collation specification, affects the hash value.

  • Two strings that are different, but compare equal according to a collation, might have a different hash value. For example, two strings that are identical using punctuation-insensitive collation will normally have different hash values because only the string, not the collation specification, affects the hash value.

Examples

NULLs are never ignored:

select hash_agg(null), hash_agg(null, null), hash_agg(null, null, null);

----------------------+----------------------+----------------------------+
    HASH_AGG(NULL)    | HASH_AGG(NULL, NULL) | HASH_AGG(NULL, NULL, NULL) |
----------------------+----------------------+----------------------------+
 -5089618745711334219 | 2405106413361157177  | -5970411136727777524       |
----------------------+----------------------+----------------------------+

Empty input hashes to 0:

select hash_agg(null) where 0 = 1;

----------------+
 HASH_AGG(NULL) |
----------------+
 0              |
----------------+

Use HASH_AGG(*) to conveniently aggregate over all input columns:

select hash_agg(*) from orders;

---------------------+
     HASH_AGG(*)     |
---------------------+
 1830986524994392080 |
---------------------+

Grouped aggregation is supported:

select year(o_orderdate), hash_agg(*) from orders group by 1 order by 1;

-------------------+----------------------+
 YEAR(O_ORDERDATE) |     HASH_AGG(*)      |
-------------------+----------------------+
 1992              | 4367993187952496263  |
 1993              | 7016955727568565995  |
 1994              | -2863786208045652463 |
 1995              | 1815619282444629659  |
 1996              | -4747088155740927035 |
 1997              | 7576942849071284554  |
 1998              | 4299551551435117762  |
-------------------+----------------------+

Suppress duplicate rows using DISTINCT (duplicate rows influence results of HASH_AGG):

select year(o_orderdate), hash_agg(o_custkey, o_orderdate) from orders group by 1 order by 1;

-------------------+----------------------------------+
 YEAR(O_ORDERDATE) | HASH_AGG(O_CUSTKEY, O_ORDERDATE) |
-------------------+----------------------------------+
 1992              | 5686635209456450692              |
 1993              | -6250299655507324093             |
 1994              | 6630860688638434134              |
 1995              | 6010861038251393829              |
 1996              | -767358262659738284              |
 1997              | 6531729365592695532              |
 1998              | 2105989674377706522              |
-------------------+----------------------------------+

select year(o_orderdate), hash_agg(distinct o_custkey, o_orderdate) from orders group by 1 order by 1;

-------------------+-------------------------------------------+
 YEAR(O_ORDERDATE) | HASH_AGG(DISTINCT O_CUSTKEY, O_ORDERDATE) |
-------------------+-------------------------------------------+
 1992              | -8416988862307613925                      |
 1993              | 3646533426281691479                       |
 1994              | -7562910554240209297                      |
 1995              | 6413920023502140932                       |
 1996              | -3176203653000722750                      |
 1997              | 4811642075915950332                       |
 1998              | 1919999828838507836                       |
-------------------+-------------------------------------------+

This final example computes the number of days on which the corresponding sets of customers with orders with status not equal 'F' and status not equal 'P', respectively, are identical:

select count(distinct o_orderdate) from orders;

-----------------------------+
 COUNT(DISTINCT O_ORDERDATE) |
-----------------------------+
 2406                        |
-----------------------------+

select count(o_orderdate)
from (select o_orderdate, hash_agg(distinct o_custkey)
      from orders
      where o_orderstatus <> 'F'
      group by 1
      intersect
      select o_orderdate, hash_agg(distinct o_custkey)
      from orders
      where o_orderstatus <> 'P'
      group by 1);

--------------------+
 COUNT(O_ORDERDATE) |
--------------------+
 1143               |
--------------------+

Note that the query does not account for the possibility of hash collisions, so the actual number of days may be slightly lower.