Categories:
Aggregate Functions , Window Functions , Utility & Hash Functions

HASH_AGG

Returns an aggregate signed 64-bit hash value over the (unordered) set of input rows. HASH_AGG never returns NULL, even if no input is provided. Empty input “hashes” to 0.

One use for aggregate hash functions is to detect changes to a set of values without comparing the individual old and new values. HASH_AGG can compute a single hash value based on many inputs; almost any change to one of the inputs is likely to result in a change to the output of the HASH_AGG function. Comparing two lists of values typically requires sorting both lists, but HASH_AGG produces the same value regardless of the order of the inputs. Because the values don’t need to be sorted for HASH_AGG, performance is typically much faster.

Note

HASH_AGG is not a cryptographic hash function and should not be used as such.

For cryptographic purposes, use the SHA families of functions (in String & Binary Functions).

Collation:No impact
See also:HASH

Syntax

HASH_AGG( [ DISTINCT ] <expr> [ ... ] )

HASH_AGG(*)

Usage Notes

  • HASH_AGG computes a “fingerprint” over an entire table or query result. Any change to the input will influence the result of HASH_AGG with overwhelming probability. This can be used to quickly detect changes to table contents or query results.

    Note that it is possible, though very unlikely, that two different input tables will produce the same result for HASH_AGG. If you need to make sure that two tables or query results which produce the same HASH_AGG result really contain the same data, you must still compare the data for equality (e.g. by using the MINUS operator). For more details, see Set Operators.

  • HASH_AGG is not order-sensitive (i.e. the order of rows in an input table or query result does not influence the result of HASH_AGG). However, changing the order of input columns does change the result.

  • HASH_AGG hashes individual input rows using the HASH function. The salient features of this function carry over to HASH_AGG. In particular, HASH_AGG is stable in the sense that any two rows which compare equal and which have compatible types are guaranteed to hash to the same value (i.e. they influence the result of HASH_AGG in the same way).

    For example, changing the scale and precision of a column that is part of some table, does not change the result of HASH_AGG over that table. See HASH for details.

  • In contrast to most other aggregate functions, HASH_AGG does not ignore NULL inputs (i.e. NULL inputs influence the result of HASH_AGG).

  • Duplicate rows, including duplicate all-NULL rows, influence the result. The DISTINCT keyword can be used to suppress the effect of duplicate rows.

Examples

NULLs are never ignored:

select hash_agg(null), hash_agg(null, null), hash_agg(null, null, null);

----------------------+----------------------+----------------------------+
    HASH_AGG(NULL)    | HASH_AGG(NULL, NULL) | HASH_AGG(NULL, NULL, NULL) |
----------------------+----------------------+----------------------------+
 -5089618745711334219 | 2405106413361157177  | -5970411136727777524       |
----------------------+----------------------+----------------------------+

Empty input hashes to 0:

select hash_agg(null) where 0 = 1;

----------------+
 HASH_AGG(NULL) |
----------------+
 0              |
----------------+

Use HASH_AGG(*) to conveniently aggregate over all input columns:

select hash_agg(*) from orders;

---------------------+
     HASH_AGG(*)     |
---------------------+
 1830986524994392080 |
---------------------+

Grouped aggregation is supported:

select year(o_orderdate), hash_agg(*) from orders group by 1 order by 1;

-------------------+----------------------+
 YEAR(O_ORDERDATE) |     HASH_AGG(*)      |
-------------------+----------------------+
 1992              | 4367993187952496263  |
 1993              | 7016955727568565995  |
 1994              | -2863786208045652463 |
 1995              | 1815619282444629659  |
 1996              | -4747088155740927035 |
 1997              | 7576942849071284554  |
 1998              | 4299551551435117762  |
-------------------+----------------------+

Suppress duplicate rows using DISTINCT (duplicate rows influence results of HASH_AGG):

select year(o_orderdate), hash_agg(o_custkey, o_orderdate) from orders group by 1 order by 1;

-------------------+----------------------------------+
 YEAR(O_ORDERDATE) | HASH_AGG(O_CUSTKEY, O_ORDERDATE) |
-------------------+----------------------------------+
 1992              | 5686635209456450692              |
 1993              | -6250299655507324093             |
 1994              | 6630860688638434134              |
 1995              | 6010861038251393829              |
 1996              | -767358262659738284              |
 1997              | 6531729365592695532              |
 1998              | 2105989674377706522              |
-------------------+----------------------------------+

select year(o_orderdate), hash_agg(distinct o_custkey, o_orderdate) from orders group by 1 order by 1;

-------------------+-------------------------------------------+
 YEAR(O_ORDERDATE) | HASH_AGG(DISTINCT O_CUSTKEY, O_ORDERDATE) |
-------------------+-------------------------------------------+
 1992              | -8416988862307613925                      |
 1993              | 3646533426281691479                       |
 1994              | -7562910554240209297                      |
 1995              | 6413920023502140932                       |
 1996              | -3176203653000722750                      |
 1997              | 4811642075915950332                       |
 1998              | 1919999828838507836                       |
-------------------+-------------------------------------------+

This final example computes the number of days on which the corresponding sets of customers with orders with status not equal 'F' and status not equal 'P', respectively, are identical:

select count(distinct o_orderdate) from orders;

-----------------------------+
 COUNT(DISTINCT O_ORDERDATE) |
-----------------------------+
 2406                        |
-----------------------------+

select count(o_orderdate)
from (select o_orderdate, hash_agg(distinct o_custkey)
      from orders
      where o_orderstatus <> 'F'
      group by 1
      intersect
      select o_orderdate, hash_agg(distinct o_custkey)
      from orders
      where o_orderstatus <> 'P'
      group by 1);

--------------------+
 COUNT(O_ORDERDATE) |
--------------------+
 1143               |
--------------------+

Note that the query does not account for the possibility of hash collisions, so the actual number of days may be slightly lower.