Aggregate Functions¶
Aggregate functions operate on values across rows to perform mathematical calculations such as counting, minimum/maximum/average values, ranking, standard deviation, and estimation, as well as other non-mathematical operations.
An aggregate function takes zero, one, or more rows as input and produces a single output. In contrast, scalar functions take one row as input and produce one row (i.e. one value) as output.
Note
Unless otherwise noted, aggregate functions can also include an OVER
clause. The OVER
clause enables performing aggregations over a
defined group of rows, often referred to as a window.
In addition, certain aggregate functions support an ORDER BY
subclause in the OVER
clause, which can optionally include a window frame.
For more information about windows and window frames, see Window Functions.
In this Topic:
List of Functions (by Sub-category)¶
Function Name |
Window |
Window Frame |
Notes |
---|---|---|---|
General Aggregation |
|||
✔ |
|||
✔ |
|||
✔ |
|||
✔ |
✔ |
||
✔ |
|||
✔ |
|||
✔ |
|||
✔ |
✔ |
||
✔ |
|||
✔ |
✔ |
||
✔ |
Uses different syntax than the other aggregate functions. |
||
✔ |
Uses different syntax than the other aggregate functions. |
||
✔ |
Alias for STDDEV_POP. |
||
✔ |
|||
✔ |
|||
✔ |
✔ |
||
✔ |
|||
✔ |
|||
✔ |
Alias for VAR_POP. |
||
✔ |
Alias for VAR_SAMP. |
||
Order-sensitive Aggregation . (aka Analytic Functions) |
|||
✔ |
|||
✔ |
|||
✔ |
✔ |
||
✔ |
|||
✔ |
✔ |
||
✔ |
|||
✔ |
✔ |
||
✔ |
|||
✔ |
|||
✔ |
|||
✔ |
|||
Not an actual window/analytic function (i.e. does not utilize a window in its syntax), but performs similar analytic operations on a range of rows. |
|||
Bitwise Aggregation |
|||
✔ |
|||
✔ |
|||
✔ |
|||
Hash |
|||
✔ |
|||
Semi-structured Data Aggregation |
|||
✔ |
|||
✔ |
|||
Linear Regression |
|||
✔ |
|||
✔ |
|||
✔ |
|||
✔ |
|||
✔ |
|||
✔ |
|||
✔ |
|||
✔ |
|||
✔ |
|||
Cardinality Estimation . (using HyperLogLog) |
|||
✔ |
Alias for HLL. |
||
✔ |
|||
✔ |
|||
✔ |
|||
Not an aggregate function; uses scalar input from HLL_ACCUMULATE or HLL_COMBINE. |
|||
✔ |
|||
✔ |
|||
Similarity Estimation . (using MinHash) |
|||
✔ |
Alias for APPROXIMATE_SIMILARITY. |
||
✔ |
|||
✔ |
|||
✔ |
|||
Frequency Estimation . (using Space-Saving) |
|||
✔ |
|||
✔ |
|||
✔ |
|||
Not an aggregate function; uses scalar input from APPROX_TOP_K_ACCUMULATE or APPROX_TOP_K_COMBINE. |
|||
Percentile Estimation . (using t-Digest) |
|||
✔ |
|||
✔ |
|||
✔ |
|||
Not an aggregate function; uses scalar input from APPROX_PERCENTILE_ACCUMULATE or APPROX_PERCENTILE_COMBINE. |
|||
Aggregation Utilities |
|||
Not an aggregate function, but can be used in conjunction with aggregate functions to determine the level of aggregation for a row produced by a GROUP BY query. |
|||
Alias for GROUPING. |
Introductory Example¶
The following example illustrates the difference between an aggregate function (AVG) and scalar function (COS):
SELECT * FROM table_x_y; +----+----+ | X | Y | |----+----| | 10 | 20 | | 20 | 44 | | 30 | 70 | +----+----+ -- Calculate the average value of column x across all the rows in the table_x_y table: SELECT AVG(x) FROM table_x_y; +--------+ | AVG(X) | |--------| | 20.000 | +--------+ -- Caculate the cosine of each value in column x in the table_x_y table. SELECT COS(x) FROM table_x_y; +---------------+ | COS(X) | |---------------| | -0.8390715291 | | 0.4080820618 | | 0.1542514499 | +---------------+
Aggregate Functions and NULL Values¶
Some aggregate functions ignore NULL values. For example, AVG calculates the average of values 1
, 5
, and NULL
to be 3
,
based on the following formula:
(1 + 5) / 2 = 3
(i.e. only the non-NULL values are used in the both the numerator and denominator)
If all of the values passed to the aggregate function are NULL, then the aggregate function returns NULL.
Some aggregate functions can be passed more than one column. For example:
SELECT COUNT(col1, col2) FROM table1;
In these instances, the aggregate function ignores a row if any individual column is NULL.
For example, in the following query, COUNT returns 1
, not 4
, because three of the four rows contain at least one NULL
value in the selected columns:
Create a table and populate it with values:
CREATE TABLE t (x INT, y INT); INSERT INTO t (x, y) VALUES (1, 2), -- No NULLs. (3, NULL), -- One but not all columns are NULL. (NULL, 6), -- One but not all columns are NULL. (NULL, NULL); -- All columns are NULL.Query the table:
-- Returns 1, not 4, because if any of the columns in the aggregate are NULL, -- then the entire row is ignored. SELECT COUNT(x, y) FROM t; +-------------+ | COUNT(X, Y) | |-------------| | 1 | +-------------+
Similarly, if SUM is called with an expression that references two or more columns, and if one or more of those columns is NULL, then the expression evaluates to NULL, and the row is ignored:
-- Returns results for only the rows in which both x and y are non-NULL. SELECT SUM(x + y) FROM t; +------------+ | SUM(X + Y) | |------------| | 3 | +------------+
Note that this behavior differs from the behavior of GROUP BY, which does not discard rows when some columns are NULL:
SELECT x AS X_COL, y AS Y_COL FROM t GROUP BY x, y; +-------+-------+ | X_COL | Y_COL | |-------+-------| | 1 | 2 | | 3 | NULL | | NULL | 6 | | NULL | NULL | +-------+-------+