Using Window Functions

Window functions operate on windows, which are groups of rows that are related (e.g. by date or location). This topic describes how to use the different types of window functions supported by Snowflake, including:

  • Aggregate functions called as window functions.
  • Window functions that support window frames.
  • Window functions that are order-sensitive (aka analytic functions).

In this Topic:

Windowing Concepts

The following diagram illustrates the relationship between aggregate, window, and analytic functions:

Relationships between aggregate functions, window functions, and analytic functions

As shown:

  • All window functions are aggregate functions.
  • Some window functions support window frames.
  • All analytic functions are window functions.
  • Some analytic functions support window frames.

For more details, see the sub-sections in this section.

Aggregate Functions

To understand window functions, you must first understand aggregate functions.

An aggregate function takes multiple inputs (rows or values) and returns a single output. Technically, an aggregate function can take zero, one, or multiple rows/values as input, but for simplicity’s sake, we usually just state that an aggregate function takes multiple rows/values. For example, the SUM function takes multiple numeric values and adds them together to return a single value.

For a more detailed explanation of aggregate functions, see Aggregate Functions.

Windows

A window is a group of rows. For simplicity’s sake, we usually just state that a window contains “multiple rows”, even though it may contain one or zero rows. The rows in a window are related in some way. For example, the rows might be related by location (e.g. all from the same province or state) or by time (e.g. all from the same month or fiscal quarter).

By definition, any function that supports specifying a window over its input rows is a window function.

The window is defined by a PARTITION BY sub-clause inside an OVER clause that is specified when the function is called. For example, the calling SQL statement might explicitly partition input rows into separate windows based on province or state. If the statement does not explicitly define these windows, then the window function operates on the entire input as a single window.

Also, because window functions accept multiple inputs and produce a single output, by definition, they are aggregate functions.

Note

The converse is not necessarily true (i.e. aggregate functions are not necessarily window functions), so the terms are not synonymous.

However, most Snowflake aggregate functions can be called with a window, hence they are also window functions. For the list of all aggregate functions, including the functions that can also be called as window functions, see Aggregate Functions.

Window Frames

A window frame contains a subset of the rows in a window.

Snowflake supports two types of window frames:

  • Cumulative.
  • Sliding.

Cumulative Window Frames

A cumulative window is a variable-width window that starts at a fixed point and grows with each new row. For example, Sales So Far This Month is calculated using a cumulative window that starts on the first of the month and expands as each new day’s data is added:

+--------+-------+--------------+
| Day of | Sales | Sales So Far |
| Month  | Today | This Month   |
|--------+-------+--------------+
|      1 |    10 |           10 |
|      2 |    10 |           20 |
|      3 |    10 |           30 |
|    ... |   ... |          ... |
+--------+-------+--------------+

Sliding Window Frames

A sliding window frame is a fixed-width frame that “slides along” the rows in the window; as new rows are added, old rows are removed from the window, so that the width of the frame is always the same.

For example, a 13-week moving average of a stock price is a sliding window; for each day, it shows us the average price over the last 13 weeks (91 days), including the current day:

  • On June 30th, the function returns the average price for April 1 to June 30.
  • On July 1st, the function returns the average price for April 2 to July 1.
  • On July 2nd, the function returns the average price for April 3 to July 2.
  • etc.

Note

Although sliding windows are fixed-width, there are scenarios when there is not enough data to fill the window. For example, if record-keeping started on April 1st, then from April 1st to June 29th, the sliding window would include fewer than 91 days of data. Only after the window finishes filling does its width remain fixed.

As another example, consider a moving average using a 3-day sliding window frame. The following illustration shows a window consisting of the first 7 days in the month and demonstrates how the window frame slides across the window, always keeping the 3 most recent values for calculating the total within the window:

3-day sliding window frame in 7-day window

And, as you can see in the corresponding table, the last column contains the sum of the three most recent days’ worth of sales data. Note that the column value for day 4 is 36, which is the sum of the sales for days 2, 3, and 4 (11 + 12 + 13):

+--------+-------+---------------+
| Day of | Sales | Most Recent   |
| Month  | Today | 3 Days' Sales |
|--------+-------+---------------+
|      1 |    10 |            10 |
|      2 |    11 |            21 |
|      3 |    12 |            33 |
|      4 |    13 |            36 |
|      5 |    14 |            39 |
|    ... |   ... |           ... |
+--------+-------+---------------+

Analytic Functions

An analytic function is a type of window function for which the order of the rows in the window is significant. As such, the output of an analytic function depends on:

  • The individual row passed to the function.
  • The values of the other rows in the window passed to the function.
  • The order of all the rows in the window.

For example, consider the RANK function. If you want to calculate the rank of each of your salespeople, you must specify the criterion on which to rank them (e.g. the number of dollars of product they’ve sold, the percentage of customers who are still happy with the product 3 years after purchasing it, etc.). Conceptually, you can calculate the rank by putting the sales people in order (e.g. by amount sold last month, in order from highest to lowest) and then calling the RANK function to show which salesperson sold the most, which sold the second most, etc.

The rank of each salesperson depends on both the individual sales person (e.g. the amount they sold) and how much every other sales person sold in that same time period. In database terms, the output of the rank function depends on both the individual row and all the other rows within the window. For example:

+-------------+-------------+------+
| Salesperson | Amount Sold | Rank |
|-------------+-------------+------|
| Smith       |        2000 |    1 |
| Jones       |        1500 |    2 |
| Torkelson   |        1200 |    3 |
+-------------+-------------+------+

The SQL syntax used to generate reports like this is shown in more detail later in this topic and in Window Functions.

For analytic functions, the function must be called once per row. For example, the RANK function returns a different value for each row (assuming no tied values), so the function is called once for each row, and is passed both that row and the entire window (so that the row’s rank can be determined by comparing it to every other row in that window).

Note

Not all window functions are analytic functions. The following section contrasts analytic functions and non-analytic window functions in more detail.

Non-Analytic Functions

For non-analytic functions, order is not important. For example, consider the SUM function:

SELECT salesperson, amount_sold, SUM(AMOUNT_SOLD) FROM sales_table;

+-------------+-------------+-------+
| Salesperson | Amount Sold | Total |
|-------------+-------------+-------|
| Smith       |        2000 |  4700 |
| Jones       |        1500 |  4700 |
| Torkelson   |        1200 |  4700 |
+-------------+-------------+-------+

Because the query simply sums the total sales across the entire window, the order doesn’t matter, and the sum is the same for each row. In fact, in this case, the aggregate function, when used as a window function, only needs to be called once per window, not once for each row in the window.

This is different from an analytic function, which must be called once for each row in each window, because the output is different for each row in the window.

Calling a Window Function

Windows and window frames are specified using an OVER clause:

... OVER ( [ PARTITION BY <expr1> ]
           [ ORDER BY <expr2> [ { cumulativeFrame | slidingFrame } ] ]
         ) ...

The window function syntax is covered in more detail in Window Functions. This topic focuses on the subset of the syntax required by window functions, specifically:

  • The PARTITION BY sub-clause partitions the data into windows. For example, if you want to calculate running sums of monthly sales for more than one month, you could partition the data by month. This would allow you to calculate one running sum of sales for January, another running sum for February, etc.

  • The ORDER BY sub-clause is primarily for analytic window functions and for sliding and cumulative window frame functions; it determines the order of rows within each window.

    For analytic functions, the ORDER BY sub-clause is required. This is because analytic functions only operate on windows that contain rows in a specified order.

Illustrated Example

This example illustrates many of the concepts described earlier in this topic. Continuing with our sales scenario, suppose that you need to generate a financial report that shows values based on sales over the last week:

  • Daily sales

  • Ranking within the week (i.e. sales ranked highest to lowest for the week)

    This uses a window function, specifically an analytic function (RANK).

  • Sales so far this week (i.e. the “running sum” for all days from the beginning of the week up through and including the current day)

    This uses an aggregate function (SUM), but as a window function with a cumulative window frame.

  • Total sales for the week

    This uses SUM purely as an aggregate function.

  • 3-day moving average (i.e. the average of the current day and the two previous days)

    This uses an aggregate function (AVG), but as a window function with a a sliding window frame.

The report might look something like this:

+--------+-------+------+--------------+-------------+--------------+
| Day of | Sales | Rank | Sales So Far | Total Sales | 3-Day Moving |
| Week   | Today |      | This Week    | This Week   | Average      |
|--------+-------+------+--------------+-------------|--------------+
|      1 |    10 |    4 |           10 |          84 |         10.0 |
|      2 |    14 |    3 |           24 |          84 |         12.0 |
|      3 |     6 |    5 |           30 |          84 |         10.0 |
|      4 |     6 |    5 |           36 |          84 |          9.0 |
|      5 |    14 |    3 |           50 |          84 |         10.0 |
|      6 |    16 |    2 |           66 |          84 |         11.0 |
|      7 |    18 |    1 |           84 |          84 |         12.0 |
+--------+-------+------+--------------+-------------+--------------+

The SQL for this query is somewhat complex. Rather than show it as a single query, we’ll break down the SQL for the individual columns.

Note

In a real world scenario, you would have years of data, so to calculate sums and averages for one specific week of data, you would need to use a one-week window, or use a filter similar to:

... WHERE date >= start_of_relevant_week and date <= end_of_relevant_week ...

However, for the purposes of this example, we’ll assume that the table contains only the most recent week’s worth of data.

Calculating Sales Rank

The Rank column is calculated using the RANK analytic function:

SELECT RANK(sales_today)
              OVER (ORDER BY sales_today)
FROM my_table;

Calculating Sales So Far This Week

The Sales So Far This Week column is calculated using SUM as a window function with a cumulative window frame:

SELECT SUM(sales_today)
            OVER (ORDER BY sales_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM my_table;

This query orders the rows by date and then, for each date, calculates the sum of sales from the start of the window up to the current date (inclusive).

Calculating Total Sales This Week

The Total Sales This Week column is calculated using SUM as an aggregate function (i.e. no window):

SELECT SUM(sales_today)
FROM my_table;

Calculating a 3-Day Moving Average

The 3-Day Moving Average column is calculated using AVG as a window function with a sliding window frame:

SELECT AVG(sales_today)
            OVER (ORDER BY sales_date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM my_table;

The difference between this sliding window frame and the cumulative window frame described earlier is simply the starting point:

  • In a sliding window frame, the starting point slides rightward.
  • In a cumulative window frame, the starting point is fixed and the frame continues to accumulate with each additional row.

Additional Examples

This section provides more detailed examples of using the windowing sub-clauses in window functions, and illustrates how these sub-clauses work together.

These examples use the following table and data:

CREATE TABLE sales (sales_date DATE, quantity INTEGER);

INSERT INTO sales (sales_date, quantity) VALUES
    ('2018-01-01', 1),
    ('2018-01-02', 3),
    ('2018-01-03', 5),
    ('2018-02-01', 2)
    ;

Note

Many of these examples use two ORDER BY clauses, one for the window clause, and one to put the result set in the most readable order. For the purposes of our discussion in this topic, when we refer to the ORDER BY clause, we are usually referring to the clause within the window.

Window with PARTITION BY Clause

The PARTITION BY sub-clause of the windowing clause divides the data into distinct subsets based on the value of the input expression; the SELECT statement is applied to each subset, and the output has rows for each subset.

Note that this is similar to, but not identical to, how the GROUP BY clause works.

The next example shows the quantity sold each month, and uses the PARTITION BY clause to divide the data into one-month subsets:

SELECT MONTH(sales_date) AS MONTH_NUM, 
       SUM(quantity) OVER (PARTITION BY MONTH(sales_date)) AS QUANTITY
    FROM sales
    ORDER BY sales_date;
+-----------+----------+
| MONTH_NUM | QUANTITY |
|-----------+----------|
|         1 |        9 |
|         1 |        9 |
|         1 |        9 |
|         2 |        2 |
+-----------+----------+

As you can see, the first three rows are duplicates. There were 3 input rows for month #1, and the window function generates one output row for each input row, so there are 3 output rows for month #1 in the output. The SUM function is not used as a cumulative or sliding function; it is applied to the entire window and returns the same value for the entire window each time, so we get duplicate values as shown above.

You can reduce the duplicates by using the DISTINCT keyword:

SELECT DISTINCT MONTH(sales_date) AS MONTH_NUM, 
       SUM(quantity) OVER (PARTITION BY MONTH(sales_date))
    FROM sales
    ORDER BY MONTH_NUM;
+-----------+-----------------------------------------------------+
| MONTH_NUM | SUM(QUANTITY) OVER (PARTITION BY MONTH(SALES_DATE)) |
|-----------+-----------------------------------------------------|
|         1 |                                                   9 |
|         2 |                                                   2 |
+-----------+-----------------------------------------------------+

In this particular case, you can use a GROUP BY clause rather than a windowing clause. For example:

SELECT MONTH(sales_date) AS MONTH_NUM, 
       SUM(quantity)
    FROM sales
    GROUP BY MONTH_NUM
    ORDER BY MONTH_NUM;
+-----------+---------------+
| MONTH_NUM | SUM(QUANTITY) |
|-----------+---------------|
|         1 |             9 |
|         2 |             2 |
+-----------+---------------+

However, GROUP BY is not as flexible as windowing. Also, in more complex queries, you can’t always substitute GROUP BY for a window clause.

Window with ORDER BY Clause

The ORDER BY clause controls the order of the data within each window (and each partition if there is more than one partition). This is useful if you want to show a “running sum” over time as new rows are added.

A running sum can be calculated either from the beginning of the window to the current row (inclusive) or from the current row to the end of the window.

A query can use a “sliding” window, which is a fixed-width window that processes N specified rows relative to the current row (e.g. the 10 most recent rows, including the current row).

Both cumulative and sliding windows are explained below.

Cumulative Window Frame Example

In a “cumulative” window frame, values are computed from the beginning of the window to the current row (or from the current row to the end of the window):

SELECT MONTH(sales_date) AS MONTH_NUM, 
       quantity, 
       SUM(quantity) OVER (ORDER BY MONTH(sales_date)
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS CUMULATIVE_SUM_QUANTITY
    FROM sales
    ORDER BY sales_date;

The query result includes additional comments showing how the CUMULATIVE_SUM_QUANTITY column was calculated:

+-----------+----------+-------------------------+
| MONTH_NUM | QUANTITY | CUMULATIVE_SUM_QUANTITY |
|-----------+----------+-------------------------|
|         1 |        1 |                       1 |  -- sum = 1
|         1 |        3 |                       4 |  -- sum = 1 + 3
|         1 |        5 |                       9 |  -- sum = 1 + 3 + 5
|         2 |        2 |                      11 |  -- sum = 1 + 3 + 5 + 2
+-----------+----------+-------------------------+

Sliding Window Frame Examples

In the financial world, analysts often study “moving averages”.

For example, you may have a graph in which the X axis is time, and the Y axis shows the average price of the stock over the last 13 weeks (i.e. “13-week moving average”). In a graph of a 13-week moving average of a stock price, the price shown for June 30th is not the price of the stock on June 30th, but the average price of the stock for the 13 weeks up to and including June 30th (i.e. April 1st through June 30th). The value on July 1st is the average price for April 2nd through July 1st; the value on July 2nd is the average price for April 3rd through July 2nd, and so on. Each day, we add the most recent day’s value to the moving average, and we remove the oldest day’s value. This smooths out day-to-day fluctuations and can make trends easier to recognize.

Moving averages can be calculated using a “sliding window”. The window has a specific width in rows. In the stock price example above, 13 weeks is 91 days, so the sliding window would be 91 days. If the measurements are taken once per day (e.g. at the end of the day), then the window would be 91 rows “wide”.

To define a window that is 91 rows wide:

SELECT AVG(price) OVER (ORDER BY timestamp1
                        ROWS BETWEEN 90 PRECEDING AND CURRENT ROW);

Note

When the window starts out, it might be less than 91 days wide. For example, if you want the 13-week moving average price of a stock that was created on April 1st, then you’ll have only 3 prices on April 3rd, so the window will be only 3 rows wide.

The following example shows the result of summing over a sliding window wide enough to hold two samples:

SELECT MONTH(sales_date) AS MONTH_NUM,
       quantity,
       SUM(quantity) OVER (ORDER BY sales_date
                           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) 
           AS SLIDING_SUM_QUANTITY
  FROM sales
  ORDER BY sales_date;

The query result includes additional comments showing how the SLIDING_SUM_QUANTITY column was calculated:

+-----------+----------+----------------------+
| MONTH_NUM | QUANTITY | SLIDING_SUM_QUANTITY |
|-----------+----------+----------------------+
|         1 |        1 |                   1  |  -- sum = 1
|         1 |        3 |                   4  |  -- sum = 1 + 3
|         1 |        5 |                   8  |  -- sum = 3 + 5 (1 is no longer in the window)
|         2 |        2 |                   7  |  -- sum = 5 + 2 (3 is no longer in the window)
+-----------+----------+----------------------+

Note that the “sliding window” functionality requires the ORDER BY clause; the sliding window must know the order in which rows enter and exit the sliding window.

Window with PARTITION BY and ORDER BY Clauses

You can combine PARTITION BY and ORDER BY clauses to get running sums within partitions. In this example, the partitions are one month, and because the sums apply only within a partition, the sum is reset to 0 at the beginning of each new month:

SELECT MONTH(sales_date) AS MONTH_NUM,
       SUM(quantity) OVER (PARTITION BY MONTH(sales_date) ORDER BY sales_date)
          AS MONTHLY_CUMULATIVE_SUM_QUANTITY
    FROM sales
    ORDER BY sales_date;

The query result includes additional comments showing how the MONTHLY_CUMULATIVE_SUM_QUANTITY column was calculated:

+-----------+---------------------------------+
| MONTH_NUM | MONTHLY_CUMULATIVE_SUM_QUANTITY |
|-----------+---------------------------------+
|         1 |                               1 |  -- sum = 1
|         1 |                               4 |  -- sum = 1 + 3
|         1 |                               9 |  -- sum = 1 + 3 + 5
|         2 |                               2 |  -- sum = 0 + 2 (new month)
+-----------+---------------------------------+

You can combine partitions and sliding windows. In the example below, the sliding window is usually two rows wide, but each time a new partition (i.e new month) is reached, the sliding window starts with only the first row in that partition:

SELECT
       MONTH(sales_date) AS MONTH_NUM,
       quantity,
       SUM(quantity) OVER (PARTITION BY MONTH(sales_date) 
                           ORDER BY sales_date
                           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) 
         AS MONTHLY_SLIDING_SUM_QUANTITY
    FROM sales
    ORDER BY sales_date;

The query result includes additional comments showing how the MONTHLY_SLIDING_SUM_QUANTITY column was calculated:

+-----------+----------+------------------------------+
| MONTH_NUM | QUANTITY | MONTHLY_SLIDING_SUM_QUANTITY |
|-----------+----------+------------------------------+
|         1 |        1 |                           1  |  -- sum = 1
|         1 |        3 |                           4  |  -- sum = 1 + 3
|         1 |        5 |                           8  |  -- sum = 3 + 5
|         2 |        2 |                           2  |  -- sum = 0 + 2 (new month)
+-----------+----------+------------------------------+