Using Window Functions

Window functions operate on windows, which are groups of rows that are related, for example by date or by location.

This tutorial describes window functions, including:

  • Window frame functions.
  • Analytic functions.
  • Aggregate functions used as window functions.
See also:
Window Functions Aggregate Functions

In this Topic:

Introduction to Window Functions

Aggregate Functions

To understand window functions, you must understand aggregate functions. A brief definition is that an aggregate function is a function that takes multiple values as input, and returns 1 value. (Technically, an aggregate function can take 0, 1, or multiple rows as input, but for simplicity, we usually just say that an aggregate function takes “multiple rows”).

For example, the SUM() function takes multiple input values and then returns a single output value that is the sum of all the input rows.

A more complete explanation of aggregate functions is at: aggregate functions.

Windows and Window Functions

A window is a group of 0, 1, or more rows. (For simplicity, we usually just say that a window contains “multiple rows”, even though in some cases it might contain 0 or 1.)

The rows in a window are related in some way. For example, the rows might be related by location (all from the same province or state) or by time (all from the same month or fiscal quarter).

To use a window, an SQL statement defines that window – for example, the statement might explicitly partition rows into separate windows based on province or state. If the statement does not explicitly define the windows, then it operates on a single window.

A function that is passed multiple rows (more precisely, a column name or expression that generates one value for each of the rows) is called a “window function”.

Since the definition of an aggregate function is “a function that takes multiple rows (values) and produces one output row (value)”, and since all window functions accept multiple inputs and produce one output, all window functions are aggregate functions.

(The converse is not necessarily true; an aggregate function is not necessarily a window function, so the terms are not synonymous. However, most or all current Snowflake aggregate functions can be used as window functions. The document Aggregate Functions lists all aggregate functions and indicates which of them can also be used as window functions.)

Analytic Functions

An analytic function is a window function for which the order of the rows in the window is significant.

The output of an analytic function depends upon:

  • The individual row passed to the function.
  • The values of the other rows in the window passed to the function.
  • The order of all the rows in the window.

For example, consider the function RANK. If you want to calculate the rank of each of your salespeople, you must specify the criterion on which to rank them – for example, the number of dollars of product they’ve sold, the percentage of customers who are still happy with the product 3 years after purchasing it, etc. Conceptually, you can calculate the rank by putting the sales people in order (e.g. by amount sold last month, in order from highest to lowest) and then calling the RANK() function to show which salesperson sold the most, which sold the second most, etc.

The rank of each salesperson depends on both the individual sales person (e.g. the amount she sold) and on how much every other sales person sold in that same time period. In database terms, the output of the rank function depends both upon the individual row, and on all the other rows within the window.

For an analytic function, the notion of order is crucial. All analytic functions require that the rows be ordered within the window according to some criterion.

Below is a simple example of the output from a query that used the analytic function RANK() to rank salespeople by total sales over the most recent time period:

+-------------+-------------+------+
| Salesperson | Amount Sold | Rank |
|-------------+-------------+------|
| Smith       |        2000 |    1 |
| Jones       |        1500 |    2 |
| Torkelson   |        1200 |    3 |
+-------------+-------------+------+

(The SQL syntax to generate reports like this is shown in more detail below and in the reference document Window Functions.)

For analytic functions, the function must be called once per row. For example, the RANK() function returns a different value for each row (assuming no tied values), so the RANK() function is called once for each row, and is passed both that row and the entire window (so that the row’s rank can be determined by comparing it to every other row in that window).

Not all window functions are analytic functions. The following section contrasts analytic functions and non-analytic window functions in more detail.

Contrasting Analytic Functions and non-Analytic Aggregate Functions

For non-analytic window functions (and indeed for all non-analytic aggregate functions), order is not important. For example, consider the aggregate function SUM():

SELECT salesperson, amount_sold, SUM(AMOUNT_SOLD) FROM sales_table;

The output might look similar to:

+-------------+-------------+------+
| Salesperson | Amount Sold | Total |
|-------------+-------------+-------|
| Smith       |        2000 |  4700 |
| Jones       |        1500 |  4700 |
| Torkelson   |        1200 |  4700 |
+-------------+-------------+-------+

Since the query simply sums the total sales across the entire window, the order doesn’t matter, and the sum is the same for each row. In fact, in this case the aggregate function used as a window function only needs to be called once per window, not once for each row in the window.

This is different from the requirement for an analytic function, which must be called once for each row in each window, since the output is different for each row in that window.

Window Frames

A window frame contains a subset of the rows in a window. (The term is slightly confusing; in your house, your windows are inside a window frame, but inside your database, your window frame is inside a window! A better term might have been “pane” rather than “frame”.)

There are two types of window frames:

  • Cumulative.
  • Sliding.

Cumulative Window Frame

A cumulative window is a variable-width window that starts at a fixed point and grows with each new row. For example, “sales total so far this month” is calculated using a cumulative window that starts on the first of the month and expands as each new day’s data is added. In the table below, the third column shows the output from a cumulative window frame that calculates the sales total so far this month:

+--------+-------+--------------+
| Day of | Sales | Sales So Far |
| Month  | Today | This Month   |
|--------+-------+--------------+
|      1 |    10 |           10 |
|      2 |    10 |           20 |
|      3 |    10 |           30 |
|    ... |   ... |          ... |
+--------+-------+--------------+

Sliding Window Frame

A sliding window frame is a fixed-width frame that “slides along” the rows in the window; as new rows appear, old rows are removed from the window, so that the width of the frame is always the same.

For example, a 13-week moving average of a stock price is a sliding window; for each day, it shows us the average price over the last 13 weeks (91 days), including the current day.

For example:

  • On June 30th, the function returns the average price for April 1 - June 30.
  • On July 1st, the function returns the average price for April 2 - July 1.
  • On July 2nd, the function returns the average price for April 3 - July 2.
  • Etc.

Note

Although sliding windows are fixed-width, there is an exception when there is not enough data to fill the window. For example, if record-keeping started on April 1st, then obviously from April 1st to June 29th, the sliding window would have fewer than 91 days of data in it. Only after the window finishes filling does its width remain fixed.

Here’s an illustration of a moving average using a 3-day sliding window.

This window always keeps the 3 most recent values and calculates the total within that fixed-size window. You can loosely picture this as a fixed-width window frame sliding from left to right along a timeline.

The timeline below shows the 7 days of the week.

|--1--|--2--|--3--|--4--|--5--|--6--|--7--|

Below, we show a 3-day sliding window frame as it would look on the 3rd day of the week:

|--1--|--2--|--3--|--4--|--5--|--6--|--7--|

|------ frame ----|

As you can see, this window frame covers “today” (the 3rd day of the week) and the preceding two days. On the 4th day of the week, the window frame slides rightward one day, and looks like:

|--1--|--2--|--3--|--4--|--5--|--6--|--7--|

      |------ frame ----|
+--------+-------+---------------+
| Day of | Sales | Most Recent   |
| Month  | Today | 3 Days' Sales |
|--------+-------+---------------+
|      1 |    10 |            10 |
|      2 |    11 |            21 |
|      3 |    12 |            33 |
|      4 |    13 |            36 |
|      5 |    14 |            39 |
|    ... |   ... |           ... |
+--------+-------+---------------+

As you can see from this example, the last column in the table contains the sum of the three most recent days’ worth of sales data. So, for example, for day 4, the sum (36) is the sum of the sales for days 2, 3, and day 4 (11 + 12 + 13).

A More Extensive Example

This example illustrates many of the concepts described above:

  • A window frame that acts as a cumulative window.
  • A window frame that acts as a sliding window.
  • A window that uses an aggregate function that does not use a window frame and is neither cumulative nor sliding.
  • A window used by an analytic function.

Suppose that you want a financial report that shows values based on sales over the last week:

  • Daily sales.
  • A “rank” column that indicates where this day ranked in sales (1st (highest), 2nd, etc. within this week). This uses a window function, specifically an analytic function.
  • Sales so far this week, i.e. the “running sum” for all days from the beginning of the week up through and including the current date. This uses a window function, more specifically an aggregate function in the context of a cumulative window frame.
  • Total sales for the week. This uses an aggregate function, SUM(). Although SUM() can be used as a window function, for this column it’s used as a pure aggregate function.
  • The 3-day moving average, which is the average of today and the two previous days. This uses a window function, more specifically an aggregate function in a sliding window frame.

The table would look something like the following:

+--------+-------+------+--------------+-------------+--------------+
| Day of | Sales | Rank | Sales So Far | Total Sales | 3-Day Moving |
| Week   | Today |      | This Week    | This Week   | Average      |
|--------+-------+------+--------------+-------------|--------------+
|      1 |    10 |    4 |           10 |          84 |         10.0 |
|      2 |    14 |    3 |           24 |          84 |         12.0 |
|      3 |     6 |    5 |           30 |          84 |         10.0 |
|      4 |     6 |    5 |           36 |          84 |          9.0 |
|      5 |    14 |    3 |           50 |          84 |         10.0 |
|      6 |    16 |    2 |           66 |          84 |         11.0 |
|      7 |    18 |    1 |           84 |          84 |         12.0 |
+--------+-------+------+--------------+-------------+--------------+

Window Function Syntax

Introduction

The syntax of window functions is covered in the document Window Functions. That document is a reference document and might seem a little terse for new users, so this tutorial provides some information about a subset of the syntax required by window functions.

Syntax

Windows and window frames are specified with the OVER(...) clause. The syntax of the OVER() clause is:

OVER ( [ PARTITION BY <expr1> ]
       [ ORDER BY <expr2> [ { cumulativeFrame | slidingFrame } ] ]
     )

The PARTITION BY clause partitions the data into windows. For example, if you want to calculate running sums of monthly sales for more than one month, you could partition the data by month. That would allow you to calculate one running sum of sales for January, another running sum for February, etc.

The ORDER BY clause is primarily for analytic window functions and for sliding and cumulative window frame functions; it determines the order of rows within each window.

For analytic functions, the ORDER BY clause is required, not optional. This is because analytic functions only operate on windows that contain rows in a specified order.

Examples

This first set of examples is based on the table used earlier in this tutorial:

+--------+-------+------+--------------+-------------+--------------+
| Day of | Sales | Rank | Sales So Far | Total Sales | 3-Day Moving |
| Week   | Today |      | This Week    | This Week   | Average      |
|--------+-------+------+--------------+-------------|--------------+
|      1 |    10 |    4 |           10 |          84 |         10.0 |
|      2 |    14 |    3 |           24 |          84 |         12.0 |
|      3 |     6 |    5 |           30 |          84 |         10.0 |
|      4 |     6 |    5 |           36 |          84 |          9.0 |
|      5 |    14 |    3 |           50 |          84 |         10.0 |
|      6 |    16 |    2 |           66 |          84 |         11.0 |
|      7 |    18 |    1 |           84 |          84 |         12.0 |
+--------+-------+------+--------------+-------------+--------------+

The SQL for this query is somewhat complex. Rather than show you one query that does all this, we’ll show you the SQL for some of the individual columns.

In the real world, we’d probably have years worth of data, and to calculate sums and averages for one specific week of data, we’d need to use a window one week wide, or use a filter similar to:

WHERE date >= start_of_relevant_week and date <= end_of_relevant_week

To focus on windowing, we’ll simplify and assume that our table contains only the most recent week’s worth of data.

Total Sales This Week

The simplest column is “Total Sales This Week”. It can be calculated using an aggregate function without being in a window context.

SELECT SUM(sales_today)
  FROM my_table;

Sales So Far This Week

The column “sales so far this week” is calculated using a window function. The SQL is similar to:

SELECT SUM(sales_today) OVER (ORDER BY sales_date
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  FROM my_table;

This says to put the rows in order by date, and then for each date calculate the sum of sales from the start of the window up to (and including the current date.

3-Day Moving Average

In our simple example of a 3-day average calculated since the beginning of the week, the SQL code is similar to:

SELECT SUM(sales_today) OVER (ORDER BY sales_date
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
  FROM my_table;

The difference between this sliding window and the preceding cumulative window is simply the starting point; it’s a fixed point for the cumulative window but slides rightward for the sliding window.

Rank

SELECT RANK(sales_today) OVER (ORDER BY sales_today)
FROM my_table;

Illustrating the Syntax With More Examples

This section provides more detail about the windowing sub-clauses used by window functions, and how these sub-clauses work together. This section includes examples with extensive explanations.

These examples use the following table and data:

CREATE TABLE sales (sales_date DATE, quantity INTEGER);

INSERT INTO sales (sales_date, quantity) VALUES
  ('2018-01-01', 1),
  ('2018-01-02', 3),
  ('2018-01-03', 5),
  ('2018-21-01', 2)
  ;

Example Window With PARTITION BY clause

The PARTITION BY sub-clause of the windowing clause divides the data into distinct sub-sets based on the value of <expr1>; the SELECT statement is applied to each subset, and the output has rows for each subset.

(This is similar to, but not identical to, what the GROUP BY clause does.)

The next example shows the quantity sold each month, and uses the PARTITION BY clause to divide the data into one-month subsets:

SELECT MONTH(sales_date), SUM(quantity) OVER (PARTITION BY MONTH(sales_date))
  FROM sales;

The output is similar to:

+-------+----------+
| MONTH | QUANTITY |
|-------|----------|
| 01    |        9 |
| 01    |        9 |
| 01    |        9 |
| 02    |        2 |
+-------+----------+

As you can see, the first three rows are duplicates. There were 3 input rows for hour #1, and the window function generates one output row for each input row, so there are 3 output rows for hour #1 in the output. The SUM() function is not used as a cumulative or sliding function; it is applied to the entire window and returns the same value for the entire window each time, so we get duplicate values as shown above. You can reduce the duplicates by using the DISTINCT keyword:

SELECT MONTH(sales_date), SUM(quantity) OVER (PARTITION BY MONTH(sales_date))
  FROM sales;

The output is similar to:

+-------+----------+
| MONTH | QUANTITY |
|-------|----------|
| 01    |        9 |
| 02    |        2 |
+-------+----------+

In this particular case, you could have used a GROUP BY clause rather than a windowing clause, for example:

SELECT MONTH(sales_date) AS MonthNum, SUM(quantity)
  FROM sales
  GROUP BY MonthNum;

That query would have returned the same two data rows as shown above.

Example Window With ORDER BY clause

The ORDER BY clause controls the order of the data within each window (and each partition if there is more than one partition).

This is useful if you want to show a “running sum” over time as new rows are added.

A running sum can be calculated either from the beginning of the window to the current row (inclusive) or from the current row to the end of the window.

A query can use a “sliding” window, which is a fixed-width window that processes N specified rows relative to the current row, for example, the 10 most recent rows (including the current row).

Both cumulative and sliding windows are explained below.

Example with Cumulative Window Frame

In a “cumulative” window frame, values are computed from the beginning of the window to the current row (or from the current row to the end of the window).

SELECT MONTH(sales_date), quantity, SUM(quantity) OVER (ORDER BY MONTH(sales_date))
  FROM sales;

The output will be similar to the following (without the “Explanation” column):

+-------+----------+---------------+---------------+
| MONTH | QUANTITY | SUM(QUANTITY) | EXPLANATION   |
|-------+----------+---------------|---------------+
| 01    |        1 |             1 | 1             |
| 01    |        3 |             4 | 1 + 3         |
| 01    |        5 |             9 | 1 + 3 + 5     |
| 02    |        2 |            11 | 1 + 3 + 5 + 2 |
+-------+----------+---------------+---------------+

Example With Sliding Window Frame

In the financial world, people often talk about “moving averages”.

For example, you might see a graph in which the X axis is time, and the Y axis shows the average price of the stock over the last 13 weeks (the “13-week moving average”). In a graph of a 13-week moving average of a stock price, the price shown for June 30th is not the price of the stock on June 30th, but the average price of the stock for the 13 weeks up to and including June 30th (i.e. April 1st through June 30th). The value on July 1st is the average price for April 2nd through July 1st; the value on July 2nd is the average price for April 3rd through July 2nd, and so on. Each day, we add the most recent day’s value to the moving average, and we remove the oldest day’s value. This smooths out day-to-day fluctuations and can make trends easier to recognize.

Moving averages can be calculated by using a “sliding window”. The window has a specific width in rows. In the stock price example given above, 13 weeks is 91 days, so the sliding window would be 91 days. If the measurements are taken once per day (e.g. at the end of the day), then the window would be 91 rows “wide”.

To define a window 91 rows wide, use a clause similar to the following:

SELECT AVG(price) OVER (ORDER BY timestamp1
                        ROWS BETWEEN 90 PRECEDING AND CURRENT ROW);

(When the window starts out, it might be less than 91 days wide. For example, if you want the 13-week moving average price of a stock that was created on April 1st, then of course on April 3rd you’ll have only 3 prices so far, so the window will be only 3 rows wide, not 91 rows wide.)

The example below shows the result of summing over a sliding window wide enough to hold two samples:

SELECT MONTH(sales_date),
       quantity,
       SUM(quantity) OVER (ORDER BY sales_date
                          ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
  FROM sales;

The output looks similar to:

+-------+----------+---------------+---------------------------------------+
| MONTH | QUANTITY | SUM(QUANTITY) | Explanation                           |
|-------+----------+---------------+---------------------------------------|
| 01    |        1 |            1  | 1                                     |
| 01    |        3 |            4  | 1 + 3                                 |
| 01    |        5 |            8  | 3 + 5  (1 is no longer in the window) |
| 02    |        2 |            2  | 2      (new window for a new month)   |
+-------+----------+---------------+---------------------------------------+

Note that the “sliding window” functionality can’t be used without the ORDER BY clause; the sliding window must know the order in which rows enter and exit the sliding window.

Example Combining PARTITION BY and ORDER BY Clauses

You can combine PARTITION BY and ORDER BY clauses to get running sums within partitions. In this example, the partitions are one day, and because the sums apply only within a partition, the sum is reset to 0 at the beginning of each new day.

SELECT MONTH(sales_date),
       SUM(quantity) OVER (PARTITION BY MONTH(sales_date)
                           ORDER BY sales_date)
  FROM sales;

The output will be similar to:

+-------+----------+---------------+
| MONTH | QUANTITY | Explanation   |
|-------+----------+---------------|
|    01 |       1  | 1             |
|    01 |       4  | 1 + 3         |
|    01 |       9  | 1 + 3 + 5     |
|    02 |       2  | 2             |
+-------+----------+---------------+

You can combine partitions and sliding windows. In the example below, the sliding window is usually two rows wide, but each time a new partition is reached, the sliding window starts with only the first row in that partition:

SELECT
     MONTH(sales_date),
     quantity,
     SUM(quantity) OVER (PARTITION BY MONTH(sales_date)
           ORDER BY sales_date
           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
  FROM sales;

The output looks similar to:

+-------+----------+---------------+---------------+
| MONTH | QUANTITY | SUM(QUANTITY) | Explanation   |
|-------+----------+---------------+---------------|
|    01 |        1 |            1  | 1             |
|    01 |        3 |            4  | 1 + 3         |
|    01 |        5 |            8  | 3 + 5         |
|    02 |        2 |            2  | 2             |
+-------+----------+---------------+---------------+