JavaScript UDTFs (User-Defined Table Functions)

Similar to SQL UDTFs, a JavaScript UDTF returns a set of rows, and can be accessed in the FROM clause of a query.

In this Topic:

Syntax

A JavaScript UDTF is defined using the same syntax as a standard SQL UDTF, but with the addition of the LANGUAGE keyword. In addition, instead of a SQL block in the function definition, JavaScript code is passed:

CREATE OR REPLACE FUNCTION <name> ( [ <arguments> ] )
  RETURNS TABLE ( <output_columns> )
  LANGUAGE JAVASCRIPT
  AS '<javascript_code>';

The next section provides details for writing the javascript_code that comprises the main body (aka “definition”) of the UDTF.

For a more detailed description of the general syntax for all UDFs, including JavaScript UDTFs, see CREATE FUNCTION.

Usage Notes

  • When using a UDTF in a JOIN, the join filter must be in the WHERE clause, not the ON clause.

Writing JavaScript UDTFs

Code Requirements

The JavaScript code must meet the following requirements for the UDTF to be valid:

  • The code must define a single literal JavaScript object.
  • The defined object must include a callback function called processRow. For more information, see the next section.

Important

If the JavaScript code does not meet these requirements, the UDTF will still be created; however, it will fail when called in a query.

Object Callback Functions

Through the JavaScript code, Snowflake interacts with the UDTF by invoking various callback functions during the execution of the query. The following skeleton outlines all available callbacks and their expected signature:

{
   processRow: function (row, rowWriter, context) {/*...*/},
   finalize: function (rowWriter, context) {/*...*/},
   initialize: function (argumentInfo, context) {/*...*/},
}

Note that only the processRow callback is required; the rest are optional.

processRow

This callback function is invoked once for each row in the input relation. The arguments to the UDTF are passed in the row object. For each of the arguments defined in the CREATE FUNCTION DDL used to create the UDTF, there is a property on the row object with the same name in all uppercase. The value of this property is the value of the argument for the current row converted to a JavaScript value.

The rowWriter argument is used by the user-supplied code to produce output rows. A single function, writeRow, is defined on the rowWriter object. The writeRow function takes one argument, the row object, which is a single row in the output table represented as a JavaScript object. For each column defined in the RETURNS clause of the CREATE FUNCTION DDL, a corresponding property can be defined on the row object. The value of that property on the row object will become the value for the corresponding column in the output relation. Any output columns without a corresponding property on the row object will have the value NULL in the result table.

finalize

This callback function is invoked once for each partition, after all rows have been passed to processRow.

It can be used to output any state that may have been aggregated in processRow using the same row writer as is passed to processRow.

initialize

This callback function is invoked once for each partition prior to any invocations of processRow; rows cannot be produced by this method.

initialize can be used to set up any state needed during the result computation. In addition, initialize is passed metadata about the arguments to the user-defined function in the argumentInfo object. argumentInfo has a property for each input column defined in the DDL with metadata about that column. Each entry is an object with the following values:

  • type: String. The type of this column.
  • isConst: Boolean. If true, the value of this column is constant (i.e. is the same for every row).
  • constValue: If isConst (as defined above) is true, this entry contains the constant value of the column; otherwise, this field is undefined.

General Usage Notes for Callback Functions

  • All three callback functions take a context object; this is reserved for future use and currently is empty.

    Caution

    Modifying the context object can yield undefined behavior.

  • Additional functions and properties can be defined, as needed, on the object for use in the UDTF.

  • The arguments to the callback functions are positional and can be named anything; however, for the purposes of this topic, the above names are used for the remaining discussion and examples.

Partitions

JavaScript UDTFs can be called using a partition. For example:

SELECT * FROM tab1, TABLE(js_udtf(tab1.c1, tab1.c2) OVER (PARTITION BY <expr> ORDER BY <expr>));

A partition consists of all rows for which the PARTITION BY expressions evaluates to the same value. The rows are passed to processRow in the order defined by the ORDER BY expression.

For this discussion, an instance of a JavaScript UDTF is defined as one instance of the JavaScript object used to represent the function in Snowflake.

For improved performance when invoking a JavaScript UDTF, several UDTF instances can execute in parallel. For this to work, the rows in the input need to be partitioned between the different instances of the function. Partitioning is specified when calling a JavaScript UDTF:

  • All rows in a partition are passed to the same UDTF instance through the processRow function.
  • initialize and finalize are each invoked once per partition.

Note that there is not a one-to-one relationship between partitions and UDTF instances. Although each partition will be processed by only one UDTF instance, the converse is not true – a single UDTF instance can process multiple partitions. It is therefore important to use initialize and finalize to specifically set up and tear down partitions, for example, to avoid “carrying over” accumulated values from the processing of one partition to the processing of another partition.

Result Columns

This section describes the columns that are returned by a JavaScript UDTF:

  • All columns defined in the RETURNS clause of the CREATE FUNCTION DDL are returned in the output relation.
  • All columns passed into the UDTF are also returned.

There is a distinction between rows produced in the processRow callback and rows produced by finalize:

  • When a row is produced in processRow, Snowflake can correlate it to an input, namely the one passed into the function as the row argument. Note that if a given processRow produces more than one row, input attributes are duplicated into each output row.

    For rows produced in processRow, all input columns are duplicated in the output relation.

  • In the finalize callback, Snowflake is unable to correlate it to any single row because there is no “current row” to correlate to.

    For rows produced in the finalize callback, only the columns used in the PARTITION BY clause are duplicated (as these are the same for any row in the current partition); all other attributes are NULL. If no PARTITION BY clause is specified, all input attributes are NULL.

Calling JavaScript UDTFs in Queries

No Partitioning

Similar to a SQL UDTF, the standard way to call a JavaScript UDTF is through a lateral join. For example:

SELECT * FROM tab1, TABLE(js_udtf(tab1.c1, tab1.c2));

In this example, tab1 is the table, with columns c1 and c2, and js_udtf is the name of the UDTF.

When no partitioning is used, the Snowflake execution engine partitions the input itself according to multiple factors, such as the size of the warehouse processing the function and the cardinality of the input relation. When running in this mode, the user code can make no assumptions about partitions. This is most useful when the function only needs to look at rows in isolation to produce its output and no state is aggregated across rows.

Explicit Partitioning

JavaScript UDTFs can also be called using a partition. For example:

SELECT * FROM tab1, TABLE(js_udtf(tab1.c1, tab1.c2) OVER (PARTITION BY tab1.c3 ORDER BY tab1.c1));

Explicit Partitioning with an Empty OVER Clause

SELECT * FROM tab1, TABLE(js_udtf(tab1.c1, tab1.c2) OVER ());

An empty OVER clause means that every row belongs to the same partition (i.e the entire input relation is one partition).

Note

You should exercise caution when calling a Javascript UDTF with an empty OVER clause because this limits Snowflake to creating one instance of the function and, therefore, Snowflake is unable to parallelize the computation.

Sample JavaScript UDTFs

This section contains several sample JavaScript UDTFs.

Basic Hello World Examples

The following JavaScript UDTF takes no parameters and always returns the same values. It is provided primarily for illustration purposes:

CREATE OR REPLACE FUNCTION HelloWorld0()
    RETURNS TABLE (OUTPUT_COL VARCHAR)
    LANGUAGE JAVASCRIPT
    AS '{
        processRow: function f(row, rowWriter, context){
           rowWriter.writeRow({OUTPUT_COL: "Hello"});
           rowWriter.writeRow({OUTPUT_COL: "World"});
           }
        }';

SELECT output_col FROM TABLE(HelloWorld0());

Output:

+------------+
| OUTPUT_COL |
+============+
| Hello      |
+------------+
| World      |
+------------+

The following JavaScript UDTF is also for illustration purposes, but uses an input parameter. Note that JavaScript is case-sensitive, but SQL forces identifiers to uppercase, so when the JavaScript code references a SQL parameter name, the JavaScript code must use uppercase.

Note also that function parameters are accessed through the parameter named row in the get_params() function:

CREATE OR REPLACE FUNCTION HelloHuman(First_Name VARCHAR, Last_Name VARCHAR)
    RETURNS TABLE (V VARCHAR)
    LANGUAGE JAVASCRIPT
    AS '{
        processRow: function get_params(row, rowWriter, context){
           rowWriter.writeRow({V: "Hello"});
           rowWriter.writeRow({V: row.FIRST_NAME});  // Note the capitalization and the use of "row."!
           rowWriter.writeRow({V: row.LAST_NAME});   // Note the capitalization and the use of "row."!
           }
        }';

SELECT V AS Greeting FROM TABLE(HelloHuman('James', 'Kirk'));

Output:

+------------+
|  GREETING  |
+============+
| Hello      |
+------------+
| James      |
+------------+
| Kirk       |
+------------+

Basic Examples Illustrating the Callback Functions

The following JavaScript UDTF illustrates all the API callback functions and various output columns. It simply returns all rows as-is and provides a count of the number of characters seen in each partition. It also illustrates how to share state across a partition using a THIS reference. Note that the example uses an initialize callback to initialize the counter to zero; this is needed because a given function instance can be used to process multiple partitions:

-- set up for the sample
CREATE TABLE parts (p FLOAT, s STRING);

INSERT INTO parts VALUES (1, 'michael'), (1, 'kelly'), (1, 'brian');
INSERT INTO parts VALUES (2, 'clara'), (2, 'maggie'), (2, 'reagan');

-- creation of the UDTF
CREATE OR REPLACE FUNCTION "CHAR_SUM"(INS STRING)
    RETURNS TABLE (NUM FLOAT)
    LANGUAGE JAVASCRIPT
    AS '{
    processRow: function (row, rowWriter, context) {
      this.ccount = this.ccount + 1;
      this.csum = this.csum + row.INS.length;
      rowWriter.writeRow({NUM: row.INS.length});
    },
    finalize: function (rowWriter, context) {
     rowWriter.writeRow({NUM: this.csum});
    },
    initialize: function(argumentInfo, context) {
     this.ccount = 0;
     this.csum = 0;
    }}';

The following query illustrates calling the CHAR_SUM UDTF on the parts table with no partitioning:

SELECT * FROM parts, TABLE(char_sum(s));

Output:

+--------+---------+-----+
| P      | S       | NUM |
+--------+---------+-----+
| 1      | michael | 7   |
| 1      | kelly   | 5   |
| 1      | brian   | 5   |
| 2      | clara   | 5   |
| 2      | maggie  | 6   |
| 2      | reagan  | 6   |
| [NULL] | [NULL]  | 34  |
+--------+---------+-----+

When no partitioning is specified, Snowflake automatically defines partitions. In this example, due to the small number of rows, only one partition is created (i.e. only one invocation of finalize is executed). Note that the final row has NULL values in the input columns.

Same query, but with explicit partitioning:

SELECT * FROM parts, TABLE(char_sum(s) OVER (PARTITION BY p));

Output:

+--------+---------+-----+
| P      | S       | NUM |
+--------+---------+-----+
| 1      | michael | 7   |
| 1      | kelly   | 5   |
| 1      | brian   | 5   |
| 1      | [NULL]  | 17  |
| 2      | clara   | 5   |
| 2      | maggie  | 6   |
| 2      | reagan  | 6   |
| 2      | [NULL]  | 17  |
+--------+---------+-----+

This example partitions over the p column, yielding two partitions. For each partition, a single row is returned in the finalize callback, yielding a total of two rows, distinguished by the NULL value in the s column. Because p is the PARTITION BY column, the rows created in finalize have the value of p that defines the current partition.

Extended Examples Using Table Values and Other UDTFs as Input

This basic UDTF converts a “range” of IP addresses to a complete list of IP addresses. The input consists of the first 3 segments of the IP address (e.g. '192.168.1') and then the start and end of the range used to generate the last segment (e.g. 42 and 45):

CREATE OR REPLACE FUNCTION range_to_values(PREFIX VARCHAR, RANGE_START FLOAT, RANGE_END FLOAT)
    RETURNS TABLE (IP_ADDRESS VARCHAR)
    LANGUAGE JAVASCRIPT
    AS $$
      {
        processRow: function f(row, rowWriter, context)  {
          var suffix = row.RANGE_START;
          while (suffix <= row.RANGE_END)  {
            rowWriter.writeRow( {IP_ADDRESS: row.PREFIX + "." + suffix} );
            suffix = suffix + 1;
            }
          }
      }
      $$;

SELECT * FROM TABLE(range_to_values('192.168.1', 42::FLOAT, 45::FLOAT));

Output:

+--------------+
| IP_ADDRESS   |
+==============+
| 192.168.1.42 |
+--------------+
| 192.168.1.43 |
+--------------+
| 192.168.1.44 |
+--------------+
| 192.168.1.45 |
+--------------+

Building on the previous example, you might want to calculate individual IP addresses for more than one range. This next statement creates a table of ranges that can be used to expand to individual IP addresses. The query then inputs the rows from the table into the range_to_values() UDTF to return the individual IP addresses:

CREATE TABLE ip_address_ranges(prefix VARCHAR, range_start INTEGER, range_end INTEGER);
INSERT INTO ip_address_ranges (prefix, range_start, range_end) VALUES
    ('192.168.1', 42, 44),
    ('192.168.2', 10, 12),
    ('192.168.2', 40, 40)
    ;

SELECT rtv.ip_address
  FROM ip_address_ranges AS r, TABLE(range_to_values(r.prefix, r.range_start::FLOAT, r.range_end::FLOAT)) AS rtv;

Output:

+--------------+
| IP_ADDRESS   |
+==============+
| 192.168.1.42 |
+--------------+
| 192.168.1.43 |
+--------------+
| 192.168.1.44 |
+--------------+
| 192.168.2.10 |
+--------------+
| 192.168.2.11 |
+--------------+
| 192.168.2.12 |
+--------------+
| 192.168.2.40 |
+--------------+

Attention

In this example, the syntax used in the FROM clause is identical to the syntax of an inner join (i.e. FROM t1, t2); however, the operation performed is not a true inner join. The actual behavior is the range_to_values() function is called with the values from each row in the ip_address changes table. In other words, it would be equivalent to writing:

for input_row in ip_address_ranges:
  output_row = range_to_values(input_row.prefix, input_row.range_start, input_row.range_end)

The concept of passing values to a UDTF can be extended to multiple UDTFs. The next example creates a UDTF named fake_ipv4_to_ipv6() that “converts” IPV4 address to IPV6 addresses. The query then calls the function as part of a more complex statement involving another UDTF:

-- Example UDTF that "converts" an IPV4 address to a range of IPV6 addresses.
-- (for illustration purposes only and is not intended for actual use)
CREATE OR REPLACE FUNCTION fake_ipv4_to_ipv6(ipv4 VARCHAR)
    RETURNS TABLE (IPV6 VARCHAR)
    LANGUAGE JAVASCRIPT
    AS $$
      {
        processRow: function f(row, rowWriter, context)  {
          rowWriter.writeRow( {IPV6: row.IPV4 + "." + "000.000.000.000"} );
          rowWriter.writeRow( {IPV6: row.IPV4 + "." + "..."} );
          rowWriter.writeRow( {IPV6: row.IPV4 + "." + "FFF.FFF.FFF.FFF"} );
          }
      }
      $$;

SELECT ipv6 FROM TABLE(fake_ipv4_to_ipv6('192.168.3.100'));

Output:

+-------------------------------+
| IPV6                          |
+===============================+
| 192.168.3.100.000.000.000.000 |
+-------------------------------+
| 192.168.3.100....             |
+-------------------------------+
| 192.168.3.100.FFF.FFF.FFF.FFF |
+-------------------------------+

The following query uses the fake_ipv4_to_ipv6 and range_to_values() UDTFs created earlier, with input from the ip_address changes table. In other words, it starts with a set of IP address ranges, converts them to individual IPV4 addresses, and then takes each IPV4 address and “converts” it to a range of IPV6 addresses:

SELECT rtv6.ipv6
  FROM ip_address_ranges AS r,
       TABLE(range_to_values(r.prefix, r.range_start::FLOAT, r.range_end::FLOAT)) AS rtv,
       TABLE(fake_ipv4_to_ipv6(rtv.ip_address)) AS rtv6
  WHERE r.prefix = '192.168.2'  -- limits the output for this example
  ;

Output:

+------------------------------+
| IPV6                         |
+==============================+
| 192.168.2.10.000.000.000.000 |
+------------------------------+
| 192.168.2.10....             |
+------------------------------+
| 192.168.2.10.FFF.FFF.FFF.FFF |
+------------------------------+
| 192.168.2.11.000.000.000.000 |
+------------------------------+
| 192.168.2.11....             |
+------------------------------+
| 192.168.2.11.FFF.FFF.FFF.FFF |
+------------------------------+
| 192.168.2.12.000.000.000.000 |
+------------------------------+
| 192.168.2.12....             |
+------------------------------+
| 192.168.2.12.FFF.FFF.FFF.FFF |
+------------------------------+
| 192.168.2.40.000.000.000.000 |
+------------------------------+
| 192.168.2.40....             |
+------------------------------+
| 192.168.2.40.FFF.FFF.FFF.FFF |
+------------------------------+

Note that in this example, we used join syntax twice, but neither of the operations was a true join; both were calls to a UDTF using the output of a table or another UDTF as input.

A true inner join is order-insensitive. For example, the following statements are identical:

table1 INNER JOIN table2 ON ...
table2 INNER JOIN table1 ON ...

Inputing values to a UDTF is not a true join, and the operations are not order-insensitive. For example, the following query is identical to the previous example, except it reverses the order of the UDTFs in the FROM clause:

SELECT rtv6.ipv6
  FROM ip_address_ranges AS r,
       TABLE(fake_ipv4_to_ipv6(rtv.ip_address)) AS rtv6,
       TABLE(range_to_values(r.prefix, r.range_start::FLOAT, r.range_end::FLOAT)) AS rtv
 WHERE r.prefix = '192.168.2'  -- limits the output for this example
  ;

The query fails with the following error message:

SQL compilation error: error line 3 at position 35 invalid identifier 'RTV.IP_ADDRESS'

The rtv.ip_address identifier is invalid because it was not defined before it was used. In a true join, this wouldn’t happen, but when processing UDTFs using join syntax, this error may occur.

Next, try a statement that mixes inputing to a UDTF with a true join; however, remember that inputing to a UDTF and doing an inner join both use the same syntax, which may be confusing:

-- First, create a small table of IP address owners.
-- This table uses only IPv4 addresses for simplicity.
DROP TABLE ip_address_owners;
CREATE TABLE ip_address_owners (ip_address VARCHAR, owner_name VARCHAR);
INSERT INTO ip_address_owners (ip_address, owner_name) VALUES
  ('192.168.2.10', 'Barbara Hart'),
  ('192.168.2.11', 'David Saugus'),
  ('192.168.2.12', 'Diego King'),
  ('192.168.2.40', 'Victoria Valencia')
  ;

-- Now join the IP address owner table to the IPv4 addresses.
SELECT rtv.ip_address, ipo.owner_name
  FROM ip_address_ranges AS r,
       TABLE(range_to_values(r.prefix, r.range_start::FLOAT, r.range_end::FLOAT)) AS rtv,
       ip_address_owners AS ipo
 WHERE ipo.ip_address = rtv.ip_address AND
      r.prefix = '192.168.2'   -- limits the output for this example
  ;

Output:

+--------------+-------------------+
| IP_ADDRESS   | OWNER_NAME        |
+==============+===================+
| 192.168.2.10 | Barbara Hart      |
+--------------+-------------------+
| 192.168.2.11 | David Saugus      |
+--------------+-------------------+
| 192.168.2.12 | Diego King        |
+--------------+-------------------+
| 192.168.2.40 | Victoria Valencia |
+--------------+-------------------+

Attention

The preceding example works as described; however, you should take care when combining UDTFs with true joins because this may result in non-deterministic and/or unexpected behavior.

Also, note that this behavior may change in the future.