Semi-structured Data Types

This topic describes the semi-structured data types supported in Snowflake.

In this Topic:

Data Types

The following data types are used to represent arbitrary data structures which can be used to import and operate on semi-structured data (JSON, Avro, ORC, Parquet, or XML). Snowflake stores these types internally in an efficient compressed columnar binary representation of the documents for better performance and efficiency. Snowflake’s optimization for storage of these data types is completely transparent and produces no user-visible changes in semantics.

VARIANT

A tagged universal type, which can store values of any other type, including OBJECT and ARRAY, up to a maximum size of 16MB compressed.

A value of any data type can be implicitly cast to a VARIANT value, subject to size restrictions. When a VARIANT value (without an explicit cast) and a NUMERIC value are coerced into the same type, the NUMERIC value is cast to VARIANT. For example:

var:json_path >= 56 is cast to var:json_path >= 56::VARIANT

Explicit casts can be used to convert values to and from the VARIANT data type. For example, to convert a value to the VARIANT data type, use TO_VARIANT or expr::variant. To convert a value from the variant type, use expr::<data_type>, where <data_type> is the data type that you want to convert to, for example variant_col1::FLOAT. The sample code below shows conversions from variant and to variant:

CREATE TABLE varia (float1 FLOAT, v VARIANT, float2 FLOAT);
INSERT INTO varia (float1, v, float2) VALUES (1.23, NULL, NULL);
SELECT * FROM varia;
UPDATE varia SET v = TO_VARIANT(float1);  -- converts FROM a float TO a variant.
UPDATE varia SET float2 = v::FLOAT;       -- converts FROM a variant TO a float.
SELECT * FROM varia;

Note that a VARIANT value can be missing (contain SQL NULL), which is different from a VARIANT null value, which is a real value used to represent a null value in semi-structured data. VARIANT null is a true value that compares as equal to itself.

VARIANT columns in a relational table are stored as separate physical columns. Repeating keys and paths are further stored as separate physical columns, similar to regular SQL attributes. Currently, this is not true of data stored in arrays. For data that is mostly regular and uses only native JSON types (strings and numbers, not timestamps), both storage and query performance for operations on relational data and data in a VARIANT column is very similar. Non-native values such as dates and timestamps are stored as strings when loaded into a VARIANT column, so operations on these values could be slower and also consume more space than when stored in a relational column with the corresponding data type.

OBJECT

Used to represent collections of key-value pairs, where the key is a non-empty string, and the value is a value of VARIANT type. Snowflake does not currently support explicitly-typed objects.

ARRAY

Used to represent dense or sparse arrays of arbitrary size, where index is a non-negative integer (up to 2^31-1), and values have VARIANT type. Snowflake does not currently support fixed-size arrays or arrays of elements of a specific non-VARIANT type.

Examples

For additional semi-structured data usage examples, see Querying Semi-structured Data.

CREATE OR REPLACE TABLE test_semi_structured(var VARIANT,
                                    arr ARRAY,
                                    obj OBJECT
                                    );

DESC TABLE test_semi_structured;

+------+---------+--------+-------+---------+-------------+------------+-------+------------+---------+
| name | type    | kind   | null? | default | primary key | unique key | check | expression | comment |
|------+---------+--------+-------+---------+-------------+------------+-------+------------+---------|
| VAR  | VARIANT | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| ARR  | ARRAY   | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| OBJ  | OBJECT  | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
+------+---------+--------+-------+---------+-------------+------------+-------+------------+---------+