Querying Semi-structured Data

Snowflake supports SQL queries that access semi-structured data using special operators and functions. Note that this topic applies to JSON, Avro, ORC, and Parquet data; the topic does not apply to XML data.

In this Topic:

Sample Data Used in Examples

Except where noted, the examples in this topic refer to a table named car_sales that contains a single VARIANT column named src:

CREATE OR REPLACE TABLE car_sales
(
  src variant
)
AS
SELECT PARSE_JSON(column1) AS src
FROM VALUES
('{
    "date" : "2017-04-28",
    "dealership" : "Valley View Auto Sales",
    "salesperson" : {
      "id": "55",
      "name": "Frank Beasley"
    },
    "customer" : [
      {"name": "Joyce Ridgely", "phone": "16504378889", "address": "San Francisco, CA"}
    ],
    "vehicle" : [
      {"make": "Honda", "model": "Civic", "year": "2017", "price": "20275", "extras":["ext warranty", "paint protection"]}
    ]
}'),
('{
    "date" : "2017-04-28",
    "dealership" : "Tindel Toyota",
    "salesperson" : {
      "id": "274",
      "name": "Greg Northrup"
    },
    "customer" : [
      {"name": "Bradley Greenbloom", "phone": "12127593751", "address": "New York, NY"}
    ],
    "vehicle" : [
      {"make": "Toyota", "model": "Camry", "year": "2017", "price": "23500", "extras":["ext warranty", "rust proofing", "fabric protection"]}
    ]
}') v;

SELECT * FROM car_sales;

+-------------------------------------------+
| SRC                                       |
|-------------------------------------------|
| {                                         |
|   "customer": [                           |
|     {                                     |
|       "address": "San Francisco, CA",     |
|       "name": "Joyce Ridgely",            |
|       "phone": "16504378889"              |
|     }                                     |
|   ],                                      |
|   "date": "2017-04-28",                   |
|   "dealership": "Valley View Auto Sales", |
|   "salesperson": {                        |
|     "id": "55",                           |
|     "name": "Frank Beasley"               |
|   },                                      |
|   "vehicle": [                            |
|     {                                     |
|       "extras": [                         |
|         "ext warranty",                   |
|         "paint protection"                |
|       ],                                  |
|       "make": "Honda",                    |
|       "model": "Civic",                   |
|       "price": "20275",                   |
|       "year": "2017"                      |
|     }                                     |
|   ]                                       |
| }                                         |
| {                                         |
|   "customer": [                           |
|     {                                     |
|       "address": "New York, NY",          |
|       "name": "Bradley Greenbloom",       |
|       "phone": "12127593751"              |
|     }                                     |
|   ],                                      |
|   "date": "2017-04-28",                   |
|   "dealership": "Tindel Toyota",          |
|   "salesperson": {                        |
|     "id": "274",                          |
|     "name": "Greg Northrup"               |
|   },                                      |
|   "vehicle": [                            |
|     {                                     |
|       "extras": [                         |
|         "ext warranty",                   |
|         "rust proofing",                  |
|         "fabric protection"               |
|       ],                                  |
|       "make": "Toyota",                   |
|       "model": "Camry",                   |
|       "price": "23500",                   |
|       "year": "2017"                      |
|     }                                     |
|   ]                                       |
| }                                         |
+-------------------------------------------+

Traversing Semi-structured Data

Insert a colon : between the VARIANT column name and any first-level element: <column>:<level1_element>.

Note

In the following examples, the query output is enclosed in double quotes. Operators : and subsequent . and [] always return VARIANT values containing strings by default, not VARCHAR values.

Get a list of all dealership names:

SELECT src:dealership FROM car_sales;

+--------------------------+
| SRC:DEALERSHIP           |
|--------------------------|
| "Valley View Auto Sales" |
| "Tindel Toyota"          |
+--------------------------+

Use dot-notation to traverse a path in a JSON object: <column>:<level1_element>.<level2_element>.<level3_element>. Optionally enclose element names in double quotes: <column>:"<level1_element>"."<level2_element>"."<level3_element>".

Note

If an element name contains spaces, you must enclose the name in double quotes, e.g.:

SELECT src:"company name" FROM partners;

Get the names of all salespeople who sold cars:

SELECT src:salesperson.name FROM car_sales;

+----------------------+
| SRC:SALESPERSON.NAME |
|----------------------|
| "Frank Beasley"      |
| "Greg Northrup"      |
+----------------------+

Alternatively, use bracket-notation to traverse the path in an object: <column>['<level1_element>']['<level2_element>']. Enclose element names in single quotes. Values are retrieved as strings.

Get the names of all salespeople who sold cars:

SELECT src['salesperson']['name'] FROM car_sales;

+----------------------------+
| SRC['SALESPERSON']['NAME'] |
|----------------------------|
| "Frank Beasley"            |
| "Greg Northrup"            |
+----------------------------+

Note

Regardless of which notation you use, the column name is case-insensitive but element names are case-sensitive. For example, in the following list, the first two paths are equivalent, but the third is not:

  • src:salesperson.name
  • SRC:salesperson.name
  • src:Salesperson.Name

Retrieving a Single Instance of a Repeating Element

Retrieve a specific numbered instance of a child element in a repeating array by adding a numbered predicate (starting from 0) to the array reference.

Note that to retrieve all instances of a child element in a repeating array, it is necessary to flatten the array. See an example in Using the FLATTEN Function to Parse Arrays in this topic.

Get the vehicle details for each sale:

SELECT src:vehicle[0] FROM car_sales;

+-------------------------+
| SRC:VEHICLE[0]          |
|-------------------------|
| {                       |
|   "extras": [           |
|     "ext warranty",     |
|     "paint protection"  |
|   ],                    |
|   "make": "Honda",      |
|   "model": "Civic",     |
|   "price": "20275",     |
|   "year": "2017"        |
| }                       |
| {                       |
|   "extras": [           |
|     "ext warranty",     |
|     "rust proofing",    |
|     "fabric protection" |
|   ],                    |
|   "make": "Toyota",     |
|   "model": "Camry",     |
|   "price": "23500",     |
|   "year": "2017"        |
| }                       |
+-------------------------+

Get the price of each car sold:

SELECT src:vehicle[0].price FROM car_sales;

+----------------------+
| SRC:VEHICLE[0].PRICE |
|----------------------|
| "20275"              |
| "23500"              |
+----------------------+

Explicitly Casting Values

By default, values retrieved from a VARIANT column are returned as string literals. To avoid unexpected results, explicitly cast the values to the desired data type (using :: notation).

Get the list of salesperson ID numbers as string values:

SELECT src:salesperson.id::string FROM car_sales;

+----------------------------+
| SRC:SALESPERSON.ID::STRING |
|----------------------------|
| 55                         |
| 274                        |
+----------------------------+

Using the FLATTEN Function to Parse Arrays

Parse an array using the FLATTEN function. FLATTEN is a table function that produces a lateral view of a VARIANT, OBJECT, or ARRAY column. The function returns a row for each object, and the LATERAL modifier joins the data with any information outside of the object.

Get the names and addresses of all customers. Cast the VARIANT output to string values:

SELECT
  value:name::string as "Customer Name",
  value:address::string as "Address"
  FROM
    car_sales
  , LATERAL FLATTEN(INPUT => SRC:customer);

+--------------------+-------------------+
| Customer Name      | Address           |
|--------------------+-------------------|
| Joyce Ridgely      | San Francisco, CA |
| Bradley Greenbloom | New York, NY      |
+--------------------+-------------------+

Using the FLATTEN Function to Parse Nested Arrays

The extras array is nested within the vehicle array in the sample data:

"vehicle" : [
     {"make": "Honda", "model": "Civic", "year": "2017", "price": "20275", "extras":["ext warranty", "paint protection"]}
   ]

Add a second FLATTEN clause to flatten the extras array within the flattened vehicle array and retrieve the “extras” purchased for each car sold:

SELECT
  vm.value:make::string as make,
  vm.value:model::string as model,
  ve.value::string as "Extras Purchased"
  FROM
    car_sales
  , LATERAL FLATTEN(INPUT => SRC:vehicle) vm
  , LATERAL FLATTEN(INPUT => vm.value:extras) ve;

  +--------+-------+-------------------+
  | MAKE   | MODEL | Extras Purchased  |
  |--------+-------+-------------------|
  | Honda  | Civic | ext warranty      |
  | Honda  | Civic | paint protection  |
  | Toyota | Camry | ext warranty      |
  | Toyota | Camry | rust proofing     |
  | Toyota | Camry | fabric protection |
  +--------+-------+-------------------+

Parsing Text as VARIANT Values Using the PARSE_JSON Function

Parse text as a JSON document using the PARSE_JSON function.

If the input is NULL, the output will also be NULL. However, if the input string is null, it is interpreted as a VARIANT null value; that is, the result is not a SQL NULL but a real value used to represent a null value in semi-structured formats.

For an example, see Sample Data Used in Examples in this topic.

Extracting Values Using the GET Function

GET accepts a VARIANT, OBJECT, or ARRAY value as the first argument and extracts the VARIANT value of the element in the path provided as the second argument.

Compute and extract the last element of each array in a VARIANT column using the GET and ARRAY_SIZE functions. ARRAY_SIZE returns the size of the input array:

Note

This example departs from the car_sales table used elsewhere in this topic.

CREATE OR replace TABLE colors (v variant);

INSERT INTO
   colors
   SELECT
      parse_json(column1) AS v
   FROM
   VALUES
     ('[{r:255,g:12,b:0},{r:0,g:255,b:0},{r:0,g:0,b:255}]'),
     ('[{c:0,m:1,y:1,k:0},{c:1,m:0,y:1,k:0},{c:1,m:1,y:0,k:0}]')
    v;

SELECT *, GET(v, ARRAY_SIZE(v)-1) FROM colors;

+---------------+-------------------------+
| V             | GET(V, ARRAY_SIZE(V)-1) |
|---------------+-------------------------|
| [             | {                       |
|   {           |   "b": 255,             |
|     "b": 0,   |   "g": 0,               |
|     "g": 12,  |   "r": 0                |
|     "r": 255  | }                       |
|   },          |                         |
|   {           |                         |
|     "b": 0,   |                         |
|     "g": 255, |                         |
|     "r": 0    |                         |
|   },          |                         |
|   {           |                         |
|     "b": 255, |                         |
|     "g": 0,   |                         |
|     "r": 0    |                         |
|   }           |                         |
| ]             |                         |
| [             | {                       |
|   {           |   "c": 1,               |
|     "c": 0,   |   "k": 0,               |
|     "k": 0,   |   "m": 1,               |
|     "m": 1,   |   "y": 0                |
|     "y": 1    | }                       |
|   },          |                         |
|   {           |                         |
|     "c": 1,   |                         |
|     "k": 0,   |                         |
|     "m": 0,   |                         |
|     "y": 1    |                         |
|   },          |                         |
|   {           |                         |
|     "c": 1,   |                         |
|     "k": 0,   |                         |
|     "m": 1,   |                         |
|     "y": 0    |                         |
|   }           |                         |
| ]             |                         |
+---------------+-------------------------+

Extracting Values by Path Using the GET_PATH Function

Extract a value from a VARIANT column using the GET_PATH , : function. The function is a variation of GET, used to extract a value using a path name. GET_PATH is equivalent to a chain of GET functions.

Get the vehicle make for the car purchased by each customer:

SELECT GET_PATH(src, 'vehicle[0]:make') FROM car_sales;

+----------------------------------+
| GET_PATH(SRC, 'VEHICLE[0]:MAKE') |
|----------------------------------|
| "Honda"                          |
| "Toyota"                         |
+----------------------------------+

Traversing Semi-structured Data describes the path syntax used to retrieve elements in a VARIANT column. The syntax is shorthand for the GET or GET_PATH , : function. Unlike the path syntax, these functions can handle irregular paths or path elements.

The following queries produce the same results:

SELECT GET_PATH(src, 'vehicle[0].make') FROM car_sales;

SELECT src:vehicle[0].make FROM car_sales;

Parsing Arrays Directly from a Staged Data File

Assume a staged file named contacts.json.gz contains the following data:

{
    "root": [
        {
            "employees": [
                {
                    "firstName": "Anna",
                    "lastName": "Smith"
                },
                {
                    "firstName": "Peter",
                    "lastName": "Jones"
                }
            ]
        }
    ]
}

Also assume a file format named my_json_format includes TYPE=JSON in its definition.

Query the name of the first employee in the staged file. In this example, the file is located in the customers table stage, but it could be located in any internal (i.e. Snowflake) or external stage:

SELECT 'The First Employee Record is '||
    S.$1:root[0].employees[0].firstName||
    ' '||S.$1:root[0].employees[0].lastName
FROM @%customers/contacts.json.gz (file_format => 'my_json_format') as S;

+----------------------------------------------+
| 'THE FIRST EMPLOYEE RECORD IS '||            |
|      S.$1:ROOT[0].EMPLOYEES[0].FIRSTNAME||   |
|      ' '||S.$1:ROOT[0].EMPLOYEES[0].LASTNAME |
|----------------------------------------------|
| The First Employee Record is Anna Smith      |
+----------------------------------------------+