Script: Loading JSON Data into a Relational Table

The annotated script in this tutorial loads sample JSON data into separate columns in a relational table directly from staged data files, avoiding the need for a staging table.

The example uses the following functions to modify the staged data while loading it:

The following is a representative row in the sample JSON file:

{
   "location": {
      "state_city": "MA-Lexington",
      "zip": "40503"
   },
   "sale_date": "2017-3-5",
   "price": "275836"
}

Execute the script using the Snowflake CLI client (SnowSQL). Note that PUT cannot be executed from the Worksheet in the Snowflake web interface.

In this Topic:

Prerequisites

  • Requires an active, running virtual warehouse.

  • Download the sample JSON data file. Right-click the name of the file, sales.json, and save the link/file to your local file system.

    Copy the file to your temporary folder:

    • Mac OS or Linux: //tmp
    • Windows: Open an Explorer window, and enter %TEMP% in the address bar.

SQL Script

/* Create a target relational table for the JSON data. The table is temporary, meaning it persists only for the duration
of the user session and is not visible to other users. */

CREATE OR REPLACE TEMPORARY TABLE home_sales (
  city STRING,
  zip STRING,
  state STRING,
  type STRING DEFAULT 'Residential',
  sale_date timestamp_ntz,
  price STRING
  );

/* Create a named file format with the file delimiter set as none and the record delimiter set as the new line character.

When loading semi-structured data, e.g. JSON, you should set CSV as the file format type (default value). You could use the
JSON file format, but any error in the transformation would stop the COPY operation, even if you set the ON_ERROR option to
continue or skip the file. */

CREATE OR REPLACE FILE FORMAT sf_tut_csv_format
  FIELD_DELIMITER = NONE
  RECORD_DELIMITER = '\\n';

/* Create a temporary internal stage that references the file format object.

  Similar to temporary tables, temporary stages are automatically dropped at the end of the session. */

CREATE OR REPLACE TEMPORARY STAGE sf_tut_stage
  FILE_FORMAT = sf_tut_csv_format;

/* Stage the data file.

Note that the example PUT statement references the Mac OS or Linux location of the data file.
If you are using Windows, execute the following statement instead:
PUT %TEMP%/sales.json @sf_tut_stage; */

PUT file:///tmp/sales.json @sf_tut_stage;

/* Load the JSON data into the relational table.

A SELECT query in the COPY statement identifies a numbered set of columns in the data files you are loading from. Note that all JSON data is stored in a single column ($1). */

COPY INTO home_sales(city, state, zip, sale_date, price)
   FROM (SELECT SUBSTR(parse_json($1):location.state_city,4), SUBSTR(parse_json($1):location.state_city,1,2), parse_json($1):location.zip, to_timestamp_ntz(parse_json($1):sale_date), parse_json($1):price
   FROM @sf_tut_stage/sales.json.gz t)
   ON_ERROR = 'continue';

-- Query the relational table
SELECT * from home_sales;