Script: Loading and Unloading Parquet Data

The annotated scripts in this tutorial describe a Parquet data workflow:

  • Script 1. Loads sample Parquet data into separate columns in a relational table directly from staged data files, avoiding the need for a staging table. In this example, COPY INTO <table> loads separate data into a pair of VARCHAR columns and a VARIANT column.

    Note that the script includes a PUT statement, so it cannot be executed in the Worksheets Worksheet tab page in the Snowflake web interface (or any other interfaces that do not support PUT). Instead, we recommend using SnowSQL, the Snowflake CLI client, to execute the script.

  • Script 2. Unloads relational Snowflake table data into separate columns in a Parquet file. When the Parquet file type is specified, COPY INTO <location> unloads to a single column by default.

In this Topic:

Prerequisites

  • Active, running virtual warehouse.

  • Sample Parquet data file (cities.parquet). If clicking the link does not download the file, right-click the link and save the link/file to your local file system.

    Then, copy the file to your temporary folder/directory:

    • macOS or Linux: //tmp

    • Windows: Open an Explorer window and enter %TEMP% in the address bar.

Example of Sample Data

The following is a representative row in the sample Parquet file:

{
  "continent": "Europe",
  "country": {
    "city": {
      "bag": [
        {
          "array_element": "Paris"
        },
        {
          "array_element": "Nice"
        },
        {
          "array_element": "Marseilles"
        },
        {
          "array_element": "Cannes"
        }
      ]
    },
    "name": "France"
  }
}

SQL Script 1: Load Parquet Data

Note

The PUT statement in this script assumes you are using a macOS or Linux environment. If you are using Windows, adjust the statement as noted in the comments.

/* Create a target relational table for the Parquet data. The table is temporary, meaning it persists only    */
/* for the duration of the user session and is not visible to other users.                                    */

create or replace temporary table cities (
  continent varchar default NULL,
  country varchar default NULL,
  city variant default NULL
);

/* Create a file format object that specifies the Parquet file format type.                                   */
/* Accepts the default values for other options.                                                              */

create or replace file format sf_tut_parquet_format
  type = 'parquet';

/* Create a temporary internal stage that references the file format object.                                  */
/* Similar to temporary tables, temporary stages are automatically dropped at the end of the session.         */

create or replace temporary stage sf_tut_stage
  file_format = sf_tut_parquet_format;

/* Stage the data file.                                                                                       */
/*                                                                                                            */
/* Note that the example PUT statement references the macOS or Linux location of the data file.               */
/* If you are using Windows, execute the following statement instead:                                         */

-- put %TEMP%/cities.parquet @sf_tut_stage;

put file:///tmp/cities.parquet @sf_tut_stage;

/* Load the Parquet data into the relational table.                                                           */
/*                                                                                                            */
/* A SELECT query in the COPY statement identifies a numbered set of columns in the data files you are        */
/* loading from. Note that all Parquet data is stored in a single column ($1).                                */
/*                                                                                                            */
/* Cast element values to the target column data type.                                                        */

copy into cities
  from (select
  $1:continent::varchar,
  $1:country:name::varchar,
  $1:country:city.bag::variant
  from @sf_tut_stage/cities.parquet);

/* Query the relational table                                                                                 */

SELECT * from cities;

SQL Script 2: Unload Parquet Data

/* Unload the CITIES table columns into a Parquet file. Optionally flatten the CITY column array and unload   */
/* the child elements to a separate column.                                                                   */
/*                                                                                                            */
/* To retain the table schema in the output file, use a simple SELECT statement (e.g. SELECT * FROM cities).  */
/* To retain the column names in the output file, use the HEADER = TRUE copy option.                          */

copy into @sf_tut_stage/out/parquet_ from (
select
  continent
  , country
  , c.value:array_element::string
  from
    cities
  , lateral flatten(input => city) c)
  file_format = (type = 'parquet')
  header = true;

/* Query the staged Parquet file.                                                                             */

select t.$1 from @sf_tut_stage/out/ t;