Data Unloading Considerations

This topic provides best practices, general guidelines, and important considerations for unloading data from a table. It is intended to help you export data from Snowflake tables into files in stages using the COPY INTO location command.

Empty Strings and NULL Values

An empty string is a string with zero length or no characters, whereas NULL values represent an absence of data. In CSV files, a NULL value is typically represented by two successive delimiters (e.g. ,,) to indicate that the field contains no data; however, you can use string values to denote NULL (e.g. null or any other unique string). An empty string is typically represented by a quoted empty string (e.g. ,'',) to indicate that the string contains zero characters.
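
For illustration only (the values here are arbitrary), this is how the two cases might appear in the second field of a CSV row when the enclosing character is a single quote:

-- Second field is NULL (two successive delimiters)
1,,NULL example
-- Second field is an empty string (quoted empty string)
2,'',Empty string example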

The following file format options enable you to differentiate between empty strings and NULL values when unloading or loading data. For more information about these file formats, see CREATE FILE FORMAT:

FIELD_OPTIONALLY_ENCLOSED_BY = 'character' | NONE

Use this option to enclose strings in the specified character: single quote ('), double quote ("), or NONE.

Enclosing string values in quotes while unloading data is not required. The COPY INTO location command can unload empty string values without enclosing quotes when the EMPTY_FIELD_AS_NULL option is set to FALSE. Unloading unenclosed strings with EMPTY_FIELD_AS_NULL set to TRUE (the default) is prohibited, because empty strings and NULL values would be indistinguishable in the output file.

When a field contains this character, escape it using the same character. For example, if the enclosing character is the double quote and a field contains the string "A", escape the embedded double quotes by doubling them: ""A"".

Default: NONE
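
As a sketch of the escaping rule above (the file format name, stage path, and sample value are hypothetical), unloading a string that contains the enclosing character doubles that character in the output field:

-- Hypothetical example: enclose unloaded strings in double quotes
create or replace file format my_quoted_csv
  type = csv
  field_optionally_enclosed_by = '"';

copy into @~/quoted/
  from (select 1 as i, '"A"' as v)
  file_format = (format_name = 'my_quoted_csv');

-- Per the escaping rule above, the unloaded row should look like:
-- 1,"""A"""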

EMPTY_FIELD_AS_NULL = TRUE | FALSE
  • When unloading empty string data from tables, choose one of the following options:

    • Preferred: Enclose strings in quotes by setting the FIELD_OPTIONALLY_ENCLOSED_BY option, to distinguish empty strings from NULLs in output CSV files.

    • Leave string fields unenclosed by setting the FIELD_OPTIONALLY_ENCLOSED_BY option to NONE (default), and set the EMPTY_FIELD_AS_NULL value to FALSE to unload empty strings as empty fields.

      Important

      If you choose this option, make sure to specify a replacement string for NULL data using the NULL_IF option, to distinguish NULL values from empty strings in the output file. If you later choose to load data from the output files, you will specify the same NULL_IF value to identify the NULL values in the data files.

  • When loading data into tables, use this option to specify whether to insert SQL NULL for empty fields in an input file. If set to FALSE, Snowflake attempts to cast an empty field to the corresponding column type. An empty string is inserted into columns of type STRING. For other column types, the COPY command produces an error (see the sketch after the default value below).

Default: TRUE
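
To make the loading behavior concrete, here is a hedged sketch (the stage, file, table, and column names are hypothetical). The staged file's second row has an empty second field; with EMPTY_FIELD_AS_NULL = FALSE, that field cannot be cast to the NUMBER column, so the COPY command reports an error for the row, as described above:

-- Hypothetical staged file @mystage/contacts.csv:
--   1,100,Alice
--   2,,Bob        <-- empty second field
create or replace table contacts (id number, score number, name string);

copy into contacts
  from @mystage/contacts.csv
  file_format = (type = csv empty_field_as_null = false);
-- The empty field cannot be cast to NUMBER for the SCORE column, so COPY
-- produces an error for that record. With the default (TRUE), the empty
-- field would be loaded as SQL NULL instead.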

NULL_IF = ( 'string1' [, 'string2' ... ] )
  • When unloading data from tables: Snowflake converts SQL NULL values to the first value in the list, so make sure that first value is the string you actually want to represent NULL in the unloaded files.
  • When loading data into tables: Strings in the data load source that Snowflake replaces with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value, e.g. NULL_IF = ('NULL', 'NUL', ''). Note that NULL_IF can include empty strings and only applies to columns that are nullable.

Default: \\N (i.e. NULL, assuming that ESCAPE_UNENCLOSED_FIELD is \\)
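
The following sketch illustrates the difference (the file format name and stage path are hypothetical): only the first NULL_IF value is used when unloading, while every listed value is recognized when loading:

-- Hypothetical file format with several NULL_IF values
create or replace file format my_null_csv
  type = csv
  empty_field_as_null = false
  null_if = ('\\N', 'NULL', 'NUL');

-- Unloading: SQL NULL values are written using only the first value (\N)
copy into @~/nulls/
  from (select 1 as i, cast(null as varchar) as v)
  file_format = (format_name = 'my_null_csv');

-- Loading: a field containing \N, NULL, or NUL is converted to SQL NULL
-- (nullable columns only)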

Example: Unloading and Loading Data with Enclosing Quotes

In the following example, a set of data is unloaded from the null_empty1 table to the user’s stage. The output data file is then used to load data into the null_empty2 table:

-- Source table (``null_empty1``) contents
+---+------+--------------+
| i | V    | D            |
|---+------+--------------|
| 1 | NULL | NULL value   |
| 2 |      | Empty string |
+---+------+--------------+

-- Create file format that describes the data and the guidelines for processing it
create or replace file format my_csv_format
  field_optionally_enclosed_by='0x27' null_if=('null');

-- Unload table data into the user's stage
copy into @~
  from null_empty1
  file_format = (format_name = 'my_csv_format');

-- Output data file contents
1,'null','NULL value'
2,'','Empty string'

-- Load data from the staged file into the target table (``null_empty2``)
copy into null_empty2
    from @~/data_0_0_0.csv.gz
    file_format = (format_name = 'my_csv_format');

select * from null_empty2;

+---+------+--------------+
| i | V    | D            |
|---+------+--------------|
| 1 | NULL | NULL value   |
| 2 |      | Empty string |
+---+------+--------------+

Example: Unloading and Loading Data Without Enclosing Quotes

In the following example, a set of data is unloaded from the null_empty1 table to the user’s stage. The output data file is then used to load data into the null_empty2 table:

-- Source table (``null_empty1``) contents
+---+------+--------------+
| i | V    | D            |
|---+------+--------------|
| 1 | NULL | NULL value   |
| 2 |      | Empty string |
+---+------+--------------+

-- Create file format that describes the data and the guidelines for processing it
create or replace file format my_csv_format
  empty_field_as_null=false null_if=('null');

-- Unload table data into the user's stage
copy into @~
  from null_empty1
  file_format = (format_name = 'my_csv_format');

-- Output data file contents
1,null,NULL value
2,,Empty string

-- Load data from the staged file into the target table (``null_empty2``)
copy into null_empty2
    from @~/data_0_0_0.csv.gz
    file_format = (format_name = 'my_csv_format');

select * from null_empty2;

+---+------+--------------+
| i | V    | D            |
|---+------+--------------|
| 1 | NULL | NULL value   |
| 2 |      | Empty string |
+---+------+--------------+

Unloading a Relational Table with Multiple Columns to JSON

You can use the OBJECT_CONSTRUCT function combined with the COPY command to convert the rows in a relational table to a single VARIANT column and unload the rows into a file.

For example:

-- Create a table

CREATE OR REPLACE TABLE mytable (
 id number(8) NOT NULL,
 first_name varchar(255) default NULL,
 last_name varchar(255) default NULL,
 city varchar(255),
 state varchar(255)
);

-- Populate the table with data

INSERT INTO mytable (id,first_name,last_name,city,state)
 VALUES
 (1,'Ryan','Dalton','Salt Lake City','UT'),
 (2,'Upton','Conway','Birmingham','AL'),
 (3,'Kibo','Horton','Columbus','GA');

-- Unload the data to a file in the stage for the table
-- Note: You can unload data to any stage

COPY INTO @%mytable
 FROM (SELECT OBJECT_CONSTRUCT('id', id, 'first_name', first_name, 'last_name', last_name, 'city', city, 'state', state) FROM mytable)
 FILE_FORMAT = (TYPE = JSON);

-- The COPY INTO location statement creates a file named data_0_0_0.json.gz in the stage.
-- The file contains the following data:

{"city":"Salt Lake City","first_name":"Ryan","id":1,"last_name":"Dalton","state":"UT"}
{"city":"Birmingham","first_name":"Upton","id":2,"last_name":"Conway","state":"AL"}
{"city":"Columbus","first_name":"Kibo","id":3,"last_name":"Horton","state":"GA"}

Unloading to a Single File

By default, COPY INTO location statements separate table data into a set of output files to take advantage of parallel operations. The maximum size for each file is set using the MAX_FILE_SIZE copy option. The default value is 16000000 (16 MB) but can be increased to accommodate larger files.
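
For example, the following sketch (the stage path is hypothetical) unloads mytable into a set of parallel output files of up to roughly 32 MB each by raising MAX_FILE_SIZE:

copy into @~/unload/mytable_
  from mytable
  file_format = (type = csv)
  max_file_size = 32000000;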

To maintain the data in a single output file (at the potential cost of decreased performance), specify the SINGLE = true copy option in your statement. You can optionally specify a name for the file in the path.

Note

  • If the output files are compressed (the default behavior; the COMPRESSION file format option is AUTO unless overridden), specify a filename with the appropriate file extension for the compression method so that the output file can be decompressed. For example, specify the .gz file extension if the GZIP compression method is specified.
  • The file size limit for single-file mode is 5 GB.

For example, unload the mytable table data into a single gzip-compressed file named myfile.csv.gz in the user’s stage. Increase the MAX_FILE_SIZE limit to accommodate the large data set:

copy into @~/myfile.csv.gz from mytable
file_format = (type=csv compression='gzip') single=true max_file_size=4900000000;
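
If an uncompressed output file is preferred instead, a similar sketch disables compression and, per the note above, omits the compression extension from the filename:

copy into @~/myfile.csv from mytable
file_format = (type=csv compression='none') single=true max_file_size=4900000000;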