Categories:
DML Commands (by Category) - Data Loading

COPY INTO <table>

Loads data from staged files to an existing table. The files must already be staged in one of the following locations:

  • Named internal stage (or table/user stage). Files can be staged using the PUT command.
  • Named external stage that references an external location (AWS S3 bucket or Microsoft Azure container).
  • External location (AWS S3 bucket or Microsoft Azure container).
See also:
COPY INTO <location>

Syntax

/* Standard data load */
COPY INTO [<namespace>.]<table_name>
     FROM { internalStage | externalStage | externalLocation }
[ FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
[ PATTERN = '<regex_pattern>' ]
[ FILE_FORMAT = ( { FORMAT_NAME = '[<namespace>.]<file_format_name>' |
                    TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] } ) ]
[ copyOptions ]
[ VALIDATION_MODE = RETURN_<n>_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS ]

/* Data load with transformation */
COPY INTO [<namespace>.]<table_name> [ ( <col_name> [ , <col_name> ... ] ) ]
     FROM ( SELECT [<alias>.]$<file_col_num>[.<element>] [ , [<alias>.]$<file_col_num>[.<element>] ... ]
            FROM { internalStage | externalStage | externalLocation } )
[ FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
[ PATTERN = '<regex_pattern>' ]
[ FILE_FORMAT = ( { FORMAT_NAME = '[<namespace>.]<file_format_name>' |
                    TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] } ) ]
[ copyOptions ]
[ VALIDATION_MODE = RETURN_<n>_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS ]

Where:

internalStage
internalStage ::=
    @[<namespace>.]<int_stage_name>[/<path>]
  | @[<namespace>.]%<table_name>[/<path>]
  | @~[/<path>]
externalStage
externalStage ::=
  @[<namespace>.]<ext_stage_name>[/<path>]
externalLocation (for AWS S3 bucket)
externalLocation ::=
  's3://<bucket>[/<path>]'
  [ CREDENTIALS = ( AWS_KEY_ID = '<string>' AWS_SECRET_KEY = '<string>' [ AWS_TOKEN = '<string>' ] ) ]
  [ ENCRYPTION = ( [ TYPE = { AWS_CSE | AWS_SSE_S3 | AWS_SSE_KMS | NONE } ] [ MASTER_KEY = '<string>' ] ) ]
externalLocation (for Microsoft Azure container)
externalLocation ::=
  'azure://<account>.blob.core.windows.net/<container>[/<path>]'
  [ CREDENTIALS = ( [ AZURE_SAS_TOKEN = <string> ] ) ]
  [ ENCRYPTION = ( [ TYPE = { AZURE_CSE | NONE } ] [ MASTER_KEY = '<string>' ] ) ]
                  /* Support for client-side encryption (AZURE_CSE) is not available yet */
formatTypeOptions (format-specific)
formatTypeOptions ::=
-- If FILE_FORMAT = ( TYPE = CSV ... )
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
     RECORD_DELIMITER = '<character>' | NONE
     FIELD_DELIMITER = '<character>' | NONE
     SKIP_HEADER = <integer>
     DATE_FORMAT = '<string>' | AUTO
     TIME_FORMAT = '<string>' | AUTO
     TIMESTAMP_FORMAT = '<string>' | AUTO
     BINARY_FORMAT = HEX | BASE64 | UTF8
     ESCAPE = '<character>' | NONE
     ESCAPE_UNENCLOSED_FIELD = '<character>' | NONE
     TRIM_SPACE = TRUE | FALSE
     FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | NONE
     NULL_IF = ( '<string>' [ , '<string>' ... ] )
     ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE | FALSE
     VALIDATE_UTF8 = TRUE | FALSE
     EMPTY_FIELD_AS_NULL = TRUE | FALSE
     SKIP_BYTE_ORDER_MARK = TRUE | FALSE
     ENCODING = '<string>' | UTF8
-- If FILE_FORMAT = ( TYPE = JSON ... )
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
     ENABLE_OCTAL = TRUE | FALSE
     ALLOW_DUPLICATE = TRUE | FALSE
     STRIP_OUTER_ARRAY = TRUE | FALSE
     STRIP_NULL_VALUES = TRUE | FALSE
     IGNORE_UTF8_ERRORS = TRUE | FALSE
     SKIP_BYTE_ORDER_MARK = TRUE | FALSE
-- If FILE_FORMAT = ( TYPE = AVRO ... )
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
-- If FILE_FORMAT = ( TYPE = ORC ... )
     -- none (no options are supported)
-- If FILE_FORMAT = ( TYPE = PARQUET ... )
     BINARY_AS_TEXT = TRUE | FALSE
-- If FILE_FORMAT = ( TYPE = XML ... )
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
     IGNORE_UTF8_ERRORS = TRUE | FALSE
     PRESERVE_SPACE = TRUE | FALSE
     STRIP_OUTER_ELEMENT = TRUE | FALSE
     DISABLE_SNOWFLAKE_DATA = TRUE | FALSE
     DISABLE_AUTO_CONVERT = TRUE | FALSE
     SKIP_BYTE_ORDER_MARK = TRUE | FALSE
copyOptions
copyOptions ::=
     ON_ERROR = { CONTINUE | SKIP_FILE | SKIP_FILE_<num> | SKIP_FILE_<num>% | ABORT_STATEMENT }
     SIZE_LIMIT = <num>
     PURGE = TRUE | FALSE
     RETURN_FAILED_ONLY = TRUE | FALSE
     ENFORCE_LENGTH = TRUE | FALSE
     TRUNCATECOLUMNS = TRUE | FALSE
     FORCE = TRUE | FALSE
     LOAD_UNCERTAIN_FILES = TRUE | FALSE

Required Parameters

[namespace.]table_name

Specifies the name of the table into which data is loaded.

Namespace optionally specifies the database and/or schema for the table, in the form of database_name.schema_name or schema_name. It is optional if a database and schema are currently in use within the user session; otherwise, it is required.

FROM ...

Specifies the internal or external location where the files containing data to be loaded are staged:

@[namespace.]int_stage_name[/path] Files are in the specified named internal stage.
@[namespace.]ext_stage_name[/path] Files are in the specified named external stage.
@[namespace.]%table_name[/path] Files are in the stage for the specified table.
@~[/path] Files are in the stage for the current user.
's3://bucket[/path]' Files are in the specified external location (S3 bucket). Additional parameters may be required. For details, see Additional Cloud Provider Parameters (in this topic).
'azure://account.blob.core.windows.net/container[/path]' Files are in the specified external location (Azure container). Additional parameters may be required. For details, see Additional Cloud Provider Parameters (in this topic).

Where:

  • namespace is the database and/or schema in which the internal or external stage resides, in the form of database_name.schema_name or schema_name. It is optional if a database and schema are currently in use within the user session; otherwise, it is required.

  • The optional path parameter restricts the set of files being queried to the files under the folder prefix. If path is specified, but no file is explicitly named in the path, all data files in the path are queried.

    Relative path modifiers such as /./ and /../ are interpreted literally because “paths” are literal prefixes for a name. For example:

    -- S3 bucket
    COPY INTO mytable FROM 's3://mybucket/./../a.csv';
    
    -- Azure container
    COPY INTO mytable FROM 'azure://myaccount.blob.core.windows.net/mycontainer/./../a.csv';
    

    In these COPY statements, Snowflake looks for a file literally named ./../a.csv in the external location.

Note

The URI string for an external location (i.e. S3 bucket or Azure container) must be enclosed in single quotes; however, you can enclose any string in single quotes, which allows special characters, including spaces, to be used in location and file names. For example:

-- Stages
COPY INTO mytable FROM '@mystage/path 1/file 1.csv';
COPY INTO mytable FROM '@%mytable/path 1/file 1.csv';
COPY INTO mytable FROM '@~/path 1/file 1.csv';

-- S3 bucket
COPY INTO mytable FROM 's3://mybucket 1/prefix 1/file 1.csv';

-- Azure container
COPY INTO mytable FROM 'azure://myaccount.blob.core.windows.net/mycontainer/encrypted_files/file 1.csv';

Additional Cloud Provider Parameters

CREDENTIALS = ( cloud_specific_credentials )

Required only for loading from an external private/protected S3 bucket or Azure container; not required for public buckets/containers

AWS S3
CREDENTIALS = ( AWS_KEY_ID = 'string' AWS_SECRET_KEY = 'string' [ AWS_TOKEN = 'string' ] )

Specifies the security credentials for connecting to AWS and accessing the private/protected S3 bucket where the files to load are staged. For more information, see Configuring IAM Credentials to Access an External Stage.

The credentials you specify depend on whether you associated the Snowflake access permissions for the bucket with an AWS IAM (Identity & Access Management) user or role:

  • IAM user: Temporary IAM credentials are required. Temporary (aka “scoped”) credentials are generated by AWS Security Token Service (STS) and consist of three components:

    • AWS_KEY_ID
    • AWS_SECRET_KEY
    • AWS_TOKEN

    All three are required to access a private/protected bucket. After a designated period of time, temporary credentials expire and can no longer be used. You must then generate a new set of valid temporary credentials.

    Important

    COPY commands contain complex syntax and sensitive information, such as credentials. In addition, they are executed frequently and are often stored in scripts or worksheets, which could lead to sensitive information being inadvertently exposed. The COPY command allows permanent (aka “long-term”) credentials to be used; however, for security reasons, do not use permanent credentials in COPY commands. Instead, use temporary credentials.

    If you must use permanent credentials, use external stages, for which credentials are entered once and securely stored, minimizing the potential for exposure.
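
    For example, a COPY statement that supplies temporary credentials inline might look like the following (the bucket, path, and credential values are placeholders):

    COPY INTO mytable
      FROM 's3://mybucket/data/'
      CREDENTIALS = (AWS_KEY_ID = '<temporary_key_id>' AWS_SECRET_KEY = '<temporary_secret_key>' AWS_TOKEN = '<session_token>')
      FILE_FORMAT = (TYPE = CSV);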

Microsoft Azure
CREDENTIALS = ( AZURE_SAS_TOKEN = 'string' )
Specifies the SAS (shared access signature) token for connecting to Azure and accessing the private/protected container where the files containing data are staged. Credentials are generated by Azure.
ENCRYPTION = ( cloud_specific_encryption )

Required only for loading from encrypted files; not required if files are unencrypted

AWS S3
ENCRYPTION = ( [ TYPE = AWS_CSE | AWS_SSE_S3 | AWS_SSE_KMS | NONE ] [ MASTER_KEY = 'string' ] )

Specifies the encryption settings used to decrypt encrypted files in the S3 bucket and extract data:

TYPE = ...

Specifies the encryption type used. Possible values are:

  • AWS_CSE: Client-side encryption (requires a MASTER_KEY value). Currently, the client-side master key you provide can only be a symmetric key. Note that, when a MASTER_KEY value is provided, Snowflake assumes TYPE = AWS_CSE (i.e. when a MASTER_KEY value is provided, TYPE is not required).
  • AWS_SSE_S3: Server-side encryption that requires no additional encryption settings.
  • AWS_SSE_KMS: Server-side encryption that accepts an optional KMS_KEY_ID value.

For more information about the encryption types, see the AWS documentation for client-side encryption or server-side encryption.

MASTER_KEY = 'string' (applies only to AWS_CSE encryption)
Specifies the client-side master key that was used to encrypt the files in the bucket. The master key must be a 128-bit or 256-bit key in Base64-encoded form. Snowflake requires this key to decrypt encrypted files in the bucket and extract data for loading.
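
For example, a sketch of a load from client-side encrypted files in a bucket (the bucket, path, credential, and key values are placeholders):

COPY INTO mytable
  FROM 's3://mybucket/encrypted_files/'
  CREDENTIALS = (AWS_KEY_ID = '<key_id>' AWS_SECRET_KEY = '<secret_key>')
  ENCRYPTION = (TYPE = 'AWS_CSE' MASTER_KEY = '<base64_encoded_master_key>')
  FILE_FORMAT = (TYPE = CSV);
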
Microsoft Azure
ENCRYPTION = ( [ TYPE = AZURE_CSE | NONE ] [ MASTER_KEY = 'string' ] )

Specifies the encryption settings used to decrypt the files containing the loaded data:

TYPE = ...

Specifies the encryption type used. Possible values are:

  • AZURE_CSE: Client-side encryption (requires a MASTER_KEY value).
  • NONE: No encryption.

Note

Support for AZURE_CSE (i.e. client-side encryption) is not available yet.

MASTER_KEY = 'string' (applies only to AZURE_CSE encryption)
Specifies the client-side master key used to encrypt the files in the container. The master key must be a 128-bit or 256-bit key in Base64-encoded form. Snowflake requires this key to decrypt encrypted files in the container and extract data for loading.

Transformation Parameters

( SELECT [alias.]$file_col_num[.element] [ , [alias.]$file_col_num[.element] ... ] FROM ... )

Required for transforming data during loading

Specifies an explicit set of fields/columns (separated by commas) to load from the staged data files. The fields/columns are selected from the files using a standard SQL query (i.e. SELECT list), where:

alias Specifies an optional alias.
file_col_num Specifies the positional number of the field/column (in the file) that contains the data to be loaded (1 for the first field, 2 for the second field, etc.)
element Specifies the path and element name of a repeating value in the data file (applies only to semi-structured data files).

The SELECT list defines a numbered set of fields/columns in the data files you are loading from. The list must match the sequence of columns in the target table. You can use the optional ( col_name [ , col_name ... ] ) parameter to map the list to specific columns in the target table.

Note that the actual field/column order in the data files can be different from the column order in the target table. It is only important that the SELECT list maps fields/columns in the data files to the corresponding columns in the table.

Note

The SELECT statement used for transformations does not support all functions. For a complete list of the supported functions and more details about data loading transformations, including examples, see the usage notes in Transforming Data During a Load.

Also, data loading transformation only supports selecting data from user stages and named stages (internal or external).

( col_name [ , col_name ... ] )

Optionally specifies an explicit list of table columns (separated by commas) into which you want to insert data:

  • The first column consumes the values produced from the first field/column extracted from the loaded files.
  • The second column consumes the values produced from the second field/column extracted from the loaded files.
  • And so on, in the order specified.

Columns cannot be repeated in this listing. Any columns excluded from this column list are populated by their default value (NULL, if not specified). However, excluded columns cannot have a sequence as their default value.
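
For example, a load that selects specific fields from staged CSV files and maps them to named target columns might look like the following (the table, stage, and column names are illustrative):

COPY INTO sales (city, zip, price)
  FROM (SELECT t.$1, t.$2, t.$4 FROM @mystage t)
  FILE_FORMAT = (FORMAT_NAME = mycsvformat);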

Optional Parameters

FILES = ( 'file_name' [ , 'file_name' ... ] )

Specifies a list of one or more file names (separated by commas) to be loaded. The files must already have been staged in either the Snowflake internal location or external location specified in the command.

The maximum number of file names that can be specified is 1000.

Note

For external stages only (i.e. S3 or Azure), the file path is set by concatenating the URL in the stage definition and the list of resolved file names.

However, Snowflake doesn’t insert a separator implicitly between the path and file names. You must explicitly include a separator (/) either at the end of the URL in the stage definition or at the beginning of each file name specified in this parameter.
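
For example, to load two specific files from a named internal stage (the stage, file, and format names are illustrative):

COPY INTO mytable
  FROM @my_int_stage
  FILES = ('file1.csv', 'file2.csv')
  FILE_FORMAT = (FORMAT_NAME = 'mycsv');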

PATTERN = 'regex_pattern'

A regular expression pattern string, enclosed in single quotes, specifying the file names and/or paths to match.

Note

For external stages only (i.e. S3 or Azure), the file path is set by concatenating the URL in the stage definition and the list of file names resolved by the regular expression pattern.

However, Snowflake doesn’t insert a separator implicitly between the path and file names. You must explicitly include a separator (/) either at the end of the URL in the stage definition or at the beginning of the specified pattern in this parameter.

Tip

For the best performance, try to avoid applying patterns that filter on large numbers of files.

FILE_FORMAT = ( FORMAT_NAME = 'file_format_name' ) or FILE_FORMAT = ( TYPE = CSV | JSON | AVRO | ORC | PARQUET | XML [ ... ] )

Specifies the format of the data files to load:

FORMAT_NAME = 'file_format_name'
Specifies an existing named file format to use for loading data into the table. The named file format determines the format type (CSV, JSON, etc.), as well as any other format options, for the data files. For more information, see CREATE FILE FORMAT.
TYPE = CSV | JSON | AVRO | ORC | PARQUET | XML [ ... ]
Specifies the type of files to load into the table. If a format type is specified, then additional format-specific options can be specified. For more details, see formatTypeOptions (in this topic).

Note

FORMAT_NAME and TYPE are mutually exclusive; specifying both in the same COPY command may result in unexpected behavior.
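
For example, either form could be used to describe the staged files (the stage and format names are illustrative):

-- Reference an existing named file format
COPY INTO mytable FROM @mystage FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

-- Specify the format type and options inline
COPY INTO mytable FROM @mystage FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_DELIMITER = '|');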

COPY_OPTIONS = ( ... )
Specifies one or more copy options for the loaded data. For more details, see copyOptions (in this topic).
VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS

String (constant) that instructs the COPY command to validate the files instead of loading them into the specified table. The command validates the data to be loaded and returns results based on the validation option specified:

Supported Values Notes
RETURN_n_ROWS (e.g. RETURN_10_ROWS) Validates the specified number of rows, if no errors are encountered; otherwise, fails at the first error encountered in the rows.
RETURN_ERRORS Returns all errors (parsing, conversion, etc.) across all files specified in the COPY statement.
RETURN_ALL_ERRORS Returns all errors across all files specified in the COPY statement, including files with errors that were partially loaded during an earlier load because the ON_ERROR copy option was set to CONTINUE during the load.

Note

  • VALIDATION_MODE does not support COPY statements that transform data during a load. If the parameter is specified, the COPY statement returns an error.
  • Use the VALIDATE table function to view all errors encountered during a previous load. Note that this function also does not support COPY statements that transform data during a load.

formatTypeOptions

For FILE_FORMAT = ( ... ), the following format-specific options can be specified (separated by blank spaces, commas, or new lines):

TYPE = CSV
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.

Supported Values Notes
AUTO Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.
GZIP  
BZ2  
BROTLI Must be used if loading Brotli-compressed files. Note that support for loading files compressed using Brotli (BROTLI) is currently in open preview.
ZSTD Zstandard v0.8 (and higher) supported. Note that support for loading files compressed using Zstandard (ZSTD) is currently in open preview.
DEFLATE Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE Raw Deflate-compressed files (without header, RFC1951).
NONE Files for loading data have not been compressed.
RECORD_DELIMITER = 'character' | NONE

Character that separates records in an input file. Accepts common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). Also accepts a value of NONE.

Default: New line character. Note that “new line” is logical such that \r\n will be understood as a new line for files on a Windows platform.

FIELD_DELIMITER = 'character' | NONE

Character that separates fields in an input file. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE.

Default: comma (,)

SKIP_HEADER = integer

Number of lines at the start of the file to skip.

Default: 0

DATE_FORMAT = 'string' | AUTO

String that defines the format of date values in the data files to be loaded. If a value is not specified or is AUTO, the value for the DATE_INPUT_FORMAT session parameter is used.

Default: AUTO

TIME_FORMAT = 'string' | AUTO

String that defines the format of time values in the data files to be loaded. If a value is not specified or is AUTO, the value for the TIME_INPUT_FORMAT session parameter is used.

Default: AUTO

TIMESTAMP_FORMAT = 'string' | AUTO

String that defines the format of timestamp values in the data files to be loaded. If a value is not specified or is AUTO, the value for the TIMESTAMP_INPUT_FORMAT session parameter is used.

Default: AUTO
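
For example, to override the session parameters for date and timestamp input formats during a single load (the format strings and stage name are illustrative):

COPY INTO mytable
  FROM @mystage
  FILE_FORMAT = (TYPE = CSV DATE_FORMAT = 'MM-DD-YYYY' TIMESTAMP_FORMAT = 'MM-DD-YYYY HH24:MI:SS');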

BINARY_FORMAT = HEX | BASE64 | UTF8

String (constant) that defines the encoding format for binary input or output. This option only applies when loading data into binary columns in a table.

Default: HEX

ESCAPE = 'character' | NONE

Single character used as the escape character for any field values. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE.

Specifies the escape character for enclosed fields. Specify the character used to enclose fields by setting FIELD_OPTIONALLY_ENCLOSED_BY.

Default: NONE

ESCAPE_UNENCLOSED_FIELD = 'character' | NONE

Single character string used as the escape character for unenclosed field values only. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE.

Default: backslash (\\)

TRIM_SPACE = TRUE | FALSE

Boolean that specifies whether to remove white space from fields.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Use this option to remove undesirable spaces during the data load.

As another example, if leading or trailing space surrounds quotes that enclose strings, you can remove the surrounding space using the TRIM_SPACE option and the quote character using the FIELD_OPTIONALLY_ENCLOSED_BY option. Note that any space within the quotes is preserved.

For example, assuming the field delimiter is | and FIELD_OPTIONALLY_ENCLOSED_BY = '"':

|"Hello world"|
|" Hello world "|
| "Hello world" |

becomes:

+---------------+
| C1            |
|---------------|
| Hello world   |
|  Hello world  |
| Hello world   |
+---------------+

Default: FALSE

FIELD_OPTIONALLY_ENCLOSED_BY = 'character' | NONE

Character used to enclose strings. Value can be NONE, single quote character ('), or double quote character ("). To use the single quote character, use the octal or hex representation (0x27) or the double single-quoted escape ('').

When a field contains this character, escape it using the same character. For example, if the value is the double quote character and a field contains the string A "B" C, escape the double quotes as follows:

A ""B"" C

Default: NONE
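
For example, a load consistent with the TRIM_SPACE sample shown above might specify the following options (the stage name is illustrative):

COPY INTO mytable
  FROM @mystage
  FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' TRIM_SPACE = TRUE FIELD_OPTIONALLY_ENCLOSED_BY = '"');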

NULL_IF = ( 'string1' [ , 'string2' ... ] )

String used to convert to and from SQL NULL.

When loading data, Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value. For example:

NULL_IF = ('\\N', 'NULL', 'NUL', '')

Note that this option can include empty strings and only applies to columns that are nullable.

Default: \\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\ (default))

ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE | FALSE

Boolean that specifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input data file does not match the number of columns in the corresponding table.

If set to FALSE, an error is not generated and the load continues. If the file is successfully loaded:

  • If the input file contains records with more fields than columns in the table, the matching fields are loaded in order of occurrence in the file and the remaining fields are not loaded.
  • If the input file contains records with fewer fields than columns in the table, the non-matching columns in the table are loaded with NULL values.

This option assumes all the records within the input file are the same length (i.e. a file containing records of varying length returns an error regardless of the value specified for this option).

Default: TRUE

Note

When transforming data during loading (i.e. using a query as the source for the COPY command), this option is ignored. There is no requirement for your data files to have the same number and ordering of columns as your target table.

VALIDATE_UTF8 = TRUE | FALSE

Boolean that specifies whether to validate UTF-8 character encoding in string column data.

If set to TRUE, Snowflake validates UTF-8 character encoding in string column data. When invalid UTF-8 character encoding is detected, the COPY command produces an error.

Default: TRUE

Important

This option is provided only to ensure backward compatibility with earlier versions of Snowflake. You should not disable this option unless instructed by Snowflake Support.

EMPTY_FIELD_AS_NULL = TRUE | FALSE

Boolean that specifies whether to insert SQL NULL for empty fields in an input file.

If set to FALSE, Snowflake attempts to cast an empty field to the corresponding column type. An empty string is inserted into columns of type STRING. For other column types, the COPY command produces an error.

Default: TRUE

SKIP_BYTE_ORDER_MARK = TRUE | FALSE

Boolean that specifies whether to skip any BOM (byte order mark) present in an input file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.

If set to FALSE, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.

Default: TRUE

ENCODING = 'string'

String (constant) that specifies the character set of the source data when loading data into a table.

Character Set ENCODING Value Supported Languages
Big5 BIG5 Traditional Chinese
EUC-JP EUCJP Japanese
EUC-KR EUCKR Korean
GB18030 GB18030 Chinese
IBM420 IBM420 Arabic
IBM424 IBM424 Hebrew
ISO-2022-CN ISO2022CN Simplified Chinese
ISO-2022-JP ISO2022JP Japanese
ISO-2022-KR ISO2022KR Korean
ISO-8859-1 ISO88591 Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish
ISO-8859-2 ISO88592 Czech, Hungarian, Polish, Romanian
ISO-8859-5 ISO88595 Russian
ISO-8859-6 ISO88596 Arabic
ISO-8859-7 ISO88597 Greek
ISO-8859-8 ISO88598 Hebrew
ISO-8859-9 ISO88599 Turkish
KOI8-R KOI8R Russian
Shift_JIS SHIFTJIS Japanese
UTF-8 UTF8 All languages
UTF-16BE UTF16BE All languages
UTF-16LE UTF16LE All languages
UTF-32BE UTF32BE All languages
UTF-32LE UTF32LE All languages
windows-1250 WINDOWS1250 Czech, Hungarian, Polish, Romanian
windows-1251 WINDOWS1251 Russian
windows-1252 WINDOWS1252 Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish
windows-1253 WINDOWS1253 Greek
windows-1254 WINDOWS1254 Turkish
windows-1255 WINDOWS1255 Hebrew
windows-1256 WINDOWS1256 Arabic

Default: UTF8
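
For example, to load files encoded in ISO-8859-1 (the stage name is illustrative):

COPY INTO mytable
  FROM @mystage
  FILE_FORMAT = (TYPE = CSV ENCODING = 'ISO88591');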

Note

Snowflake stores all data internally in the UTF-8 character set. The data is converted into UTF-8 before it is loaded into Snowflake. If the VALIDATE_UTF8 file format option is TRUE, Snowflake validates the UTF-8 character encoding in string column data after it is converted from its original character encoding.

TYPE = JSON
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.

Supported Values Notes
AUTO Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.
GZIP  
BZ2  
BROTLI Must be used if loading Brotli-compressed files. Note that support for loading files compressed using Brotli (BROTLI) is currently in open preview.
ZSTD Zstandard v0.8 (and higher) supported. Note that support for loading files compressed using Zstandard (ZSTD) is currently in open preview.
DEFLATE Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE Raw Deflate-compressed files (without header, RFC1951).
NONE Indicates the files for loading data have not been compressed.

Default: AUTO

ENABLE_OCTAL = TRUE | FALSE

Boolean that enables parsing of octal numbers.

Default: FALSE

ALLOW_DUPLICATE = TRUE | FALSE

Boolean that allows duplicate object field names (only the last one will be preserved).

Default: FALSE

STRIP_OUTER_ARRAY = TRUE | FALSE

Boolean that instructs the JSON parser to remove outer brackets [ ].

Default: FALSE

STRIP_NULL_VALUES = TRUE | FALSE

Boolean that instructs the JSON parser to remove object fields or array elements containing null values. For example, when set to TRUE:

Before After
[null] []
[null,null,3] [,,3]
{"a":null,"b":null,"c":123} {"c":123}
{"a":[1,null,2],"b":{"x":null,"y":88}} {"a":[1,,2],"b":{"y":88}}

Default: FALSE

IGNORE_UTF8_ERRORS = TRUE | FALSE

Boolean that specifies whether UTF-8 encoding errors produce error conditions. If set to TRUE, any invalid UTF-8 sequences are silently replaced with Unicode character U+FFFD (i.e. “replacement character”).

Default: FALSE

SKIP_BYTE_ORDER_MARK = TRUE | FALSE

Boolean that specifies whether to skip any BOM (byte order mark) present in an input file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.

If set to FALSE, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.

Default: TRUE

TYPE = AVRO
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.

Supported Values Notes
AUTO Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.
GZIP  
BZ2  
BROTLI Must be used if loading Brotli-compressed files. Note that support for loading files compressed using Brotli (BROTLI) is currently in open preview.
ZSTD Zstandard v0.8 (and higher) supported. Note that support for loading files compressed using Zstandard (ZSTD) is currently in open preview.
DEFLATE Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE Raw Deflate-compressed files (without header, RFC1951).
NONE Files for loading data have not been compressed.

Default: AUTO

TYPE = PARQUET
BINARY_AS_TEXT = TRUE | FALSE

Boolean that specifies whether to interpret columns with no defined logical data type as UTF-8 text. When set to FALSE, Snowflake interprets these columns as binary data.

Default: TRUE
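
For example, a sketch of a Parquet load that treats columns with no defined logical data type as binary data, assuming a target table with a single VARIANT column (the table and stage names are illustrative):

COPY INTO parquet_table
  FROM @mystage
  FILE_FORMAT = (TYPE = PARQUET BINARY_AS_TEXT = FALSE);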

TYPE = XML
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.

Supported Values Notes
AUTO Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.
GZIP  
BZ2  
BROTLI Must be used if loading Brotli-compressed files. Note that support for loading files compressed using Brotli (BROTLI) is currently in open preview.
ZSTD Zstandard v0.8 (and higher) supported. Note that support for loading files compressed using Zstandard (ZSTD) is currently in open preview.
DEFLATE Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE Raw Deflate-compressed files (without header, RFC1951).
NONE Files for loading data have not been compressed.

Default: AUTO

IGNORE_UTF8_ERRORS = TRUE | FALSE

Boolean that specifies whether UTF-8 encoding errors produce error conditions. If set to TRUE, any invalid UTF-8 sequences are silently replaced with Unicode character U+FFFD (i.e. “replacement character”).

Default: FALSE

PRESERVE_SPACE = TRUE | FALSE

Boolean that specifies whether the XML parser preserves leading and trailing spaces in element content.

Default: FALSE

STRIP_OUTER_ELEMENT = TRUE | FALSE

Boolean that specifies whether the XML parser strips out the outer XML element, exposing 2nd level elements as separate documents.

Default: FALSE

DISABLE_SNOWFLAKE_DATA = TRUE | FALSE

Boolean that specifies whether the XML parser disables recognition of Snowflake semi-structured data tags.

Default: FALSE

DISABLE_AUTO_CONVERT = TRUE | FALSE

Boolean that specifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation.

Default: FALSE

SKIP_BYTE_ORDER_MARK = TRUE | FALSE

Boolean that specifies whether to skip any BOM (byte order mark) present in an input file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.

If set to FALSE, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.

Default: TRUE
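
For example, a sketch of an XML load that exposes each 2nd-level element as a separate document, assuming a target table with a single VARIANT column (the table and stage names are illustrative):

COPY INTO xml_table
  FROM @mystage
  FILE_FORMAT = (TYPE = XML STRIP_OUTER_ELEMENT = TRUE);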

Note

No file format options are supported for TYPE = ORC.

copyOptions

For COPY_OPTIONS = ( ... ), the following options can be specified (separated by blank spaces, commas, or new lines):

ON_ERROR = CONTINUE | SKIP_FILE | SKIP_FILE_num | SKIP_FILE_num% | ABORT_STATEMENT

String (constant) that specifies the action to perform when an error is encountered while loading data from a file:

Supported Values Notes
CONTINUE Continue loading the file.
SKIP_FILE Skip file if any errors encountered in the file.
SKIP_FILE_num (e.g. SKIP_FILE_10) Skip file when the number of errors in the file is equal to or exceeds the specified number.
SKIP_FILE_num% (e.g. SKIP_FILE_10%) Skip file when the percentage of errors in the file exceeds the specified percentage.
ABORT_STATEMENT Abort the COPY statement if any error is encountered.

Default: ABORT_STATEMENT
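
For example, to continue loading a file when rows produce errors, rather than aborting the statement (the stage name is illustrative):

COPY INTO mytable
  FROM @mystage
  ON_ERROR = CONTINUE;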

SIZE_LIMIT = num

Number (> 0) that specifies the maximum size (in bytes) of data to be loaded for a given COPY statement. When the threshold is exceeded, the COPY operation discontinues loading files. This option is commonly used to load a common group of files using multiple COPY statements. For each statement, the data load continues until the specified SIZE_LIMIT is exceeded, before moving on to the next statement.

For example, suppose a set of files in a stage path were each 10MB in size. If multiple COPY statements set SIZE_LIMIT to 25000000 (25MB), each would load 3 files. That is, each COPY operation would discontinue after the SIZE_LIMIT threshold was exceeded.

Note that at least one file is loaded regardless of the value specified for SIZE_LIMIT unless there is no file to be loaded.

Default: null (no size limit)
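
For example, to stop loading additional files once the 25 MB threshold described above is exceeded (the stage name is illustrative):

COPY INTO mytable
  FROM @mystage
  SIZE_LIMIT = 25000000;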

PURGE = TRUE | FALSE

Boolean that specifies whether to purge the data files from the location automatically after the data is successfully loaded.

Default: FALSE

RETURN_FAILED_ONLY = TRUE | FALSE

Boolean that specifies whether to return only files that have failed to load in the statement result.

Default: FALSE

ENFORCE_LENGTH = TRUE | FALSE

Alternative syntax for TRUNCATECOLUMNS with reverse logic (for compatibility with other systems)

Boolean that specifies whether to truncate text strings that exceed the target column length:

  • If TRUE, the COPY statement produces an error if a loaded string exceeds the target column length.
  • If FALSE, strings are automatically truncated to the target column length.

Default: TRUE

Note

  • If the length of the target string column is set to the maximum (e.g. VARCHAR (16777216)), an incoming string cannot exceed this length; otherwise, the COPY command produces an error.
  • This parameter is functionally equivalent to TRUNCATECOLUMNS, but has the opposite behavior. It is provided for compatibility with other databases. It is only necessary to include one of these two parameters in a COPY statement to produce the desired output.
TRUNCATECOLUMNS = TRUE | FALSE

Alternative syntax for ENFORCE_LENGTH with reverse logic (for compatibility with other systems)

Boolean that specifies whether to truncate text strings that exceed the target column length:

  • If TRUE, strings are automatically truncated to the target column length.
  • If FALSE, the COPY statement produces an error if a loaded string exceeds the target column length.

Default: FALSE

Note

  • If the length of the target string column is set to the maximum (e.g. VARCHAR (16777216)), an incoming string cannot exceed this length; otherwise, the COPY command produces an error.
  • This parameter is functionally equivalent to ENFORCE_LENGTH, but has the opposite behavior. It is provided for compatibility with other databases. It is only necessary to include one of these two parameters in a COPY statement to produce the desired output.
FORCE = TRUE | FALSE

Boolean that specifies to load all files, regardless of whether they’ve been loaded previously and have not changed since they were loaded. Note that this option reloads files, potentially duplicating data in a table.

Default: FALSE

LOAD_UNCERTAIN_FILES = TRUE | FALSE

Boolean that specifies to load files for which the load status is unknown. The COPY command skips these files by default.

The load status is unknown if all of the following conditions are true:

  • The file’s LAST_MODIFIED date (i.e. date when the file was staged) is older than 64 days.
  • The initial set of data was loaded into the table more than 64 days earlier.
  • If the file was already loaded successfully into the table, this event occurred more than 64 days earlier.

To force the COPY command to load all files regardless of whether the load status is known, use the FORCE option instead.

For more information about load status uncertainty, see Loading Older Files.

Default: FALSE
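
For example, to load staged files whose load status can no longer be determined from the load metadata (the stage name is illustrative):

COPY INTO mytable
  FROM @mystage
  LOAD_UNCERTAIN_FILES = TRUE;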

Usage Notes

  • If loading into a table from the table’s own stage, the FROM clause is not required and can be omitted.

  • When transforming data during loading (i.e. using a query as the source for the COPY command):

    • Selecting data from files is supported only for named stages (internal or external) and user stages.

    • Both CSV and semi-structured file types are supported; however, even when loading semi-structured data (e.g. JSON), you should set CSV as the file format type (default value). You can use the corresponding file format (e.g. JSON), but any error in the transformation will stop the COPY operation, even if you set the ON_ERROR option to continue or skip the file.

      In addition, set the file format option FIELD_DELIMITER = NONE.

    • To transform JSON data during a load operation, you must structure the data files in NDJSON (“Newline Delimited JSON”) standard format; otherwise, you might encounter the following error:

      Error parsing JSON: more than one document in the input

    • The DISTINCT keyword is not supported. For example, the following command is not supported:

      COPY INTO mytable FROM (SELECT DISTINCT t.$1);
      
  • If referencing a file format in the current namespace, you can omit the single quotes around the format identifier.

  • CREDENTIALS and ENCRYPTION only apply if you are loading directly from a private/protected storage location (i.e. S3 bucket or Azure container):

    • If you are loading from a public bucket, no credentials are required.
    • If you are loading from a named external stage, the stage provides all the credential information required for accessing the bucket.
  • If you encounter errors while running the COPY command, after the command completes, you can validate the files that produced the errors using the VALIDATE table function.

    Note

    The VALIDATE function only returns output for COPY commands used to perform standard data loading; it does not support COPY commands that perform transformations during data loading (e.g. loading a subset of data columns or reordering data columns).

  • Unless you explicitly specify FORCE = TRUE as one of the copy options, the command ignores staged data files that were already loaded into the table. To reload the data, you must either specify FORCE = TRUE or modify the file and stage it again, which generates a new checksum.

  • The COPY command does not validate data type conversions for Parquet files.

Output

The command returns the following columns:

Column Name Data Type Description
FILE TEXT Name of source file and relative path to the file
STATUS TEXT Status: loaded, load failed or partially loaded
ROWS_PARSED NUMBER Number of rows parsed from the source file
ROWS_LOADED NUMBER Number of rows loaded from the source file
ERROR_LIMIT NUMBER If the number of errors reaches this limit, then abort
ERRORS_SEEN NUMBER Number of error rows in the source file
FIRST_ERROR TEXT First error of the source file
FIRST_ERROR_LINE NUMBER Line number of the first error
FIRST_ERROR_CHARACTER NUMBER Position of the first error character
FIRST_ERROR_COLUMN_NAME TEXT Column name of the first error

Examples

For examples of data loading transformations, see Transforming Data During a Load.

Loading Files from an Internal Stage

Note

These examples assume the files were copied to the stage earlier using the PUT command.

Load files from a named internal stage into a table:

COPY INTO mytable
FROM @my_int_stage;

Load files from a table’s stage into the table:

COPY INTO mytable
FILE_FORMAT = (TYPE = CSV);

Note

When copying data from files in a table location, the FROM clause can be omitted because Snowflake automatically checks for files in the table’s location.

Load files from the user’s personal stage into a table:

COPY INTO mytable from @~/staged
FILE_FORMAT = (FORMAT_NAME = 'mycsv');

Loading Files from a Named External Stage

Load files from a named external stage that you created previously using the CREATE STAGE command. The named external stage references an external location (i.e. S3 bucket or Azure container) and includes all the credentials and other details required for accessing the location:

COPY INTO mycsvtable
  FROM @my_ext_stage/tutorials/dataloading/contacts1.csv;

Loading Files Directly from an External Location

The following example loads all files prefixed with data/files from a storage location (i.e. S3 bucket or Azure container) using a named my_csv_format file format:

S3 bucket
COPY INTO mytable
  FROM 's3://mybucket/data/files'
  CREDENTIALS=(AWS_KEY_ID='$AWS_ACCESS_KEY_ID' AWS_SECRET_KEY='$AWS_SECRET_ACCESS_KEY')
  ENCRYPTION=(MASTER_KEY = 'eSxX0jzYfIamtnBKOEOwq80Au6NbSgPH5r4BDDwOaO8=')
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);
Azure container
COPY INTO mytable
  FROM 'azure://myaccount.blob.core.windows.net/mycontainer/data/files'
  CREDENTIALS=(AZURE_SAS_TOKEN='?sv=2016-05-31&ss=b&srt=sco&sp=rwdl&se=2018-06-27T10:05:50Z&st=2017-06-27T02:05:50Z&spr=https,http&sig=bgqQwoXwxzuD2GJfagRg7VOS8hzNr3QLT7rhS8OFRLQ%3D')
  ENCRYPTION=(TYPE='AZURE_CSE' MASTER_KEY = 'kPxX0jzYfIamtnJEUTHwq80Au6NbSgPH5r4BDDwOaO8=')
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);

Loading Using Pattern Matching

Load files from a table’s stage into the table, using pattern matching to only load data from compressed CSV files in any path:

COPY INTO mytable
  FILE_FORMAT = (TYPE = 'CSV')
  PATTERN='.*/.*/.*[.]csv[.]gz';

Load files from a table’s stage into the table, using pattern matching to only load uncompressed CSV files whose names include the string sales:

COPY INTO mytable
  FILE_FORMAT = (FORMAT_NAME = myformat)
  PATTERN='.*sales.*[.]csv';

Loading JSON Data into a VARIANT Column

The following example loads JSON data into a table with a single column of type VARIANT.

The staged JSON array comprises three objects separated by new lines:

[{
    "location": {
      "city": "Lexington",
      "zip": "40503"
    },
    "sq__ft": "1000",
    "sale_date": "4-25-16",
    "price": "75836"
},
{
    "location": {
      "city": "Belmont",
      "zip": "02478"
    },
    "sq__ft": "1103",
    "sale_date": "6-18-16",
    "price": "92567"
},
{
    "location": {
      "city": "Winchester",
      "zip": "01890"
    },
    "sq__ft": "1122",
    "sale_date": "1-31-16",
    "price": "89921"
}]
/* Create a JSON file format that delimits records using the new line character and strips the outer array. */

CREATE OR REPLACE FILE FORMAT json_format
  TYPE = 'JSON'
  FIELD_DELIMITER = NONE
  RECORD_DELIMITER = '\\n'
  STRIP_OUTER_ARRAY = TRUE;

/* Create an internal stage that references the JSON file format. */

CREATE OR REPLACE STAGE mystage
  FILE_FORMAT = json_format;

/* Stage the JSON file. */

PUT file:///tmp/sales.json @mystage AUTO_COMPRESS=TRUE;

/* Create a target table for the JSON data. */

CREATE OR REPLACE TABLE house_sales (src VARIANT);

/* Copy the JSON data into the target table. */

COPY INTO house_sales
   FROM @mystage/sales.json.gz;

SELECT * FROM house_sales;

+---------------------------+
| SRC                       |
|---------------------------|
| {                         |
|   "location": {           |
|     "city": "Lexington",  |
|     "zip": "40503"        |
|   },                      |
|   "price": "75836",       |
|   "sale_date": "4-25-16", |
|   "sq__ft": "1000",       |
|   "type": "Residential"   |
| }                         |
| {                         |
|   "location": {           |
|     "city": "Belmont",    |
|     "zip": "02478"        |
|   },                      |
|   "price": "92567",       |
|   "sale_date": "6-18-16", |
|   "sq__ft": "1103",       |
|   "type": "Residential"   |
| }                         |
| {                         |
|   "location": {           |
|     "city": "Winchester", |
|     "zip": "01890"        |
|   },                      |
|   "price": "89921",       |
|   "sale_date": "1-31-16", |
|   "sq__ft": "1122",       |
|   "type": "Condo"         |
| }                         |
+---------------------------+

Reloading Files

Load files from a table’s stage into the table. Load specific files, then reload the files. In this example, the first command loads the specified files and the second command forces the same files to be loaded again (producing duplicate rows), even though the contents of the files have not changed:

COPY INTO load1 FROM @%load1/data1/
    FILES=('test1.csv', 'test2.csv')
    FILE_FORMAT=(SKIP_HEADER=1 NULL_IF=('') FIELD_OPTIONALLY_ENCLOSED_BY='"');

COPY INTO load1 FROM @%load1/data1/
    FILES=('test1.csv', 'test2.csv')
    FILE_FORMAT=(SKIP_HEADER=1 NULL_IF=('') FIELD_OPTIONALLY_ENCLOSED_BY='"')
    FORCE=TRUE;

Note

If FORCE=TRUE is not specified in the second command, Snowflake does not reload the files and the command completes successfully with zero files processed.

Purging Files After Loading

Load files from a table’s stage into the table and purge files after loading. By default, COPY does not purge loaded files from the location. To purge the files after loading:

  • Set PURGE=TRUE for the table to specify that all files successfully loaded into the table are purged after loading:

    ALTER TABLE mytable SET STAGE_COPY_OPTIONS = (PURGE = TRUE);
    
    COPY INTO mytable;
    
  • You can also override any of the copy options directly in the COPY command:

    COPY INTO mytable PURGE = TRUE;
    

Validating Staged Files

Validate files in a stage without loading:

  • Run the COPY command in validation mode and see all errors:

    COPY INTO mytable VALIDATION_MODE = 'RETURN_ERRORS';
    
    +-------------------------------------------------------------------------------------------------------------------------------+------------------------+------+-----------+-------------+----------+--------+-----------+----------------------+------------+----------------+
    |                                                         ERROR                                                                 |            FILE        | LINE | CHARACTER | BYTE_OFFSET | CATEGORY |  CODE  | SQL_STATE |   COLUMN_NAME        | ROW_NUMBER | ROW_START_LINE |
    +-------------------------------------------------------------------------------------------------------------------------------+------------------------+------+-----------+-------------+----------+--------+-----------+----------------------+------------+----------------+
    | Field delimiter ',' found while expecting record delimiter '\n'                                                               | @MYTABLE/data1.csv.gz  | 3    | 21        | 76          | parsing  | 100016 | 22000     | "MYTABLE"["QUOTA":3] | 3          | 3              |
    | NULL result in a non-nullable column. Use quotes if an empty field should be interpreted as an empty string instead of a null | @MYTABLE/data3.csv.gz  | 3    | 2         | 62          | parsing  | 100088 | 22000     | "MYTABLE"["NAME":1]  | 3          | 3              |
    | End of record reached while expected to parse column '"MYTABLE"["QUOTA":3]'                                                   | @MYTABLE/data3.csv.gz  | 4    | 20        | 96          | parsing  | 100068 | 22000     | "MYTABLE"["QUOTA":3] | 4          | 4              |
    +-------------------------------------------------------------------------------------------------------------------------------+------------------------+------+-----------+-------------+----------+--------+-----------+----------------------+------------+----------------+
    
  • Run the COPY command in validation mode for a specified number of rows. In this example, the first run encounters no errors in the specified number of rows and completes successfully, displaying the information as it will appear when loaded into the table. The second run encounters an error in the specified number of rows and fails with the error encountered:

    COPY INTO mytable VALIDATION_MODE = 'RETURN_2_ROWS';
    
    +--------------------+----------+-------+
    |        NAME        |    ID    | QUOTA |
    +--------------------+----------+-------+
    | Joe Smith          |  456111  | 0     |
    | Tom Jones          |  111111  | 3400  |
    +--------------------+----------+-------+
    
    COPY INTO mytable VALIDATION_MODE = 'RETURN_3_ROWS';
    
    FAILURE: NULL result in a non-nullable column. Use quotes if an empty field should be interpreted as an empty string instead of a null
      File '@MYTABLE/data3.csv.gz', line 3, character 2
      Row 3, column "MYTABLE"["NAME":1]