Binary Input and Output

In this Topic:

Supported Binary Formats

Snowflake supports three binary formats or encoding schemes: hex, base64, and UTF-8.

Hex

The “hex” format refers to the hexadecimal system, or base 16. In this format, each byte is represented by two characters (digits and letters from A to F). When converting binary to string, HEX uses uppercase letters. When converting string to binary, it is case-insensitive.

Hex is the default binary format.

Base64

The base64 format refers to the base 64 encoding defined in RFC 4648. With this format, each group of 3 bytes is represented by 4 characters (digits, letters, +, and /, with the = character for padding). When converting binary to string values, base64 does not insert any whitespace or line breaks. When converting string to binary, it ignores all whitespace and line breaks.

UTF-8

The UTF-8 format refers to the UTF-8 character encoding for Unicode. Unlike hex and base64, which are binary-to-text encodings, UTF-8 is a text-to-binary encoding. This means that conversion from string to binary always succeeds, but conversion from binary to string can fail.

This format is convenient for performing one-to-one conversion between binary and string, for reinterpreting the underlying data as one type or the other rather than actually encoding and decoding.

Session Parameters for Binary Values

There are two session parameters that determine how binary values are passed into and out of Snowflake:

  • BINARY_INPUT_FORMAT:

    Used for loading data into Snowflake and for performing conversion to binary in the one-argument version of TO_CHAR , TO_VARCHAR.

    The parameter can be set to HEX, BASE64, or UTF-8 (UTF8 is also allowed). The strings are case-insensitive The default is HEX.

  • BINARY_OUTPUT_FORMAT:

    Used for displaying binary result sets, unloading data from Snowflake, and performing conversion to string in the one-argument version of TO_BINARY.

    The parameter can be set to HEX or BASE64. The strings are case-insensitive The default is HEX.

    Note

    Since conversion from binary to string may fail with the UTF-8 format, BINARY_OUTPUT_FORMAT can not be set to UTF-8. To use UTF-8 for conversion in this situation, use the two-argument version of TO_CHAR , TO_VARCHAR.

The parameters can be set at the account, user, and session levels. Execute the SHOW PARAMETERS command to view the current parameter settings that apply to all operations in the current session.

File Format Option for Loading/Unloading Binary Values

Separate from the input and output format parameters, Snowflake provides the BINARY_FORMAT file format option to use when loading data into or unloading data from Snowflake tables.

This option can be set to HEX, BASE64, or UTF-8 (case-insensitive). The option affects both data loading and unloading and, similar to other file format options, can be specified in a number of ways:

  • In a named file format, which can then be referenced in a named stage or directly in a COPY command.
  • In a named stage, which can then be referenced directly in a COPY command.
  • Directly in a COPY command.

Data Loading

When used in data loading, the BINARY_FORMAT option specifies the format of binary values in your staged data files. This option overrides the BINARY_INPUT_FORMAT parameter setting (see Session Parameters for Binary Values above).

If the option is set to HEX or BASE64, data loading may fail if the strings in the staged data file are not valid hex or base64. In that case, Snowflake will return an error and then perform the action specified in the ON_ERROR copy option.

Data Unloading

When used in data unloading, the BINARY_FORMAT option specifies the format applied to binary values unloaded to the files in specified staging location. This option overrides the BINARY_OUTPUT_FORMAT parameter setting (see Session Parameters for Binary Values above).

If the option is set to UTF-8, data unloading may fail if any binary values in the table contain invalid UTF-8. In that case, Snowflake returns an error.