Binary Input and Output

Snowflake supports three binary formats or encoding schemes: hex, base64, and UTF-8.

Overview of Supported Binary Formats

hex (Default)

The “hex” format refers to the hexadecimal, or base 16, system. In this format, each byte is represented by two characters (digits from 0 to 9 and letters from A to F). When using hex to perform conversion:

From:    To:      Notes
Binary   String   hex uses uppercase letters.
String   Binary   hex is case-insensitive.

Hex is the default binary format.
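
As a brief illustration of the two conversion directions described above, the following sketch round-trips a lowercase hex string through BINARY using the two-argument forms of TO_BINARY and TO_VARCHAR. The literal and the column alias are chosen only for this example.

```sql
-- String -> Binary: lowercase hex digits are accepted (input is case-insensitive).
-- Binary -> String: the hex output uses uppercase letters.
SELECT TO_VARCHAR(TO_BINARY('48454c50', 'HEX'), 'HEX') AS round_trip;
-- Expected result: 48454C50
```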

base64

The base64 format refers to the base 64 encoding scheme defined in RFC 4648. With this format, each group of 3 bytes is represented by 4 characters (digits, letters, +, and /, with the = character for padding). When using base64 to perform conversion:

From:    To:      Notes
Binary   String   base64 does not insert any whitespace or line breaks.
String   Binary   base64 ignores all whitespace and line breaks.
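
The whitespace behavior can be checked with a small round trip. This is a minimal sketch: the input literal (the base64 encoding of the four bytes of 'HELP', with a space inserted for illustration) and the column alias are assumptions made for this example.

```sql
-- String -> Binary: the embedded space in the input is ignored.
-- Binary -> String: the output is one unbroken base64 string.
SELECT TO_VARCHAR(TO_BINARY('SEVM UA==', 'BASE64'), 'BASE64') AS round_trip;
```

According to the notes above, the result should be SEVMUA==, with no whitespace.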

UTF-8

The UTF-8 format refers to the UTF-8 character encoding for Unicode. Unlike hex and base64, which are binary-to-text encodings, UTF-8 is a text-to-binary encoding. This means that conversion from string to binary always succeeds, but conversion from binary to string can fail.

This format is convenient for one-to-one conversion between binary and string: it reinterprets the underlying data as one type or the other rather than actually encoding or decoding it.
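
The asymmetry between the two directions can be sketched as follows. The literals are chosen for illustration only; the second statement is expected to fail because the single byte 0xFF is not a valid UTF-8 sequence.

```sql
-- String -> Binary with UTF-8: the characters are reinterpreted as their UTF-8 bytes.
-- With the default HEX output format, the result is displayed as 48454C50.
SELECT TO_BINARY('HELP', 'UTF-8') AS b;

-- Binary -> String with UTF-8 can fail: 0xFF never appears in valid UTF-8,
-- so this statement is expected to raise an error instead of returning a string.
SELECT TO_VARCHAR(TO_BINARY('FF', 'HEX'), 'UTF-8') AS s;
```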

Session Parameters for Binary Values

There are two session parameters that determine how binary values are passed into and out of Snowflake:

  • BINARY_INPUT_FORMAT:

    Used for:

    • Performing conversion to binary in the one-argument version of TO_BINARY.
    • Loading data into Snowflake (if no file format option is specified; see below for details).

    The parameter can be set to HEX, BASE64, or UTF-8 (UTF8 is also allowed). The parameter values are case-insensitive. The default is HEX.

  • BINARY_OUTPUT_FORMAT:

    Used for:

    • Displaying binary result sets.
    • Performing conversion to string in the one-argument version of TO_CHAR, TO_VARCHAR.
    • Unloading data from Snowflake (if no file format option is specified; see below for details).

    The parameter can be set to HEX or BASE64. The parameter values are case-insensitive. The default is HEX.

    Note

    Because conversion from binary to string may fail with the UTF-8 format, BINARY_OUTPUT_FORMAT cannot be set to UTF-8. To use UTF-8 for conversion in this situation, use the two-argument version of TO_CHAR, TO_VARCHAR.

The parameters can be set at the account, user, and session levels. Execute the SHOW PARAMETERS command to view the current parameter settings that apply to all operations in the current session.
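
The following is a minimal sketch of setting both parameters at the session level and then inspecting them. The base64 literal is an illustrative value (the encoding of the bytes of 'HELP').

```sql
-- Interpret the input of one-argument TO_BINARY as base64 for this session.
ALTER SESSION SET BINARY_INPUT_FORMAT = 'BASE64';
-- Display binary results as base64 instead of hex.
ALTER SESSION SET BINARY_OUTPUT_FORMAT = 'BASE64';

-- The input is parsed as base64 and the result is displayed as base64.
SELECT TO_BINARY('SEVMUA==') AS b;

-- List the binary-related parameters in effect for the current session.
SHOW PARAMETERS LIKE 'BINARY%' IN SESSION;

-- Revert to the values inherited from the user or account level.
ALTER SESSION UNSET BINARY_INPUT_FORMAT;
ALTER SESSION UNSET BINARY_OUTPUT_FORMAT;
```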

File Format Option for Loading/Unloading Binary Values

Separate from the binary input and output session parameters, Snowflake provides the BINARY_FORMAT file format option, which can be used to explicitly control binary formatting when loading data into or unloading data from Snowflake tables.

This option can be set to HEX, BASE64, or UTF-8 (values are case-insensitive). The option affects both data loading and unloading and, similar to other file format options, can be specified in a number of ways:

  • In a named file format, which can then be referenced in a named stage or directly in a COPY command.
  • In a named stage, which can then be referenced directly in a COPY command.
  • Directly in a COPY command.
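
The three approaches listed above might look like the following sketch. The object names my_b64_format and my_binary_stage are hypothetical, and demo_binary stands for any table with a BINARY column.

```sql
-- 1. In a named file format (my_b64_format is a hypothetical name).
CREATE OR REPLACE FILE FORMAT my_b64_format
  TYPE = CSV
  BINARY_FORMAT = BASE64;

-- 2. In a named stage, here by referencing the named file format.
CREATE OR REPLACE STAGE my_binary_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_b64_format');

-- 3. Directly in a COPY command.
COPY INTO demo_binary
  FROM @my_binary_stage
  FILE_FORMAT = (TYPE = CSV BINARY_FORMAT = BASE64);
```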

Data Loading

When used for data loading, BINARY_FORMAT specifies the format of binary values in your staged data files. This option overrides any value set for the BINARY_INPUT_FORMAT parameter in the session (see Session Parameters for Binary Values above).

If the option is set to HEX or BASE64, data loading may fail if the strings in the staged data file are not valid hex or base64. In this case, Snowflake returns an error and then performs the action specified for the ON_ERROR copy option.
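
As an illustration of how the ON_ERROR copy option interacts with invalid binary strings, the sketch below loads hex-encoded values and skips rows that fail to parse. The stage name my_binary_stage and the target table demo_binary are assumptions for this example.

```sql
-- Load hex-encoded binary values from staged CSV files.
-- BINARY_FORMAT here takes precedence over the BINARY_INPUT_FORMAT session parameter.
-- ON_ERROR = CONTINUE skips rows whose values are not valid hex
-- instead of aborting the whole load.
COPY INTO demo_binary
  FROM @my_binary_stage
  FILE_FORMAT = (TYPE = CSV BINARY_FORMAT = HEX)
  ON_ERROR = CONTINUE;
```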

Data Unloading

When used in data unloading, the BINARY_FORMAT option specifies the format applied to binary values unloaded to the files in the specified stage. This option overrides any value set for the BINARY_OUTPUT_FORMAT parameter in the session (see Session Parameters for Binary Values above).

If the option is set to UTF-8, data unloading may fail if any binary values in the table contain invalid UTF-8. In this case, Snowflake returns an error.
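
A minimal unloading sketch follows, assuming a hypothetical stage my_binary_stage and a table demo_binary with a BINARY column. It uses BASE64 rather than UTF-8, so arbitrary byte values can be written without risking an invalid UTF-8 error.

```sql
-- Unload binary values as base64 text.
-- BINARY_FORMAT here takes precedence over the BINARY_OUTPUT_FORMAT session parameter.
COPY INTO @my_binary_stage/unload/
  FROM demo_binary
  FILE_FORMAT = (TYPE = CSV BINARY_FORMAT = BASE64);
```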

Example Input/Output

BINARY input/output can be confusing because “what you see isn’t necessarily what you get”.

Consider, for example, the following sequence of code:

CREATE TABLE binary_table (v VARCHAR, b BINARY);
INSERT INTO binary_table (v, b)
    SELECT 'AB', TO_BINARY('AB');
SELECT v, b FROM binary_table;
+----+----+
| V  | B  |
|----+----|
| AB | AB |
+----+----+

The outputs for column v (VARCHAR) and column b appear to be identical. Yet the value for column b was converted to binary. Why does the value in column b look unchanged?

The answer is that the argument to TO_BINARY is treated as a sequence of hexadecimal digits (even though it’s inside quotes and therefore looks like a string); the 2 characters you see are actually interpreted as a pair of hexadecimal digits that represent 1 byte of binary data, not 2 bytes of string data. (This wouldn’t have worked if our input “string” had contained characters other than hexadecimal digits; we would have gotten an error message similar to “String ‘…’ is not a legal hex-encoded string”.)

Furthermore, when BINARY data is displayed, by default it is displayed as a sequence of hexadecimal digits. Thus the data went in as hexadecimal digits (not a string) and is displayed as hexadecimal digits, so it appears unchanged.

In fact, if the goal was to store the two-character string ‘AB’, then the code was wrong. The proper code would use the function HEX_ENCODE to convert the string to a sequence of hexadecimal digits (or use another “encode” function to convert to another format, such as base64) before storing the data. Examples of that are below.
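
For the binary_table example above, a corrected insert could be sketched as follows. It reuses the table defined earlier and stores the two bytes of the string 'AB' rather than the single byte 0xAB.

```sql
-- HEX_ENCODE('AB') yields the hex string '4142' (0x41 = 'A', 0x42 = 'B'),
-- which TO_BINARY then parses as two bytes.
INSERT INTO binary_table (v, b)
    SELECT 'AB', TO_BINARY(HEX_ENCODE('AB'), 'HEX');

-- With the default HEX output format, the new row displays b as 4142,
-- while the row inserted earlier still displays b as AB.
SELECT v, b FROM binary_table;
```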

Hexadecimal (“HEX”) Format

One way to enter BINARY data is to encode it as a string of hexadecimal characters. An example is below.

Start by creating a table with a BINARY column:

CREATE TABLE demo_binary (b BINARY);

If you use the TO_BINARY() function to convert an “ordinary” string that is not valid hexadecimal, the insert fails:

INSERT INTO demo_binary (b) SELECT TO_BINARY('HELP', 'HEX');

Here’s the error message:

100115 (22000): String 'HELP' is not a legal hex-encoded string

This time, explicitly convert the input to a string of hexadecimal digits before inserting it (this will succeed):

INSERT INTO demo_binary (b) SELECT TO_BINARY(HEX_ENCODE('HELP'), 'HEX');

Now, retrieve the data that we inserted successfully:

SELECT TO_VARCHAR(b), HEX_DECODE_STRING(TO_VARCHAR(b)) FROM demo_binary;

The output of the SELECT statement is:

+---------------+----------------------------------+
| TO_VARCHAR(B) | HEX_DECODE_STRING(TO_VARCHAR(B)) |
|---------------+----------------------------------|
| 48454C50      | HELP                             |
+---------------+----------------------------------+

As you can see, by default the output is shown as hexadecimal. To get back the original string, use the function HEX_DECODE_STRING (the complement of the function HEX_ENCODE that was used earlier to encode the string).

The following helps show in more detail what’s going on internally:

SELECT 'HELP', HEX_ENCODE('HELP'), b, HEX_DECODE_STRING(HEX_ENCODE('HELP')),
   TO_VARCHAR(b), HEX_DECODE_STRING(TO_VARCHAR(b)) FROM demo_binary;

Output:

+--------+--------------------+----------+---------------------------------------+---------------+----------------------------------+
| 'HELP' | HEX_ENCODE('HELP') | B        | HEX_DECODE_STRING(HEX_ENCODE('HELP')) | TO_VARCHAR(B) | HEX_DECODE_STRING(TO_VARCHAR(B)) |
|--------+--------------------+----------+---------------------------------------+---------------+----------------------------------|
| HELP   | 48454C50           | 48454C50 | HELP                                  | 48454C50      | HELP                             |
+--------+--------------------+----------+---------------------------------------+---------------+----------------------------------+

BASE64 Format

Before reading this section, you might want to read the Hexadecimal Format section above. The basic concepts are similar, and the Hexadecimal Format section explains them in more detail.

Start by creating a table with a BINARY column:

CREATE OR REPLACE TABLE demo_binary (b BINARY);

Insert a row:

INSERT INTO demo_binary (b) SELECT TO_BINARY(BASE64_ENCODE('HELP'), 'BASE64');

Retrieve that row:

SELECT 'HELP', BASE64_ENCODE('HELP'),
   BASE64_DECODE_STRING(BASE64_ENCODE('HELP')),
   TO_VARCHAR(b, 'BASE64'), 
   BASE64_DECODE_STRING(TO_VARCHAR(b, 'BASE64'))
   FROM demo_binary;
+--------+-----------------------+---------------------------------------------+-------------------------+-----------------------------------------------+
| 'HELP' | BASE64_ENCODE('HELP') | BASE64_DECODE_STRING(BASE64_ENCODE('HELP')) | TO_VARCHAR(B, 'BASE64') | BASE64_DECODE_STRING(TO_VARCHAR(B, 'BASE64')) |
|--------+-----------------------+---------------------------------------------+-------------------------+-----------------------------------------------|
| HELP   | SEVMUA==              | HELP                                        | SEVMUA==                | HELP                                          |
+--------+-----------------------+---------------------------------------------+-------------------------+-----------------------------------------------+

UTF-8 Format

Start by creating a table with a BINARY column:

CREATE OR REPLACE TABLE demo_binary (b BINARY);

Insert a row:

INSERT INTO demo_binary (b) SELECT TO_BINARY('HELP', 'UTF-8');

Retrieve that row:

SELECT 'HELP', TO_VARCHAR(b, 'UTF-8')
   FROM demo_binary;
+--------+------------------------+
| 'HELP' | TO_VARCHAR(B, 'UTF-8') |
|--------+------------------------|
| HELP   | HELP                   |
+--------+------------------------+