Overview of Data Loading

This topic describes concepts related to bulk loading data into Snowflake tables.

Data Loading Process

[Diagram: Data loading overview]

As illustrated in the diagram above, data loading is performed in two separate steps:

  1. Upload (i.e. stage) one or more data files into either an internal file staging location (i.e. within Snowflake) or an external file staging location:

    Internal: Use the PUT command to stage the files to the internal location.
    External: Use the upload interfaces/utilities provided by the service that hosts the location. Currently, Amazon S3 is the only supported service for hosting external locations.
  2. Use the COPY INTO table command to load the contents of the staged file(s) into a Snowflake database table.

    This step requires a running virtual warehouse that is also the current (i.e. in use) warehouse for the session. The warehouse provides the compute resources to perform the actual insertion of rows into the table.
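
For example, a minimal sketch of the two steps using a table's internal stage (the file path and table name below are hypothetical):

  -- Step 1: stage a local data file in the table's internal stage
  PUT file:///tmp/contacts.csv @%mytable;

  -- Step 2: load the staged file into the table (requires a running, current warehouse)
  COPY INTO mytable;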

Data Loading Details

[Diagram: Data loading details]

Information Required for Data Loading

To load data into Snowflake, the following information is needed:

  • File Staging Location: Location of the staged files that contain the data to load.
  • File Format: Information about the format of the files containing the data.
  • Copy Options: Information about the actions to perform while loading.

However, the only information that must be explicitly specified in the COPY INTO table command is the location of the staged files; Snowflake provides defaults for all file format and copy options. In addition, Snowflake provides named database objects (file formats and stages) that can be used to simplify specifying the information required for loading.
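
As an illustration, a sketch of how named objects can carry the file format details so that the COPY command only needs the staged file location (all object and file names here are hypothetical):

  -- A named file format and a named stage bundle the loading details
  CREATE FILE FORMAT my_csv_format TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1;
  CREATE STAGE my_stage FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

  -- The COPY command then specifies only the staged file location;
  -- unspecified copy options keep their defaults
  COPY INTO mytable FROM @my_stage/contacts.csv.gz;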

Metadata Generated/Recorded for Data Loading

Each time the COPY INTO table command is executed (i.e. for each data load “run”), Snowflake records or updates metadata about the data load, including:

  • Number of files loaded.
  • Errors, if any, encountered during loading.

For more information, see Monitoring Data Loading & Managing Data Files.
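
As a sketch, this metadata can be inspected through the LOAD_HISTORY Information Schema view (the table name, and possibly the exact column names, are assumptions here):

  -- Review recent load runs for a table, including row and error counts
  SELECT file_name, last_load_time, row_count, error_count
    FROM information_schema.load_history
   WHERE table_name = 'MYTABLE'
   ORDER BY last_load_time DESC;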

Semi-structured Data Loading

Snowflake natively supports semi-structured data, which means semi-structured data can be loaded into relational tables without requiring the definition of a schema in advance. Snowflake supports loading semi-structured data directly into columns of type VARIANT (see Semi-structured Data Types for more details).

Currently supported semi-structured data formats include JSON, Avro, ORC, Parquet, and XML:

  • For JSON, Avro, ORC, and Parquet data, each top-level, complete object is loaded as a separate row in the table. An object can contain newline characters and spaces as long as it is valid.
  • For XML data, each top-level element is loaded as a separate row in the table. An element is identified by start and end tags of the same name.

Typically, tables used to store semi-structured data consist of a single VARIANT column. Once the data is loaded, you can query it much as you would query structured data. You can also perform other tasks, such as extracting values and objects from arrays. For more information, see the FLATTEN table function.
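
For example, a minimal sketch of loading JSON into a single-VARIANT-column table and querying it (table, stage, and field names are hypothetical):

  -- A table with a single VARIANT column for semi-structured data
  CREATE TABLE json_data (v VARIANT);

  -- Each top-level JSON object in the staged file becomes one row
  COPY INTO json_data FROM @my_stage/events.json FILE_FORMAT = (TYPE = JSON);

  -- Extract values and flatten a nested array into separate rows
  SELECT t.v:id, f.value:name
    FROM json_data t,
         LATERAL FLATTEN(input => t.v:items) f;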

Note

Semi-structured data can be loaded into tables with multiple columns, but the semi-structured data must be stored as a field in a structured file (e.g. CSV file). Then, the data can be loaded into a specified column in the table.
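
One way to do this (a sketch only; table, stage, and column names are hypothetical) is a COPY transformation that parses a JSON field from a CSV file into a VARIANT column:

  -- The second CSV field holds a JSON string; parse it into the VARIANT column
  COPY INTO mytable (id, attributes)
    FROM (SELECT $1, PARSE_JSON($2) FROM @my_stage/records.csv)
    FILE_FORMAT = (TYPE = CSV);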

Data Loading Tasks

For more information about the tasks associated with loading data, see Loading Data Using SnowSQL.

In addition, Snowflake provides a data loading wizard in the web interface. For more information, see Loading Data Using the Web Interface (Limited).