Overview of Data Loading

This topic provides an overview of the main options available for loading data into Snowflake.

Bulk vs Continuous Loading

Snowflake provides the following main solutions for data loading. The best solution may depend upon the volume of data to load and the frequency of loading.

Bulk Loading Using the COPY Command

This option enables loading batches of data from files already available in cloud storage, or copying (i.e. staging) data files from a local machine to an internal (i.e. Snowflake) cloud storage location before loading the data into tables using the COPY command.
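
For example, staging a local file and loading it with the COPY command might look like the following minimal sketch (the file, stage, and table names are illustrative, and the PUT command is run from a client such as SnowSQL):

    -- Stage a local data file in the table's internal stage.
    PUT file:///tmp/data/mydata.csv @%mytable;

    -- Load the staged file into the target table.
    COPY INTO mytable
      FROM @%mytable/mydata.csv
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);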

Compute Resources

Bulk loading relies on user-provided virtual warehouses; the warehouse that is current for the session executes the COPY statement. Users are required to size the warehouse appropriately to accommodate expected loads.
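
As a rough sketch, preparing a warehouse for a bulk load might look like this (the warehouse, stage, and table names are hypothetical):

    -- Create a warehouse sized for the expected load volume.
    CREATE WAREHOUSE IF NOT EXISTS load_wh WAREHOUSE_SIZE = 'MEDIUM';

    -- Make it the current warehouse for the session, then run the load.
    USE WAREHOUSE load_wh;
    COPY INTO mytable
      FROM @my_stage
      FILE_FORMAT = (TYPE = CSV);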

Simple Transformations During a Load

Snowflake supports transforming data while loading it into a table using the COPY command. Options include:

  • Column reordering
  • Column omission
  • Casts
  • Truncating text strings that exceed the target column length

There is no requirement for your data files to have the same number and ordering of columns as your target table.
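
For example, a COPY statement can reorder, omit, and cast columns by selecting from the staged files, as in this sketch (object names are illustrative):

    -- Load only two of the staged columns, in a different order, casting one of them.
    COPY INTO mytable (id, city)
      FROM (
        SELECT t.$3::NUMBER, t.$1   -- $3 and $1 refer to column positions in the files
        FROM @my_stage/data/ t
      )
      FILE_FORMAT = (TYPE = CSV)
      TRUNCATECOLUMNS = TRUE;       -- truncate strings that exceed the target column length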

Continuous Loading Using Snowpipe

This option is designed to load small volumes of data (i.e. micro-batches) and incrementally make them available for analysis. Snowpipe loads data within minutes after files are added to a stage and submitted for ingestion. This ensures users have access to the latest results as soon as the raw data is available.
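
A pipe wraps a COPY statement; with auto-ingest enabled, Snowpipe loads files as cloud event notifications announce their arrival. A minimal sketch, assuming an external stage and target table already exist, might be:

    -- Create a pipe that loads new files from the stage as they arrive.
    CREATE PIPE mypipe
      AUTO_INGEST = TRUE
      AS
      COPY INTO mytable
        FROM @my_stage/data/
        FILE_FORMAT = (TYPE = JSON);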

Compute Resources

Snowpipe uses compute resources provided by Snowflake (i.e. a serverless compute model). These Snowflake-provided resources are automatically resized and scaled up or down as required, and are charged and itemized using per-second billing. Data ingestion is charged based on the actual workload.

Simple Transformations During a Load

The COPY statement in a pipe definition supports the same COPY transformation options as when bulk loading data.

In addition, data pipelines can leverage Snowpipe to continuously load micro-batches of data into staging tables for transformation and optimization using automated tasks and the change data capture (CDC) information in streams.

Data Pipelines for Complex Transformations

A data pipeline enables applying complex transformations to loaded data. This workflow generally leverages Snowpipe to load “raw” data into a staging table and then uses a series of table streams and tasks to transform and optimize the new data for analysis.
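
A minimal sketch of such a pipeline, assuming Snowpipe lands raw rows in a staging table named raw_staging, might chain a stream and a task like this (all object names are hypothetical):

    -- Track changes (CDC) on the staging table that Snowpipe loads into.
    CREATE STREAM raw_stream ON TABLE raw_staging;

    -- A task that periodically transforms new rows and inserts them into an analytics table.
    CREATE TASK transform_task
      WAREHOUSE = transform_wh
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('raw_stream')
      AS
      INSERT INTO analytics_table (id, amount)
        SELECT id, amount::NUMBER(10,2)
        FROM raw_stream
        WHERE METADATA$ACTION = 'INSERT';

    -- Tasks are created suspended; resume the task to start the schedule.
    ALTER TASK transform_task RESUME;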

Loading from Data Files Staged on Other Cloud Platforms

Snowflake supports loading data from files staged in any of the following locations, regardless of the cloud platform for your Snowflake account:

  • Internal (i.e. Snowflake) stages
  • Amazon S3
  • Google Cloud Storage
  • Microsoft Azure Blob storage

Note

Some data transfer billing charges may apply when loading data from files staged across different platforms. For more information, see Understanding Snowflake Data Transfer Billing.
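
For example, an account running on one cloud platform can still define a stage that points to an Amazon S3 bucket and load from it; a sketch (the integration, bucket, stage, and table names are placeholders) might be:

    -- An external stage backed by an S3 bucket, authenticated via a storage integration.
    CREATE STAGE my_s3_stage
      URL = 's3://my-bucket/data/'
      STORAGE_INTEGRATION = my_s3_integration;

    COPY INTO mytable
      FROM @my_s3_stage
      FILE_FORMAT = (TYPE = CSV);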

Alternatives to Loading Data

It is not always necessary to load data before running queries.

External Tables

External tables enable querying existing data stored in external data storage for analysis without first loading it into Snowflake. This solution is especially beneficial to customers who already have a lot of data stored externally but only want to query a portion of it, for example, the most recent data. Users can create materialized views on subsets of this data for improved query performance.
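
As an illustrative sketch (the object names and Parquet column layout are assumptions), an external table over a staged location can be queried in place, and a materialized view can cover the frequently queried subset:

    -- An external table over Parquet files in an external stage; the data stays in cloud storage.
    CREATE EXTERNAL TABLE sales_ext (
        sale_date DATE         AS (VALUE:sale_date::DATE),
        amount    NUMBER(10,2) AS (VALUE:amount::NUMBER(10,2))
      )
      LOCATION = @my_ext_stage/sales/
      FILE_FORMAT = (TYPE = PARQUET);

    -- Materialize only the recent slice for faster repeated queries.
    CREATE MATERIALIZED VIEW recent_sales AS
      SELECT sale_date, amount
      FROM sales_ext
      WHERE sale_date >= '2024-01-01';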