Introduction to Snowpipe

Snowpipe enables loading data from files as soon as they’re available in a stage. This means you can load data from files in micro-batches, making it available to users within minutes, rather than manually executing COPY statements on a schedule to load larger batches.

How Does Snowpipe Work?

Snowpipe loads data from files as soon as they are available in a stage. The data is loaded according to the COPY statement defined in a referenced pipe.

A pipe is a named, first-class Snowflake object that contains a COPY statement used by Snowpipe. The COPY statement identifies the source location of the data files (i.e., a stage) and a target table. All data types are supported, including semi-structured data types such as JSON and Avro.
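
For example, a pipe is created with a CREATE PIPE statement that wraps the COPY statement. The following is a minimal sketch; the database, stage, and table names are placeholders, and the target table is assumed to have a single VARIANT column for the JSON data:

    -- Minimal pipe definition (placeholder names: snowpipe_db, mypipe, mytable, mystage).
    -- Assumes mytable has a single VARIANT column to receive the JSON records.
    CREATE PIPE snowpipe_db.public.mypipe
      AS
      COPY INTO snowpipe_db.public.mytable
      FROM @snowpipe_db.public.mystage
      FILE_FORMAT = (TYPE = 'JSON');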

Two mechanisms for detecting the staged files are available: automated loads using cloud messaging, and calls to the Snowpipe REST endpoints. Both are described in the following sections.

Note

Currently, the options to automate data loads using a cloud messaging service are not available for internal (i.e., Snowflake) stages or for Google Cloud Storage stages. To load data from files stored in either of these locations, make regular calls to the Snowpipe REST endpoints.

Automated Snowpipe Using Amazon SQS

Amazon SQS (Simple Queue Service) notifications for an S3 bucket trigger Snowpipe data loads automatically. This option is suitable if no S3 event notification exists for the target path (or “prefix,” in AWS terminology) in the S3 bucket where your data files are located. AWS rules prohibit creating conflicting notifications for the same path.

The following diagram shows the Snowpipe auto-ingest process flow:

Snowpipe Auto-ingest Process Flow
  1. Data files are copied to a stage.

  2. An S3 event notification informs Snowpipe via an SQS queue that files are ready to load. Snowpipe copies the files into a queue.

  3. A Snowflake-provided virtual warehouse loads data from the queued files into the target table based on parameters defined in the specified pipe.
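
As a sketch, such a pipe differs from a basic pipe only in that AUTO_INGEST is set to TRUE; the names below are placeholders, and the external stage is assumed to already reference the S3 bucket:

    -- Placeholder names; s3_stage is assumed to be an external stage on the S3 bucket.
    CREATE PIPE snowpipe_db.public.s3_pipe
      AUTO_INGEST = TRUE
      AS
      COPY INTO snowpipe_db.public.mytable
      FROM @snowpipe_db.public.s3_stage;

The ARN of the Snowflake-managed SQS queue appears in the notification_channel column of the SHOW PIPES output; the S3 event notification for the bucket is then configured to publish to that queue.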

Automated Snowpipe Using Amazon SQS with SNS

Similar to the previous option, Amazon SQS notifications for an S3 bucket trigger data loads. However, if an S3 event notification already exists for the target path in the S3 bucket where your data files are located, you can configure Amazon Simple Notification Service (SNS) as a broadcaster to share notifications for a given path with multiple endpoints (or “subscribers”), including the Snowflake SQS queue for Snowpipe automation. An S3 event notification published by SNS informs Snowpipe via an SQS queue when files are ready to load.

The following diagram shows the process flow for Snowpipe auto-ingest with AWS SNS:

Snowpipe Auto-ingest Process Flow with AWS SNS
  1. Data files are copied to a stage.

  2. An S3 event notification published by SNS informs Snowpipe via an SQS queue that files are ready to load. Snowpipe copies the files into a queue.

  3. A Snowflake-provided virtual warehouse loads data from the queued files into the target table based on parameters defined in the specified pipe.
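
As a sketch, the pipe in this configuration also names the SNS topic via the AWS_SNS_TOPIC parameter; the object names and topic ARN below are placeholders:

    -- Placeholder names and ARN; the SNS topic must already receive the S3 event notifications.
    CREATE PIPE snowpipe_db.public.sns_pipe
      AUTO_INGEST = TRUE
      AWS_SNS_TOPIC = 'arn:aws:sns:us-west-2:001234567890:s3_mybucket'
      AS
      COPY INTO snowpipe_db.public.mytable
      FROM @snowpipe_db.public.s3_stage;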

Automated Snowpipe Using Microsoft Azure Event Grid

Microsoft Azure Event Grid notifications for an Azure container trigger Snowpipe data loads automatically.

The following diagram shows the Snowpipe auto-ingest process flow:

Snowpipe Auto-ingest Process Flow
  1. Data files are copied to a stage.

  2. A Blob storage event message informs Snowpipe via Event Grid that files are ready to load. Snowpipe copies the files into a queue.

  3. A Snowflake-provided virtual warehouse loads data from the queued files into the target table based on parameters defined in the specified pipe.
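
As a sketch, an Azure auto-ingest pipe references a notification integration that subscribes to the Event Grid topic; the integration, stage, and table names below are placeholders:

    -- Placeholder names; AZURE_EVENT_INT is assumed to be a notification integration
    -- subscribed to the Event Grid topic for the Blob storage container.
    CREATE PIPE snowpipe_db.public.azure_pipe
      AUTO_INGEST = TRUE
      INTEGRATION = 'AZURE_EVENT_INT'
      AS
      COPY INTO snowpipe_db.public.mytable
      FROM @snowpipe_db.public.azure_stage;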

Calling Snowpipe REST Endpoints

Your client application calls a public REST endpoint with a list of data filenames and a referenced pipe name (Java and Python SDKs are provided for your convenience). If new data files matching the list are discovered in the stage, they are queued for loading. Snowflake-provided compute resources load data from the queue into a Snowflake table based on parameters defined in the pipe.

The following diagram shows the Snowpipe process flow:

Snowpipe Process Flow
  1. Data files are copied to an internal (i.e., Snowflake) or external (i.e., Amazon S3, Google Cloud Storage, or Microsoft Azure) stage.

  2. A client calls the insertFiles endpoint with a list of files to ingest and a defined pipe.

    The endpoint moves these files to an ingest queue.

  3. A Snowflake-provided virtual warehouse loads data from the queued files into the target table based on parameters defined in the specified pipe.
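
A pipe used with the REST endpoints is defined without auto-ingest; the client application supplies the staged filenames at call time. A minimal sketch with placeholder names:

    -- Placeholder names; staged filenames are submitted later via the insertFiles endpoint.
    CREATE PIPE snowpipe_db.public.rest_pipe
      AS
      COPY INTO snowpipe_db.public.mytable
      FROM @snowpipe_db.public.mystage
      FILE_FORMAT = (TYPE = 'CSV');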

How Is Snowpipe Different from the COPY Command?

The following table briefly describes the primary differences between Snowpipe and a traditional manual data load workflow using the COPY command. Additional details are provided throughout the Snowpipe documentation.

Important

Snowpipe and the COPY command store different metadata for each file loaded in Snowflake. To avoid reloading files (and duplicating data), we recommend using either Snowpipe or the COPY command, but not both, to load data from a specific set of files.

Authentication
  COPY command: Security options supported by the client for authenticating and initiating a user session.
  Snowpipe: When calling the REST endpoints, key pair authentication with JSON Web Token (JWT). JWTs are signed using a public/private key pair with RSA encryption.

Transactions
  COPY command: Adds data to a table in transactions alongside any other SQL statements submitted manually by users.
  Snowpipe: Adds data to a table in transactions controlled by Snowflake, with no opportunity to involve other statements in the transaction.

Load History
  COPY command: Stored in the metadata for the target table. Available upon completion of the COPY statement as the statement results.
  Snowpipe: Stored in the metadata for the pipe. Must be requested from Snowflake (via a REST endpoint, SQL table function, or ACCOUNT_USAGE view).

Compute Resources
  COPY command: Requires a user-specified warehouse to execute COPY statements.
  Snowpipe: Uses Snowflake-supplied compute resources.

Cost
  COPY command: Billed for the amount of time each virtual warehouse is active.
  Snowpipe: Billed according to the compute resources used in the Snowpipe warehouse while loading the files.
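
For example, the load history for a pipe's target table can be retrieved with the COPY_HISTORY table function; the table name and one-hour window below are placeholders:

    -- Placeholder table name; returns files loaded into the table over the past hour.
    SELECT *
    FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
      TABLE_NAME => 'SNOWPIPE_DB.PUBLIC.MYTABLE',
      START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())));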

Load Order of Data Files

For each pipe object, Snowflake establishes a single queue to sequence data files awaiting loading. As new data files are discovered in a stage, Snowpipe appends them to the queue. However, because multiple processes pull files from the queue, Snowpipe generally loads older files first but cannot guarantee that files are loaded in the same order they are staged.

Snowpipe DDL

To support creating and managing pipes, Snowflake provides the following set of special DDL commands:

  CREATE PIPE
  ALTER PIPE
  DROP PIPE
  DESCRIBE PIPE
  SHOW PIPES

In addition, providers can view, grant, or revoke access to the necessary database objects for Snowpipe using the following standard access control DDL:

  GRANT <privileges>
  REVOKE <privileges>
  SHOW GRANTS
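
As a sketch of that access control, a role used to call the Snowpipe REST endpoints typically needs privileges along these lines; the role and object names are placeholders:

    -- Placeholder role and object names.
    GRANT USAGE ON DATABASE snowpipe_db TO ROLE snowpipe_role;
    GRANT USAGE ON SCHEMA snowpipe_db.public TO ROLE snowpipe_role;
    GRANT INSERT, SELECT ON TABLE snowpipe_db.public.mytable TO ROLE snowpipe_role;
    GRANT USAGE ON STAGE snowpipe_db.public.mystage TO ROLE snowpipe_role;  -- READ for an internal stage
    GRANT OWNERSHIP ON PIPE snowpipe_db.public.mypipe TO ROLE snowpipe_role;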