Troubleshooting Snowpipe

This topic describes a methodical approach to troubleshooting issues with loading data using Snowpipe.

The steps to troubleshoot issues with Snowpipe differ depending on the workflow used to load data files.

Automatically Loading Data Using Cloud Storage Event Notifications

Step 1: Check the Pipe Status

Retrieve the current status of the pipe. The results are displayed in JSON format. For information, see SYSTEM$PIPE_STATUS.
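
For example, assuming the pipe is named mypipe (substitute the name of your own pipe), the following call returns its current status:

select system$pipe_status('mypipe');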

Check the following values:

lastReceivedMessageTimestamp

Specifies the timestamp of the last event message received from the message queue. Note that this message might not apply to the specific pipe, e.g., if the path associated with the message does not match the path in the pipe definition. In addition, only messages triggered by created data objects are consumed by auto-ingest pipes.

If the timestamp is earlier than expected, this likely indicates an issue with either the service configuration (i.e., AWS SQS or Amazon SNS, or Azure Event Grid) or the service itself. If the field is empty, verify your service configuration settings. If the field contains a timestamp but it is earlier than expected, verify whether any settings were changed in your service configuration.

lastForwardedMessageTimestamp

Specifies the timestamp of the last “create object” event message with a matching path that was forwarded to the pipe.

If event messages are being received from the message queue but are not being forwarded to the pipe, then there is likely a mismatch between the Blob storage path where the new data files are created and the combined path specified in the Snowflake stage and pipe definitions. Verify any paths specified in the stage and pipe definitions. Note that a path specified in the pipe definition is appended to any path in the stage definition.
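
For example, the following hypothetical definitions (the stage, pipe, table, bucket, and path names are placeholders, and credentials or a storage integration are omitted for brevity) show how the two paths are combined; an auto-ingest pipe created from these definitions only receives event messages for files created under s3://mybucket/path1/path2/:

create stage mystage url = 's3://mybucket/path1/';

create pipe mypipe auto_ingest = true as copy into mytable from @mystage/path2/;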

Step 2: View the COPY History for the Table

If event messages are being received and forwarded, then query the load activity history for the target table. For information, see COPY_HISTORY.
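
For example, assuming the target table is named mytable, the following query returns the load activity for the table during the past hour:

select * from table(information_schema.copy_history(table_name => 'MYTABLE', start_time => dateadd(hours, -1, current_timestamp())));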

The STATUS column indicates whether a particular set of files was loaded, partially loaded, or failed to load. The FIRST_ERROR_MESSAGE column provides a reason when an attempt partially loaded or failed.

Note that if a set of files has multiple issues, the FIRST_ERROR_MESSAGE column only indicates the first error encountered. To view all errors in the files, execute a COPY INTO <table> statement with the VALIDATION_MODE copy option set to RETURN_ALL_ERRORS. The VALIDATION_MODE copy option instructs a COPY statement to validate the data to be loaded and return results based on the validation option specified. No data is loaded when this copy option is specified. In the statement, reference the set of files you had attempted to load using Snowpipe. For more information about the copy option, see COPY INTO <table>.
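
For example, the following statement validates two hypothetical staged files (substitute the table, stage, and file names from your Snowpipe load attempt) and returns all errors encountered, without loading any data:

copy into mytable from @mystage files = ('data1.csv', 'data2.csv') validation_mode = RETURN_ALL_ERRORS;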

If the COPY_HISTORY output does not include a set of expected files, query an earlier time period. If the files were duplicates of earlier files, the load history might have recorded the activity when the attempt to load the original files was made.

Calling Snowpipe REST Endpoints to Load Data

Step 1: Checking Authentication Issues

The Snowpipe REST endpoints use key pair authentication with JSON Web Tokens (JWT).

The Python and Java ingest SDKs generate the JWT for you. When calling the REST API directly, you must generate the JWT yourself. If no JWT token is provided in the request, the REST endpoint returns a 400 error. If an invalid token is provided, an error similar to the following is returned:

snowflake.ingest.error.IngestResponseError: Http Error: 401, Vender Code: 390144, Message: JWT token is invalid.

Step 2: Viewing the COPY History for the Table

Query the load activity history for a table, including any attempted data loads using Snowpipe. For information, see COPY_HISTORY. The STATUS column indicates whether a particular set of files was loaded, partially loaded, or failed to load. The FIRST_ERROR_MESSAGE column provides a reason when an attempt partially loaded or failed.

Note that if a set of files has multiple issues, the FIRST_ERROR_MESSAGE column only indicates the first error encountered. To view all errors in the files, execute a COPY INTO <table> statement with the VALIDATION_MODE copy option set to RETURN_ALL_ERRORS. The VALIDATION_MODE copy option instructs a COPY statement to validate the data to be loaded and return results based on the validation option specified. No data is loaded when this copy option is specified. In the statement, reference the set of files you had attempted to load using Snowpipe. For more information about the copy option, see COPY INTO <table>.

Step 3: Checking the Pipe Status

If the COPY_HISTORY table function returns 0 results for the data load you are investigating, retrieve the current state of the pipe. The results are displayed in JSON format. For information, see SYSTEM$PIPE_STATUS.

The executionState key identifies the execution state of the pipe. For example, PAUSED indicates the pipe is currently paused. The pipe owner can resume running the pipe using ALTER PIPE.
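
For example, assuming the pipe is named mypipe, the owner could resume it as follows:

alter pipe mypipe set pipe_execution_paused = false;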

If the executionState value indicates an issue with starting the pipe, check the error key for more information.

Other Issues

Set of Files Not Loaded

If the COPY_HISTORY function output indicates a subset of files was not loaded, you may try to “refresh” the pipe.

This can occur in any of the following situations:

  • The external stage was previously used to bulk load data using the COPY INTO <table> command.

  • REST API:

    • External event-driven functionality is used to call the REST APIs, and a backlog of data files already existed in the external stage before the events were configured.

  • Auto-ingest:

    • A backlog of data files already existed in the external stage before event notifications were configured.

    • An event notification failure prevented a set of files from getting queued.

To load the data files in your external stage using the configured pipe, execute an ALTER PIPE … REFRESH statement.
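
For example, assuming the pipe is named mypipe:

alter pipe mypipe refresh;

The optional PREFIX parameter limits the refresh to files under a specific path in the stage.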

Difficulty Reloading Data Into a Table

Snowflake uses file loading metadata to prevent reloading the same files (and duplicating data) in a table. Snowpipe prevents loading files with the same name even if they were later modified (i.e. have a different eTag).

The file loading metadata is associated with the pipe object rather than the table. As a result:

  • Staged files with the same name as files that were already loaded are ignored, even if they have been modified, e.g. if new rows were added or errors in the file were corrected.

  • Truncating the table using the TRUNCATE TABLE command does not delete the Snowpipe file loading metadata.

To reload the same data files, it is currently necessary to recreate the pipe object using the CREATE OR REPLACE PIPE syntax.

The following example recreates the mypipe pipe based on the example in Step 1 of Preparing to Load Data Using the Snowpipe REST API:

create or replace pipe mypipe as copy into mytable from @mystage;

Duplicate Records Loaded into Table

The load histories for the COPY INTO <table> command and Snowpipe are stored separately in Snowflake. The load history for the COPY command is stored in the metadata for the target table, while the load history for Snowpipe is stored in the metadata for the pipe object.

Do not execute COPY statements for the same bucket/path and target table as you have configured for Snowpipe. Doing so can result in duplicate data loaded into the target table. If you need to manually load staged data into the table using the pipe configuration, execute an ALTER PIPE … REFRESH statement. See Set of Files Not Loaded in this topic for more information.