Troubleshooting Snowpipe

This topic describes a methodical approach to troubleshooting issues with loading data using Snowpipe.

The steps to troubleshoot issues with Snowpipe differ depending on the workflow used to load data files.

Automatically Loading Data Using Cloud Storage Event Notifications

Step 1: Check the Pipe Status

Retrieve the current status of the pipe. The results are displayed in JSON format. For information, see SYSTEM$PIPE_STATUS.
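
For example, the following statement retrieves the status of a hypothetical pipe named mypipe in the mydb.myschema schema; substitute the fully qualified name of your pipe:

-- returns the pipe status as a JSON object
select system$pipe_status('mydb.myschema.mypipe');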

Check the following values:

lastReceivedMessageTimestamp

Specifies the timestamp of the last event message received from the message queue. Note that this message might not apply to the specific pipe, e.g., if the path associated with the message does not match the path in the pipe definition. In addition, only messages triggered by created data objects are consumed by auto-ingest pipes.

If the timestamp is earlier than expected, this likely indicates an issue with either the service configuration (i.e., Amazon SQS, Amazon SNS, or Azure Event Grid) or the service itself. If the field is empty, verify your service configuration settings. If the field contains a timestamp but it is earlier than expected, verify whether any settings were changed in your service configuration.

lastForwardedMessageTimestamp

Specifies the timestamp of the last “create object” event message with a matching path that was forwarded to the pipe.

If event messages are being received from the message queue but are not forwarded to the pipe, then there is likely a mismatch between the cloud storage path where the new data files are created and the combined path specified in the Snowflake stage and pipe definitions. Verify any paths specified in the stage and pipe definitions. Note that a path specified in the pipe definition is appended to any path in the stage definition.
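
As a minimal sketch, assuming an S3 bucket named mybucket and hypothetical stage, pipe, and table names, the following definitions combine to monitor s3://mybucket/path1/path2/; files created anywhere else in the bucket are ignored by the pipe:

-- the stage definition supplies the first part of the path
-- (credentials or a storage integration are omitted in this sketch)
create stage mystage url = 's3://mybucket/path1/';

-- the pipe definition appends path2/ to the stage path
create pipe mypipe auto_ingest = true as
  copy into mytable from @mystage/path2/;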

Step 2: View the COPY History for the Table

If event messages are being received and forwarded, then query the load activity history for the target table. For information, see COPY_HISTORY.
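
For example, the following query returns the load activity for a hypothetical table named mytable over the past hour:

select *
from table(information_schema.copy_history(
  table_name => 'mytable',
  start_time => dateadd(hours, -1, current_timestamp())));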

The STATUS column indicates whether a particular set of files was loaded, partially loaded, or failed to load. The FIRST_ERROR_MESSAGE column provides a reason when an attempt partially loaded or failed.

Note that if a set of files has multiple issues, the FIRST_ERROR_MESSAGE column only indicates the first error encountered. To view all errors in the files, execute a COPY INTO <table> statement with the VALIDATION_MODE copy option set to RETURN_ALL_ERRORS. The VALIDATION_MODE copy option instructs a COPY statement to validate the data to be loaded and return results based on the validation option specified. No data is loaded when this copy option is specified. In the statement, reference the set of files you attempted to load using Snowpipe. For more information about the copy option, see COPY INTO <table>.
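
For example, the following statement validates a hypothetical set of staged files without loading them (the stage, path, and file names are placeholders):

copy into mytable
  from @mystage/path2/
  files = ('file1.csv', 'file2.csv')
  validation_mode = return_all_errors;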

If the COPY_HISTORY output does not include a set of expected files, query an earlier time period. If the files were duplicates of earlier files, the load history might have recorded the activity when the attempt to load the original files was made.

Step 3: Validate the Data Files

If the load operation encounters errors in the data files, the COPY_HISTORY table function describes the first error encountered in each file. To validate the data files, query the VALIDATE_PIPE_LOAD function.
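
For example, the following query returns details about the errors encountered by a hypothetical pipe named mypipe during the past hour:

select *
from table(validate_pipe_load(
  pipe_name => 'mydb.myschema.mypipe',
  start_time => dateadd(hours, -1, current_timestamp())));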

Calling Snowpipe REST Endpoints to Load Data

Step 1: Check for Authentication Issues

The Snowpipe REST endpoints use key pair authentication with JSON Web Token (JWT).

The Python and Java ingest SDKs generate the JWT for you. When calling the REST API directly, you must generate the token yourself. If no JWT is provided in the request, the REST endpoint returns error 400. If an invalid token is provided, an error similar to the following is returned:

snowflake.ingest.error.IngestResponseError: Http Error: 401, Vender Code: 390144, Message: JWT token is invalid.
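
If a token is consistently rejected, verify that the public key registered for your Snowflake user matches the private key used to sign the JWT. As a sketch, assuming a hypothetical user named data_loader:

-- the RSA_PUBLIC_KEY_FP property shows the fingerprint of the registered key
desc user data_loader;

-- if necessary, re-register the correct public key (value truncated here)
alter user data_loader set rsa_public_key = 'MIIBIjANBgkqh...';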

Step 2: View the COPY History for the Table

Query the load activity history for a table, including any attempted data loads using Snowpipe. For information, see COPY_HISTORY. The STATUS column indicates whether a particular set of files was loaded, partially loaded, or failed to load. The FIRST_ERROR_MESSAGE column provides a reason when an attempt partially loaded or failed.

Note that if a set of files has multiple issues, the FIRST_ERROR_MESSAGE column only indicates the first error encountered. To view all errors in the files, execute a COPY INTO <table> statement with the VALIDATION_MODE copy option set to RETURN_ALL_ERRORS. The VALIDATION_MODE copy option instructs a COPY statement to validate the data to be loaded and return results based on the validation option specified. No data is loaded when this copy option is specified. In the statement, reference the set of files you attempted to load using Snowpipe. For more information about the copy option, see COPY INTO <table>.

Step 3: Check the Pipe Status

If the COPY_HISTORY table function returns 0 results for the data load you are investigating, retrieve the current state of the pipe. The results are displayed in JSON format. For information, see SYSTEM$PIPE_STATUS.

The executionState key identifies the execution state of the pipe. For example, PAUSED indicates the pipe is currently paused. The pipe owner can resume running the pipe using ALTER PIPE.
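
For example, the following statement resumes a hypothetical paused pipe named mypipe:

alter pipe mypipe set pipe_execution_paused = false;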

If the executionState value indicates an issue with starting the pipe, check the error key for more information.

Step 4: Validate the Data Files

If the load operation encounters errors in the data files, the COPY_HISTORY table function describes the first error encountered in each file. To validate the data files, query the VALIDATE_PIPE_LOAD function.

Other Issues

Set of Files Not Loaded

If the COPY_HISTORY function output indicates a subset of files was not loaded, you may try to “refresh” the pipe.

This situation can arise in any of the following circumstances:

  • The external stage was previously used to bulk load data using the COPY INTO <table> command.

  • REST API:

    • External event-driven functionality is used to call the REST APIs, and a backlog of data files already existed in the external stage before the events were configured.

  • Auto-ingest:

    • A backlog of data files already existed in the external stage before event notifications were configured.

    • An event notification failure prevented a set of files from getting queued.

To load the data files in your external stage using the configured pipe, execute an ALTER PIPE … REFRESH statement.
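
For example, assuming a hypothetical pipe named mypipe, the first statement below queues all recently staged files that have not yet been loaded; the second narrows the refresh to a path prefix and a modification time:

-- queue any staged files not yet loaded by the pipe
alter pipe mypipe refresh;

-- limit the refresh to files under path2/ modified after the given time
alter pipe mypipe refresh prefix = 'path2/' modified_after = '2018-07-30T13:56:46-07:00';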

Unable to Reload Modified Data, Modified Data Loaded Unintentionally

Snowflake uses file loading metadata to prevent reloading the same files (and duplicating data) in a table. Snowpipe prevents loading files with the same name even if they were later modified (i.e. have a different eTag).

The file loading metadata is associated with the pipe object rather than the table. As a result:

  • Staged files with the same name as files that were already loaded are ignored, even if they have been modified, e.g. if new rows were added or errors in the file were corrected.

  • Truncating the table using the TRUNCATE TABLE command does not delete the Snowpipe file loading metadata.

However, note that pipes only maintain the load history metadata for 14 days. Therefore:

Files modified and staged again within 14 days

Snowpipe ignores modified files that are staged again. To reload modified data files, it is currently necessary to recreate the pipe object using the CREATE OR REPLACE PIPE syntax.

The following example recreates the mypipe pipe based on the example in Step 1 of Preparing to Load Data Using the Snowpipe REST API:

create or replace pipe mypipe as copy into mytable from @mystage;

Files modified and staged again after 14 days

Snowpipe loads the data again, potentially resulting in duplicate records in the target table.

In addition, duplicate records can be loaded into the target table if COPY INTO <table> statements are executed that reference the same bucket/container, path, and target table as your active Snowpipe loads. The load histories for the COPY command and Snowpipe are stored separately in Snowflake. If you need to load staged data manually using the pipe configuration after you have loaded any historical data, execute an ALTER PIPE … REFRESH statement. See Set of Files Not Loaded in this topic for more information.

Error: Integration {0} associated with the stage {1} cannot be found

Loading data from an external stage may produce an error similar to the following:

003139=SQL compilation error:\nIntegration ''{0}'' associated with the stage ''{1}'' cannot be found.

This error can occur when the association between the external stage and the storage integration linked to the stage has been broken. This happens when the storage integration object has been recreated (using CREATE OR REPLACE STORAGE INTEGRATION). A stage links to a storage integration using a hidden ID rather than the name of the storage integration. Behind the scenes, the CREATE OR REPLACE syntax drops the object and recreates it with a different hidden ID.

If you must recreate a storage integration after it has been linked to one or more stages, you must reestablish the association between each stage and the storage integration by executing ALTER STAGE stage_name SET STORAGE_INTEGRATION = storage_integration_name, where:

  • stage_name is the name of the stage.

  • storage_integration_name is the name of the storage integration.
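
For example, assuming a hypothetical stage named mystage and a storage integration named myint:

alter stage mystage set storage_integration = myint;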