Data Encryption

Protecting customer data is one of Snowflake’s highest priorities. Snowflake encrypts all customer data by default, using the latest security standards, at no additional cost. Snowflake provides best-in-class key management, which is entirely transparent to customers. This makes Snowflake one of the easiest to use and most secure data warehouses.

End-to-End Encryption

End-to-end encryption (E2EE) is a form of communication in which no one but end users can read the data. In Snowflake, this means that only a customer and the runtime components can read the data. No third parties, including Amazon AWS or any ISP, can see data in the clear.

E2EE minimizes the attack surface. In the event of a security breach of any third party (e.g. Amazon S3), the data is protected because it is always encrypted, regardless of whether the breach exposes access credentials indirectly or data files directly, whether by an internal or external attacker.

E2EE in Snowflake

The figure illustrates the E2EE system in Snowflake. The system includes the following components:

  • The Snowflake customer in a corporate network
  • A data file staging location
  • Snowflake running in a secure virtual private cloud (VPC)

Snowflake supports both internal and external staging locations for data files. Snowflake provides internal staging locations where you can upload and group your data files before loading the data into tables (option B). Customer-provided staging locations are buckets or directories on Amazon S3 that you own and manage (option A). Customer-provided staging locations are an attractive option for customers that already have data stored on S3, which they want to copy into Snowflake. Snowflake supports E2EE for both types of staging location.

Snowflake runs in a single secure VPC on Amazon AWS.

The flow of E2EE in Snowflake is as follows (see the figure in this section):

  1. A user uploads one or more data files to a staging location. If the staging location is a customer-managed S3 bucket (option A), the user may optionally encrypt the data files using client-side encryption (see Client-Side Encryption on S3 for more information). We recommend client-side encryption for data files staged external to Snowflake. If the data is not encrypted, Snowflake immediately encrypts it when it is loaded into a table.

    If the staging location is a Snowflake-provided staging location (option B), data files are automatically encrypted when they are staged.

  2. The user loads the data from the staging location into a table. The data is transformed into Snowflake’s proprietary file format and stored on S3 (“data at rest”). In Snowflake, all data at rest is always encrypted.

  3. Query results can be unloaded into a staging location. Results are optionally encrypted using client-side encryption when unloaded into a customer-managed staging location, and are automatically encrypted when unloaded to a Snowflake-provided staging location.

  4. The user downloads data files from the staging location and decrypts the data on the client side.

In all of these steps, all data files are encrypted. Only the user and the Snowflake runtime components can read the data. The runtime components decrypt the data in memory for query processing. No third-party service can see data in the clear.

Client-Side Encryption on S3

Client-side encryption provides a secure system for managing data on S3. Client-side encryption means that a user encrypts data before storing it in S3 and loading it into Snowflake. S3 stores only the encrypted version of the data and never holds the data in the clear.

Uploading data to S3 using client-side encryption

Client-side encryption follows a specific protocol defined by Amazon AWS. The AWS SDK and third-party tools such as s3cmd or the S3 Browser implement this protocol. The S3 client-side encryption protocol works as follows (see the figure in this section):

  1. The Snowflake customer creates a secret master key, which remains with the customer.
  2. The S3 client generates a random encryption key and encrypts the file before uploading it to S3. The random encryption key, in turn, is encrypted with the customer’s master key.
  3. Both the encrypted file and the encrypted random key are uploaded to S3. The encrypted random key is stored with the file’s metadata.

When downloading data, the S3 client downloads both the encrypted file and the encrypted random key. The client decrypts the encrypted random key using the customer’s master key. Next, the client decrypts the encrypted file using the now-decrypted random key. All encryption and decryption happen on the client side. At no time does S3 or any other third party (such as an ISP) see the data in the clear. Customers may upload client-side encrypted data using any client or tool that supports client-side encryption.
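The upload and download steps above follow a pattern often called envelope encryption. The following Python sketch illustrates only the structure of that pattern and is not the AWS protocol itself: the Python standard library has no AES implementation, so a toy HMAC-SHA256 keystream stands in for the real cipher and must not be used to protect real data. The function names (`upload`, `download`) and the metadata field names are hypothetical.

```python
import hashlib
import hmac
import os


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (HMAC-SHA256 in counter mode).

    Stand-in for AES, for illustration only. Applying it twice with the
    same key and nonce returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hmac.new(key, nonce + offset.to_bytes(8, "big"), hashlib.sha256).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def upload(master_key: bytes, plaintext: bytes) -> dict:
    """Steps 2 and 3: encrypt the file with a random key, then wrap that key."""
    data_key = os.urandom(32)  # random per-file encryption key
    file_nonce, key_nonce = os.urandom(16), os.urandom(16)
    return {
        "body": _keystream_xor(data_key, file_nonce, plaintext),
        "metadata": {
            "file_nonce": file_nonce,
            "key_nonce": key_nonce,
            # The random key is stored only in wrapped (encrypted) form.
            "wrapped_key": _keystream_xor(master_key, key_nonce, data_key),
        },
    }


def download(master_key: bytes, stored: dict) -> bytes:
    """Unwrap the random key with the master key, then decrypt the file."""
    meta = stored["metadata"]
    data_key = _keystream_xor(master_key, meta["key_nonce"], meta["wrapped_key"])
    return _keystream_xor(data_key, meta["file_nonce"], stored["body"])
```

In this sketch the storage side only ever receives `body` and `metadata`, neither of which reveals the file contents without the master key, which never leaves the client.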

Ingesting Client-Side Encrypted Data into Snowflake

Snowflake supports the S3 client-side encryption protocol using a client-side master key when reading or writing data between an S3 staging location and Snowflake.

Ingesting client-side encrypted data into Snowflake

To load client-side encrypted data from a customer-provided staging location, you create a named stage object with an additional MASTER_KEY parameter using CREATE STAGE, and then load data from the staging location into your Snowflake tables. The MASTER_KEY parameter requires a 256-bit Advanced Encryption Standard (AES) key encoded in Base64.

A named stage object stores settings related to a staging location and provides a convenient way to load or unload data between Snowflake and a specific S3 bucket. The following SQL snippet creates an example stage object in Snowflake that supports client-side encryption:

-- create encrypted stage
create stage encrypted_customer_stage
url='s3://customer-bucket/data/'
credentials=(AWS_KEY_ID='ABCDEFGH' AWS_SECRET_KEY='12345678')
encryption=(MASTER_KEY='eSxX0jzYfIamtnBKOEOxq80Au6NbSgPH5r4BDDwOaO8=');

The master key specified in this SQL command is the Base64-encoded string of the customer’s secret master key. As with all other credentials, this master key is transmitted to Snowflake over HTTPS (that is, protected by Transport Layer Security) and is stored encrypted in metadata storage. Only the customer and the query-processing components of Snowflake are exposed to the master key and are therefore able to decrypt data stored in the staging location.
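A value of the required shape (256 random bits, Base64-encoded) can be produced with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `generate_master_key` and `is_valid_master_key` are hypothetical, and how you store the resulting key is up to your own key-handling procedures.

```python
import base64
import os


def generate_master_key() -> str:
    """Return 32 random bytes (256 bits) encoded as a Base64 string."""
    return base64.b64encode(os.urandom(32)).decode("ascii")


def is_valid_master_key(encoded: str) -> bool:
    """Check that a string is valid Base64 and decodes to exactly 32 bytes."""
    try:
        raw = base64.b64decode(encoded, validate=True)
    except (ValueError, TypeError):
        return False
    return len(raw) == 32
```

A 32-byte key always encodes to a 44-character Base64 string ending in a single `=`, which matches the shape of the example key in the CREATE STAGE command above.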

A benefit of named stage objects is that they can be granted to other users within a Snowflake account without revealing S3 access credentials and client-side encryption keys to those users. Users with the appropriate access control privileges simply reference the named stage object when loading or unloading data, without having to provide the S3 credentials, bucket details, or master key.

The following SQL commands create a table named USERS and copy data from the encrypted stage into the USERS table:

-- create table and ingest data from stage
CREATE TABLE users (id bigint, name varchar(500), purchases int);
COPY INTO users FROM @encrypted_customer_stage/users;

The data is now ready to be analyzed using Snowflake.

You can also unload data into the staging location. The following SQL command creates a MOST_PURCHASES table and populates it with the results of a query that finds the top 10 users with the most purchases, and then unloads the table data into the staging location:

-- find top 10 users by purchases, unload into stage
CREATE TABLE most_purchases AS SELECT * FROM users ORDER BY purchases DESC LIMIT 10;
COPY INTO @encrypted_customer_stage/most_purchases FROM most_purchases;

Snowflake encrypts the data files copied into the customer’s staging location using the master key stored in the stage object. Snowflake adheres to the S3 client-side encryption protocol. A customer can download the encrypted data files using any client or tool that supports client-side encryption.

Encryption Key Management

All Snowflake customer data is encrypted by default using the latest security standards and best practices. Snowflake uses strong AES 256-bit encryption with a hierarchical key model rooted in AWS CloudHSM. Keys are automatically rotated on a regular basis by the Snowflake service, and data can be automatically re-encrypted (“rekeyed”) on a regular basis. Data encryption and key management are entirely transparent and require no configuration or management.

For additional information about Snowflake encryption, see our security whitepaper.

Hierarchical Key Model

A hierarchical key model provides a framework for Snowflake’s encryption key management. The hierarchy is composed of several layers of keys in which each higher layer of keys (parent keys) encrypts the layer below (child keys). In security terminology, a parent key encrypting all child keys is known as “wrapping”.

Snowflake’s hierarchical key model consists of four levels of keys:

  • The root key
  • Account master keys
  • Table master keys
  • File keys

Each customer account has a separate key hierarchy of account level, table level, and file level keys.

Snowflake’s hierarchical key model

In a multi-tenant cloud service like Snowflake, the hierarchical key model isolates every account with the use of separate account master keys. In addition to the access control model, which separates storage of customer data, the hierarchical key model provides another layer of account isolation.

A hierarchical key model reduces the scope of each layer of keys. For example, a table master key encrypts a single table. A file key encrypts a single file. A hierarchical key model constrains the amount of data each key protects and the duration of time for which it is usable.
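The way the hierarchy narrows the scope of each key can be sketched as a small tree. This Python model is purely illustrative: it contains no cryptography, and the `KeyNode` class, the `files_in_scope` helper, and all account, table, and file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KeyNode:
    """One key in the hierarchy; its children are the keys it wraps."""
    name: str
    level: str  # "root", "account", "table", or "file"
    children: list = field(default_factory=list)

    def wrap(self, child: "KeyNode") -> "KeyNode":
        """Record that this key wraps (encrypts) the given child key."""
        self.children.append(child)
        return child


def files_in_scope(key: KeyNode) -> list:
    """List the file keys reachable from a key, i.e. the data it protects."""
    if key.level == "file":
        return [key.name]
    return [name for child in key.children for name in files_in_scope(child)]


# Hypothetical hierarchy: two accounts, three tables, four files.
root = KeyNode("root", "root")
account_a = root.wrap(KeyNode("account_a", "account"))
account_b = root.wrap(KeyNode("account_b", "account"))
orders = account_a.wrap(KeyNode("orders", "table"))
users = account_a.wrap(KeyNode("users", "table"))
events = account_b.wrap(KeyNode("events", "table"))
orders.wrap(KeyNode("orders_file_1", "file"))
orders.wrap(KeyNode("orders_file_2", "file"))
users.wrap(KeyNode("users_file_1", "file"))
events.wrap(KeyNode("events_file_1", "file"))
```

In this model, `files_in_scope(orders)` covers only the two files of that table, and nothing under `account_b` is reachable from `account_a`, which mirrors the account isolation described above.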

Encryption Key Rotation

Account and table master keys are automatically rotated on a regular basis by Snowflake. Active keys are retired, and new keys are created. After a specified time period, retired keys are destroyed. When active, a key is used to encrypt data and is available for usage by the originator. When retired, the key is used solely to decrypt data and is only available for usage by the recipient. When wrapping child keys in the key hierarchy, or when inserting data into a table, only the current, active key is used to encrypt data. When a key is destroyed, it is not used for either encryption or decryption. Regular key rotation limits the period during which any one key is actively used.
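The three key states and the operations each one permits can be summarized as a small state machine. This is a sketch of the lifecycle as described above, not Snowflake's implementation; the names `KeyState`, `can_use`, and `advance` are hypothetical.

```python
from enum import Enum


class KeyState(Enum):
    ACTIVE = "active"
    RETIRED = "retired"
    DESTROYED = "destroyed"


# Operations permitted in each state, as described in the text.
_ALLOWED_OPERATIONS = {
    KeyState.ACTIVE: {"encrypt", "decrypt"},
    KeyState.RETIRED: {"decrypt"},
    KeyState.DESTROYED: set(),
}

# A key only ever moves forward through the lifecycle.
_NEXT_STATE = {
    KeyState.ACTIVE: KeyState.RETIRED,
    KeyState.RETIRED: KeyState.DESTROYED,
}


def can_use(state: KeyState, operation: str) -> bool:
    """Return True if a key in this state may perform the operation."""
    return operation in _ALLOWED_OPERATIONS[state]


def advance(state: KeyState) -> KeyState:
    """Move a key to its next lifecycle state."""
    if state is KeyState.DESTROYED:
        raise ValueError("a destroyed key has no further state")
    return _NEXT_STATE[state]
```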

Key rotation of one table master key (TMK) over a time period of three months.

This figure shows the key rotation for one table master key (TMK):

  • Version 1 of the TMK is active in April. Data inserted into this table in April is protected with TMK v1.
  • In May, this TMK is rotated: TMK v1 is retired and a new, completely random key, TMK v2, is created. TMK v1 is now used only to decrypt data from April. New data inserted into the table is encrypted using TMK v2.
  • In June, the TMK is rotated again: TMK v2 is retired and a new TMK, v3, is created. TMK v1 is used to decrypt data from April, TMK v2 is used to decrypt data from May, and TMK v3 is used to encrypt and decrypt new data inserted into the table in June.
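The April to June example can be traced with a short Python model. It is a sketch of the bookkeeping only: the `TableMasterKey` class and its methods are hypothetical, and the periods are plain labels rather than real dates.

```python
class TableMasterKey:
    """Tracks which key version protects data inserted in each period."""

    def __init__(self, first_period: str):
        self._version_by_period = {first_period: 1}
        self.active_version = 1

    def rotate(self, new_period: str) -> int:
        """Retire the active version and create the next one."""
        self.active_version += 1
        self._version_by_period[new_period] = self.active_version
        return self.active_version

    def version_for(self, period: str) -> int:
        """Return the version needed to decrypt data from a period."""
        return self._version_by_period[period]

    def state_of(self, version: int) -> str:
        """Only the newest version is active; earlier ones are retired."""
        return "active" if version == self.active_version else "retired"


tmk = TableMasterKey("April")  # TMK v1 is active in April
tmk.rotate("May")              # v1 retired, v2 created
tmk.rotate("June")             # v2 retired, v3 created
```

After the two rotations, data from April still maps to version 1 and data from May to version 2, while only version 3 is used to encrypt new data.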

As stated previously, key rotation limits the duration of time in which a key is actively used to encrypt data. In conjunction with the hierarchical key model, key rotation further constrains the amount of data a key version protects. Limiting the lifetime of a key is recommended by the National Institute of Standards and Technology (NIST) to enhance security.

Periodic Rekeying

This section continues the explanation of the account and table master key lifecycle. Encryption Key Rotation described key rotation, which replaces active keys with new keys on a periodic basis and retires the old keys. Periodic data rekeying completes the lifecycle. If periodic rekeying is enabled, when the retired encryption key for a table is older than one year, Snowflake automatically creates a new encryption key and re-encrypts all data previously protected by the retired key using the new key. The new key is used to decrypt the table data going forward.

Note

For Enterprise Edition accounts, users with the ACCOUNTADMIN role (i.e. your account administrators) can enable rekeying using ALTER ACCOUNT and the PERIODIC_DATA_REKEYING parameter:

ALTER ACCOUNT SET PERIODIC_DATA_REKEYING = true;

While key rotation ensures that a key is transferred from its active state (originator usage) to a retired state (recipient usage), rekeying ensures that a key is transferred from its retired state to being destroyed.

Rekeying one table master key (TMK) after one year

In this figure, the table master key (TMK) for a single table is rotated on a monthly basis. The figure shows the key rotations in April, May, and June (TMK v1, v2, and v3):

  • In April of the following year, after TMK v1 has been retired for an entire year, it is rekeyed (generation 2) using a fully new random key. The data files protected by TMK v1 generation 1 are decrypted and re-encrypted using TMK v1 generation 2. Having no further purpose, TMK v1 generation 1 is destroyed.
  • In May, Snowflake performs the same rekeying process on the table data protected by TMK v2.
  • The same process repeats each month for each subsequent TMK version.

In this example, the lifecycle of a key is limited to a total duration of one year.
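The one-year rule in this example reduces to a date comparison. The following Python sketch assumes that a key becomes due for rekeying on the first anniversary of its retirement; the exact boundary Snowflake applies is not stated here, and the function names and dates are hypothetical.

```python
from datetime import date


def rekey_due(retired_on: date, today: date) -> bool:
    """Return True once a key has been retired for at least one full year."""
    try:
        anniversary = retired_on.replace(year=retired_on.year + 1)
    except ValueError:
        # Retired on February 29: use March 1 of the following year.
        anniversary = date(retired_on.year + 1, 3, 1)
    return today >= anniversary


def next_generation(generation: int, retired_on: date, today: date) -> int:
    """Return the key generation that should protect the data today."""
    return generation + 1 if rekey_due(retired_on, today) else generation
```

For a key retired on a hypothetical 2023-05-01, `next_generation(1, ...)` stays at 1 through 2024-04-30 and becomes 2 from 2024-05-01 onward, at which point generation 1 has no further purpose.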

Rekeying constrains the total duration in which a key is used for recipient usage, following NIST recommendations. Furthermore, when rekeying data, Snowflake can increase encryption key sizes and use better encryption algorithms that may have been standardized since the previous key generation was created. Rekeying therefore ensures that all customer data, new and old, is encrypted with the latest security technology.

Snowflake rekeys data files online, in the background, without any impact to currently running customer workloads. Data that is being rekeyed is always available to you. No service downtime is necessary to rekey data, and you encounter no performance impact on your workload. This benefit is a direct result of Snowflake’s architecture of separating storage and compute resources.

Impact of Rekeying on Time Travel and Fail-safe

Time Travel and Fail-safe retention periods are not affected by rekeying. Rekeying is transparent to both features. However, some additional storage charges are associated with rekeying of data in Fail-safe (see next section).

Impact of Rekeying on Storage Utilization

Snowflake customers are charged for additional storage for Fail-safe protection of data files that were rekeyed. For these files, 7 days of Fail-safe protection is charged. That is, the data files with the old key on S3 are already protected by Fail-safe, and the data files with the new key on S3 are also added to Fail-safe, leading to a second charge, but only for the 7-day period.
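As a rough worked example, the additional Fail-safe storage can be expressed in terabyte-days. The sketch below assumes storage is measured in TB-days purely for illustration; it does not model actual pricing, and the function name is hypothetical.

```python
FAIL_SAFE_DAYS = 7  # Fail-safe period charged for rekeyed files, per the text


def extra_fail_safe_tb_days(rekeyed_tb: float) -> float:
    """Additional Fail-safe storage, in TB-days, caused by rekeying."""
    return rekeyed_tb * FAIL_SAFE_DAYS
```

Under these assumptions, rekeying 2 TB of data files adds 14 TB-days of Fail-safe storage, after which the second charge ends.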

Amazon CloudHSM

Snowflake relies on AWS CloudHSM (online hardware security module) as a tamper-proof, highly secure way to generate, store, and use the root keys of the key hierarchy. Using CloudHSM provides the following security benefits:

  • The top-most keys of the key hierarchy never leave the HSM. All cryptographic operations are performed within the security modules themselves.
  • Wrapped lower-level keys in the key hierarchy cannot be unwrapped without authorized access to the HSM devices.
  • In addition to generating new encryption keys when creating new accounts and tables, CloudHSM generates secure, random encryption keys during key rotation and rekeying.

Snowflake uses CloudHSM’s high-availability configuration with an additional offline backup device to reduce the possibility of service outages and to be safe from losing the most important keys in the hierarchy.

Key hierarchy rooted in Amazon CloudHSM

Tri-Secret Secure and Customer-Managed Keys

Tri-Secret Secure lets you control access to your data using a master encryption key that you maintain in Amazon’s AWS Key Management Service (KMS).

With Tri-Secret Secure enabled for your account, Snowflake combines your key with a Snowflake-maintained key to create a composite master key. This composite master key is then used to encrypt all data in your account. If either key in the composite master key is revoked, your data cannot be decrypted, providing a level of security and control above Snowflake’s standard encryption. This dual-key encryption model, together with Snowflake’s built-in user authentication, enables the three levels of data protection offered by Tri-Secret Secure.

For more details about the implementation of customer-managed keys, see this blog post.

Benefits of Customer-Managed Keys

Benefits of customer-managed keys include:

  • Control over data access: You have complete control over your master key in AWS KMS and, therefore, your data in Snowflake. It is impossible to decrypt data stored in your Snowflake account without you releasing this key.
  • Ability to disable access in the event of a data breach: If you experience a security breach, you can disable access to your key and halt all data operations running in your Snowflake account.
  • Ownership of the data lifecycle: Using customer-managed keys, you can align your data protection requirements with your business processes. Explicit control over your key provides safeguards throughout the entire data lifecycle, from creation to deletion.

Important Requirements for Customer-Managed Keys

Customer-managed keys provide significant security benefits, but they also have crucial, fundamental requirements that you must continuously follow to safeguard your master key:

  • Confidentiality: You must keep your key secure and confidential at all times.
  • Integrity: You must ensure your key is protected against improper modification or deletion.
  • Availability: To execute queries and access your data, you must ensure your key is continuously available to Snowflake.

By design, an invalid or unavailable key will result in a disruption to your Snowflake data operations until a valid key is made available again to Snowflake.

However, Snowflake is designed to handle temporary availability issues (up to 10 minutes) caused by common issues, such as network communication failures. After 10 minutes, if the key remains unavailable, all data operations in your Snowflake account will cease completely. Once access to the key is restored, data operations can be started again.
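The tolerance for short outages can be pictured as a simple threshold check. This is a sketch of the rule as described, not of Snowflake's internal behavior; the function name is hypothetical, and treating exactly 10 minutes as still within the tolerance is an assumption.

```python
GRACE_PERIOD_SECONDS = 10 * 60  # temporary unavailability tolerated, per the text


def operations_allowed(seconds_key_unavailable: float) -> bool:
    """Return True while the key outage is still within the grace period."""
    return seconds_key_unavailable <= GRACE_PERIOD_SECONDS
```

A monitoring job on the customer side could use a similar threshold to alert well before the 10 minutes elapse, for example when the key has been unreachable for a few minutes.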

Failure to comply with these requirements can significantly jeopardize the integrity of your data, with consequences ranging from your data being temporarily inaccessible to it being permanently disabled. In addition, Snowflake cannot be responsible for third-party issues or for administrative mishaps caused by your organization in the course of maintaining your key.

For example, if an issue with AWS KMS results in your key becoming unavailable, your data operations will be impacted. These issues must be resolved between you and AWS Support. Similarly, if your key is tampered with or destroyed, all existing data in your Snowflake account will become unreadable until the key is restored.

Attention

Before engaging with Snowflake to enable Tri-Secret Secure for your account, you should carefully consider your responsibility for safeguarding your key. If you have any questions or concerns, we are more than happy to discuss them with you.

Note that Snowflake also bears the same responsibility for the keys that we maintain. As with all security-related aspects of our service, we treat this responsibility with the utmost care and vigilance. All of our keys are maintained under strict policies that have enabled us to earn the highest security accreditations, including SOC 2 Type II, PCI-DSS, and HIPAA.

Enabling Tri-Secret Secure

To enable Snowflake Tri-Secret Secure for your Enterprise for Sensitive Data (ESD) account, please contact Snowflake Support.

As a precondition, you must first create a master key in AWS KMS (in your AWS account) and grant usage of the key to Snowflake. For instructions, see the KMS documentation provided by AWS.