Configuring an Azure Container for Loading Data

This topic describes how to configure secure access to data files stored in a Microsoft Azure container.

The following options are supported:

Option 1

Configure a storage integration object to delegate authentication responsibility for external cloud storage to an Azure service principal. A service principal is an identity created for use with services such as Snowflake to access Azure resources.

Note

We highly recommend this option, which avoids the need to supply a SAS token when creating stages or loading data.

Option 2

Generate a shared access signature (SAS) token token to grant Snowflake limited access to objects in your storage account. You can then access an external (Azure) stage that references the container using the SAS token.

Note

Completing the instructions in this topic requires administrative access to Azure. If you are not an Azure administrator, ask your Azure administrator to perform these tasks.

Important

A Microsoft Azure administrator in your organization can limit access to your Azure storage account (i.e. your containers and the objects in those containers) to Snowflake. This security restriction grants access to your storage account to traffic from your Snowflake virtual network (VNet) subnet while blocking requests that originate from outside the VNet subnet. The process involves whitelisting the Snowflake VNet subnet IDs for your account.

This security feature currently requires that your storage account is located in the same Azure region as your Snowflake account.

To whitelist the Snowflake VNet subnet IDs:

  1. Contact Snowflake Support to obtain a pair of Snowflake VNet subnet IDs for the Azure region in which your account is deployed: one each for Snowflake services and virtual warehouses.

  2. Log into the Azure CLI.

  3. Execute the following command to whitelist each of the provided Snowflake VNet subnet IDs to access your storage account:

    $ az storage account network-rule add --account-name <account_name> --resource-group myRG --subnet "<snowflake_vnet_subnet_id>"
    

    Where:

    • account_name is the name of the Azure storage account you are granting access to Snowflake.

    • snowflake_vnet_subnet_id is one of the VNet subnet IDs provided by Snowflake Support.

    For example:

    $ az storage account network-rule add --account-name my_storage_account --resource-group myRG --subnet "/subscriptions/abcd1234-0123-456e-78f9-1a2bcde3ef4g5/resourceGroups/otherRG/providers/Microsoft.Network/virtualNetworks/otherVNET/subnets/default"
    

    Note

    The Azure client may return an error similar to the following:

    Unable retrieve endpoint status for one or more subnets. Status 'insufficent permissions' indicates lack of subnet read permissions ('Microsoft.Network/virtualNetworks/subnets/read').
    

    The error indicates that your Azure storage account may not initiate connections to Snowflake because those permissions are not granted. You can ignore this error. It will not block the whitelist feature.

For additional options for managing your virtual network rules, including using PowerShell or the Azure portal, see the Azure documentation.

For additional help regarding this configuration process or any of the other Azure configuration steps, please contact the Azure administrator for your organization.

In this Topic:

Option 1: Configuring a Snowflake Storage Integration

This topic describes how to use storage integrations to allow Snowflake to read data from and write data to an Azure container referenced in an external (Azure) stage. Integrations are named, first-class Snowflake objects that avoid the need for passing explicit cloud provider credentials such as secret keys or access tokens. Integration objects store an Azure identity and access management (IAM) user ID called the app registration. An administrator in your organization grants this app the necessary permissions in the Azure account.

An integration must also specify containers (and optional paths) that limit the locations users can specify when creating external stages that use the integration.

Note

Completing the instructions in this topic requires permissions in Azure to manage storage accounts. If you are not an Azure administrator, ask your Azure administrator to perform these tasks.

In this Section:

Step 1: Create a Cloud Storage Integration in Snowflake

Create a storage integration using the CREATE STORAGE INTEGRATION command. A storage integration is a Snowflake object that stores a generated service principal for your Azure cloud storage, along with an optional set of allowed or blocked storage locations (i.e. containers). Cloud provider administrators in your organization grant permissions on the storage locations to the generated service principal. This option allows users to avoid supplying credentials when creating stages or loading data.

Note

Only account administrators (users with the ACCOUNTADMIN role) or a role with the global CREATE INTEGRATION privilege can execute this SQL command.

CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = AZURE
  ENABLED = TRUE
  AZURE_TENANT_ID = '<tenant_id>'
  STORAGE_ALLOWED_LOCATIONS = ('azure://<account>.blob.core.windows.net/<container>/<path>/', 'azure://<account>.blob.core.windows.net/<container>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('azure://<account>.blob.core.windows.net/<container>/<path>/', 'azure://<account>.blob.core.windows.net/<container>/<path>/') ]

Where:

  • integration_name is the name of the new integration.

  • tenant_id is the ID for your Office 365 tenant that the allowed and blocked storage accounts belong to. A storage integration can authenticate to only one tenant, and so the allowed and blocked storage locations must refer to storage accounts that all belong this tenant.

    To find your tenant ID, log into the Azure portal and click Azure Active Directory » Properties. The tenant ID is displayed in the Directory ID field.

  • container is the name of a Azure container that stores your data files (e.g. mycontainer). The STORAGE_ALLOWED_LOCATIONS and STORAGE_BLOCKED_LOCATIONS parameters allow or block access to these containers, respectively, when stages that reference this integration are created or modified.

  • path is an optional path that can be used to provide granular control over logical directories in the container.

The following example creates an integration that explicitly limits external stages that use the integration to reference either of two containers and paths. In a later step, we will create an external stage that references one of these containers and paths. Multiple external stages that use this integration can reference the allowed containers and paths:

CREATE STORAGE INTEGRATION azure_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = AZURE
  ENABLED = TRUE
  AZURE_TENANT_ID = 'a123b4c5-1234-123a-a12b-1a23b45678c9'
  STORAGE_ALLOWED_LOCATIONS = ('azure://myaccount.blob.core.windows.net/mycontainer1/mypath1/', 'azure://myaccount.blob.core.windows.net/mycontainer2/mypath2/')
  STORAGE_BLOCKED_LOCATIONS = ('azure://myaccount.blob.core.windows.net/mycontainer1/mypath1/sensitivedata/', 'azure://myaccount.blob.core.windows.net/mycontainer2/mypath2/sensitivedata/');

Step 2: Grant Snowflake Access to the Storage Locations

  1. Execute the DESCRIBE INTEGRATION command to retrieve the consent URL:

    DESC STORAGE INTEGRATION <integration_name>;
    

    Where:

Note the URL in the AZURE_CONSENT_URL column, which has the following format:

https://login.microsoftonline.com/<tenant_id>/oauth2/authorize?client_id=<snowflake_application_id>

Also note the value in the AZURE_MULTI_TENANT_APP_NAME column. This is the name of the Snowflake client application created for your account. Later in these instructions, you will need to grant this application the permissions necessary to obtain an access token on your allowed storage locations.

  1. In a web browser, navigate to the URL in the AZURE_CONSENT_URL URL column. The page displays a Microsoft permissions request page.

  2. Click the Accept button. This allows the Azure service principal created for your Snowflake account to obtain an access token on any resource inside your tenant. Obtaining an access token succeeds only if you grant the service principal the appropriate permissions on the container (see the next step).

  3. Log into the Microsoft Azure portal.

  4. Navigate to Azure Services » Storage Accounts. Click on the name of the storage account you are granting the Snowflake service principal access to.

  5. Click Access Control (IAM) » Add role assignment.

  6. Select the desired role to grant to the Snowflake service principal:

    • Storage Blob Data Reader grants read access only. This allows loading data from files staged in the storage account.

    • Storage Blob Data Contributor grants read and write access. This allows loading data from or unloading data to files staged in the storage account.

  7. Search for the Snowflake service principal. This is the identity in the AZURE_MULTI_TENANT_APP_NAME property in the DESC STORAGE INTEGRATION output (in Step 1).

  8. Click the Save button.

Step 3: Create an External Stage

Create an external (Azure) stage that references the storage integration you created in Step 1: Create a Cloud Storage Integration in Snowflake (in this topic).

Note

Creating a stage that uses a storage integration requires a role that has the CREATE STAGE privilege for the schema as well as the USAGE privilege on the integration. For example:

GRANT CREATE STAGE ON SCHEMA public TO ROLE myrole;

GRANT USAGE ON INTEGRATION azure_int TO ROLE myrole;

Create the stage using the CREATE STAGE command.

For example, set mydb.public as the current database and schema for the user session, and then create a stage named my_azure_stage. In this example, the stage references the Azure container and path mycontainer1/path1, which are supported by the integration. The stage also references a named file format object called my_csv_format:

USE SCHEMA mydb.public;

CREATE STAGE my_azure_stage
  STORAGE_INTEGRATION = azure_int
  URL = 'azure://container1/path1'
  FILE_FORMAT = my_csv_format;

Note

  • To load or unload data from or to a stage that uses an integration, a role must have the USAGE privilege on the stage. It is not necessary to have the USAGE privilege on the integration.

  • The STORAGE_INTEGRATION parameter is handled separately from other stage parameters, such as FILE_FORMAT. Support for these other parameters is the same regardless of the integration used to access your Azure container.

  • By specifying a named file format object (or individual file format options) for the stage, it is not necessary to later specify the same file format options in the COPY command used to load data from the stage. For more information about file format objects and options, see CREATE FILE FORMAT.

Option 2: Generating a SAS Token

Step 1: Generate the SAS Token

The following step-by-step instructions describe how to generate an SAS token to grant Snowflake limited access to objects in your storage account:

  1. Log into the Azure portal.

  2. From the home dashboard, choose Storage Accounts » <storage_account> » Settings » Shared access signature.

    Shared access signature in Azure portal
  3. Select the following Allowed resource types:

    • Container (required to list objects in the storage account)

    • Object (required to read/write objects from/to the storage account)

  4. Select the following allowed permissions to load data files from Azure resources:

    • Read

    • List

    The additional Write, Add, and Create permissions are also required if you plan to unload files to a container.

  5. Specify start and expiry dates/times for the SAS token. As part of a general security plan, you could generate a different SAS token periodically.

  6. Leave the Allowed IP addresses field blank, and specify either HTTPS only or HTTPS and HTTP under Allowed protocols.

  7. Click the Generate SAS button. Record the full value in the SAS token field, starting with and including the ?. This is your SAS token. You will specify this token when you create an external stage.

Step 2: Create an External Stage

Create an external (Azure) stage that references the SAS token you generated in Step 1: Generate the SAS Token (in this topic).

The following example uses SQL to create an external stage named my_azure_stage that includes Azure credentials and a master encryption key. The stage URL references the Azure myaccount account. The data files are stored in the mycontainer container and /load/files path. The stage references a named file format object called my_csv_format:

CREATE OR REPLACE STAGE my_azure_stage
  URL='azure://myaccount.blob.core.windows.net/mycontainer/load/files'
  CREDENTIALS=(AZURE_SAS_TOKEN='?sv=2016-05-31&ss=b&srt=sco&sp=rwdl&se=2018-06-27T10:05:50Z&st=2017-06-27T02:05:50Z&spr=https,http&sig=bgqQwoXwxzuD2GJfagRg7VOS8hzNr3QLT7rhS8OFRLQ%3D')
  ENCRYPTION=(TYPE='AZURE_CSE' MASTER_KEY = 'kPxX0jzYfIamtnJEUTHwq80Au6NbSgPH5r4BDDwOaO8=')
  FILE_FORMAT = my_csv_format;

Note that the AZURE_SAS_TOKEN and MASTER_KEY values used in this example are for illustration purposes only.

Note

By specifying a named file format object (or individual file format options) for the stage, it is not necessary to later specify the same file format options in the COPY command used to load data from the stage. For more information about file format objects and options, see CREATE FILE FORMAT.

Data File Encryption

Enable Azure Storage Service Encryption (SSE) for Data at Rest on your storage account directly, and Snowflake will handle it correctly. For more information, see the Azure documentation on SSE.

In addition, Snowflake supports client-side encryption to decrypt files staged in Azure containers.

  • Client-side encryption:

    • AZURE_CSE: Requires a MASTER_KEY value.

      Note

      Block blobs and append blobs support client-side encryption but page blobs do not.

Next: Creating an Azure Stage