Replicating Databases Across Multiple Accounts

This topic describes the steps necessary to replicate databases across multiple Snowflake accounts in different regions and keeping the database objects and stored data synchronized.

In this Topic:

Replicating a Database to Another Account

The instructions in this section explain how to prepare your accounts for replication, promote a local database to serve as a primary database, and perform the initial replication of this primary database to another account.

Step 2: Viewing All Accounts in Your Organization

Retrieve the list of accounts in your organization. Any existing permanent or transient database in these accounts can be modified to serve as a primary database. Replicas of a primary database (i.e. secondary databases) can only be created in these accounts.

Note

Only account administrators (users with the ACCOUNTADMIN role) can execute the SQL statement in this section.

To view the list of accounts in your organization, query SHOW GLOBAL ACCOUNTS.

SHOW GLOBAL ACCOUNTS;

+------------------+---------------------------------+---------------+------------+
| snowflake_region | created_on                      | name          | comment    |
|------------------+---------------------------------+---------------+------------|
| AWS_US_WEST_2    | 2018-11-19 16:11:12.720 -0700   | MYACCOUNT1    |            |
| AWS_US_EAST_1    | 2019-06-02 14:12:23.192 -0700   | MYACCOUNT2    |            |
+------------------+---------------------------------+---------------+------------+

The following table displays the complete list of Snowflake Region IDs:

Snowflake Region IDs

Region

Region ID

Snowflake Region ID

Notes

Amazon Web Services (AWS)

US West (Oregon)

us-west-2

aws_us_west_2

US East (N. Virginia)

us-east-1

aws_us_east_1

Canada (Central)

ca-central-1.aws

aws_ca_central_1

EU (Ireland)

eu-west-1

aws_eu_west_1

EU (Frankfurt)

eu-central-1

aws_eu_central_1

Asia Pacific (Singapore)

ap-southeast-1

aws_ap_southeast_1

Asia Pacific (Sydney)

ap-southeast-2

aws_ap_southeast_2

Microsoft Azure

East US 2

east-us-2.azure

azure_eastus2

Canada Central

canada-central.azure

azure_canadacentral

West Europe

west-europe.azure

azure_westeurope

Australia East

australia-east.azure

azure_australiaeast

Southeast Asia

southeast-asia.azure

azure_southeastasia

Step 3: Promoting a Local Database to Serve as a Primary Database

Modify an existing permanent or transient database to serve as a primary database using an ALTER DATABASE … ENABLE REPLICATION TO ACCOUNTS statement. Provide a comma-separated list of accounts in your organization that can store a replica of this database (i.e. a secondary database), allowing users in those accounts to query objects in the secondary database.

Note

Only account administrators (users with the ACCOUNTADMIN role) can execute the SQL statement in this section.

Example

Promote local database mydb1 (in region aws_us_west_2) to serve as a primary database and specify that accounts myaccount2 and myaccount3 (in regions aws_us_east_1 (AWS) and azure_westeurope (Azure), respectively) can each store a replica of this database:

ALTER DATABASE mydb1 ENABLE REPLICATION TO ACCOUNTS aws_us_east_1.myaccount2, azure_westeurope.myaccount3;

Step 4: Creating a Secondary Database

Create a replica of an existing primary database in the same account that stores the primary database, or a different account (in the same or a different region). Note that you can only create a secondary database in an account specified in the ALTER DATABASE … ENABLE REPLICATION TO ACCOUNTS statement in Step 3: Promoting a Local Database to Serve as a Primary Database.

Execute a CREATE DATABASE … AS REPLICA OF statement in each target account to create a replica of the specified primary database. To help manage your database replication flow, we recommend giving each secondary database the same name as its primary database.

To view the list of primary and secondary databases in your organization, query the SHOW REPLICATION DATABASES function.

Note

Only account administrators (users with the ACCOUNTADMIN role) can execute the SQL statement in this section.

Example

The following example creates a replica of the aws_us_west_2.myaccount1.mydb1 primary database in the aws_us_east_1.myaccount2 account, with automatic refreshing of materialized views in the replica enabled. The SQL statement is executed in the same AWS region group (public) but a different region from the account that stores the primary database:

-- Log into the AWS_US_EAST_1.MYACCOUNT2 account.

-- Query the set of primary and secondary databases in your organization.
-- In this example, the AWS_US_WEST_2.MYACCOUNT1 primary database is available to replicate.
SHOW REPLICATION DATABASES;

+------------------+-------------------------------+-----------------+----------+---------+------------+------------------------------------------+------------------+------------------------------------------------------------------+---------------------------------+
| snowflake_region | created_on                    | account_name    | name     | comment | is_primary | primary                                  | snowflake_region | replication_allowed_to_accounts                                  | failover_allowed_to_accounts    |
|------------------+-------------------------------+-----------------+----------+---------+------------+------------------------------------------+------------------+------------------------------------------------------------------+---------------------------------|
| AWS_US_WEST_2    | 2019-11-15 00:51:45.473 -0700 | MYACCOUNT1      | MYDB1    | NULL    | true       | PUBLIC.AWS_US_WEST_2.MYACCOUNT1.MYDB1    | AWS_US_WEST_2    | PUBLIC.AWS_US_EAST_1.MYACCOUNT2, PUBLIC.AWS_US_WEST_2.MYACCOUNT1 | PUBLIC.AWS_US_WEST_2.MYACCOUNT1 |
+------------------+-------------------------------+-----------------+----------+---------+------------+------------------------------------------+------------------+------------------------------------------------------------------+---------------------------------+

-- Create a replica of the 'mydb1' primary database
CREATE DATABASE mydb1
  AS REPLICA OF aws_us_west_2.myaccount1.mydb1
  AUTO_REFRESH_MATERIALIZED_VIEWS_ON_SECONDARY = TRUE;

-- Verify the secondary database
SHOW REPLICATION DATABASES;

+------------------+-------------------------------+-----------------+----------+---------+------------+------------------------------------------+------------------+------------------------------------------------------------------+---------------------------------+
| snowflake_region | created_on                    | account_name    | name     | comment | is_primary | primary                                  | snowflake_region | replication_allowed_to_accounts                                  | failover_allowed_to_accounts    |
|------------------+-------------------------------+-----------------+----------+---------+------------+------------------------------------------+------------------+------------------------------------------------------------------+---------------------------------|
| AWS_US_WEST_2    | 2019-11-15 00:51:45.473 -0700 | MYACCOUNT1      | MYDB1    | NULL    | true       | PUBLIC.AWS_US_WEST_2.MYACCOUNT1.MYDB1    | AWS_US_WEST_2    | PUBLIC.AWS_US_EAST_1.MYACCOUNT2, PUBLIC.AWS_US_WEST_2.MYACCOUNT1 | PUBLIC.AWS_US_WEST_2.MYACCOUNT1 |
| AWS_US_EAST_1    | 2019-08-15 15:51:49.094 -0700 | MYACCOUNT2      | MYDB1    | NULL    | false      | PUBLIC.AWS_US_WEST_2.MYACCOUNT1.MYDB1    | AWS_US_EAST_1    |                                                                  |                                 |
+------------------+-------------------------------+-----------------+----------+---------+------------+------------------------------------------+------------------+------------------------------------------------------------------+---------------------------------+

Refreshing Each Secondary Database

The instructions in this sections explain how refresh a secondary database from a snapshot of its primary database (using ALTER DATABASE … REFRESH). A snapshot includes changes to the objects and data.

Note that the owner of the secondary database (role with the OWNERSHIP privilege on the database) owns any new objects added as a result of a database refresh.

Note

To refresh a secondary database, the role used to perform the operation must have the OWNERSHIP privilege on the database.

To verify the current region after you log into an account, query the CURRENT_REGION function.

ALTER DATABASE mydb1 REFRESH​;

Increasing the Statement Timeout for the Initial Replication

Database replication uses Snowflake-provided compute resources instead of your own virtual warehouse to copy objects and data. However, the STATEMENT_TIMEOUT_IN_SECONDS session/object parameter still controls how long a statement runs before it is canceled. The default value is 172800 (2 days). Because the initial replication of a very large primary database can take longer than 2 days to complete (depending on the amount of metadata in the database as well as the amount of data in database objects), we recommend increasing the STATEMENT_TIMEOUT_IN_SECONDS value to 604800 (7 days, the maximum value) for the session in which you run the replication operation.

Run the following ALTER SESSION statement prior to executing the ALTER DATABASE secondary_db_name REFRESH statement in the same session:

ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 604800;

Note that the STATEMENT_TIMEOUT_IN_SECONDS parameter also applies to the active warehouse in a session. The parameter honors the lower value set at the session or warehouse level. If you have an active warehouse in the current session, set STATEMENT_TIMEOUT_IN_SECONDS to 604800 for this warehouse (using ALTER WAREHOUSE), too.

For example:

-- determine the active warehouse in the current session (if any)
SELECT CURRENT_WAREHOUSE();

+---------------------+
| CURRENT_WAREHOUSE() |
|---------------------|
| MY_WH               |
+---------------------+

-- change the STATEMENT_TIMEOUT_IN_SECONDS value for the active warehouse

ALTER WAREHOUSE my_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 604800;

You can reset the parameter value to the default after the replication operation is completed:

ALTER WAREHOUSE my_wh UNSET STATEMENT_TIMEOUT_IN_SECONDS;

Refreshing a Secondary Database on a Schedule

As a best practice, we recommend scheduling your secondary database refreshes. This section provides instructions for starting a database refresh automatically on a specified schedule.

The steps involve creating the following objects:

  • Stored procedure that starts the database refresh using caller’s rights.

  • Task that calls the stored procedure on a schedule.

Note

We recommend that you execute the initial replication of a primary database manually (using ALTER DATABASE secondary_db_name REFRESH​), and only schedule subsequent refreshes.

Complete the steps in this section to start a database refresh automatically on a specified schedule.

The steps involve creating the following objects:

  • Stored procedure that starts the database refresh using caller’s rights.

  • Task that calls the stored procedure on a schedule.

Prerequisites

The following Snowflake objects are required in the account that stores the secondary database:

  • The secondary database.

  • A separate database to store the new objects created in this section. Because secondary databases are ready-only, this database must be separate from the secondary database. This database must also include the following objects:

    • Schema. Use the PUBLIC schema, or create a new schema using CREATE SCHEMA.

    • Warehouse to provide compute resources for the database refresh. Create a new warehouse using CREATE WAREHOUSE.

Required privileges

The steps in this section require a role with the following privileges in the account in which the stored procedure and task are created:

Object Type

Object

Privilege

Notes

Account

Account that stores the secondary database

EXECUTE TASK

Required to run the new task.

Database

Secondary database

OWNERSHIP

Required to refresh the secondary database.

Database

Database that stores the new procedure and task.

USAGE

Schema

Schema that stores the new procedure and new task

USAGE, CREATE PROCEDURE, CREATE TASK

Procedure

OWNERSHIP

The role that creates the stored procedure owns the object by default. Ownership can be transferred to a different role using GRANT <privileges> … TO ROLE.

Task

OWNERSHIP

The role that creates the task owns the object by default. Ownership can be transferred to a different role using GRANT privileges … TO ROLE.

Warehouse

Warehouse used to run the task

USAGE

Steps

Complete the following steps for each secondary database you want to refresh on a schedule:

  1. Create a task that starts the database refresh on a schedule (using CREATE TASK).

    Syntax
    CREATE [ OR REPLACE ] TASK [ IF NOT EXISTS ] <name>
      WAREHOUSE = <string>
      SCHEDULE = { <number> MINUTE | USING CRON <expr> <time_zone> } | AFTER <string>
    AS
      ALTER DATABASE <secondary_db_name> REFRESH;
    

    For example, create a task named refresh_mydb1_task that refreshes a secondary database named mydb1 every 10 minutes. The task runs using the existing warehouse mywh:

    CREATE TASK refresh_mydb1_task
      WAREHOUSE = mywh
      SCHEDULE = '10 minute'
    AS
      ALTER DATABASE mydb1 REFRESH;
    
  2. A task is suspended by default when it is created. Resume the task to allow it to run based on the parameters specified in the task definition:

ALTER TASK refresh_mydb1_task RESUME;

Monitoring the Progress of a Database Refresh

To determine the current status of the initial database replication or a subsequent secondary database refresh, query the DATABASE_REFRESH_PROGRESS , DATABASE_REFRESH_PROGRESS_BY_JOB table function (in the Information Schema).

A database refresh operation can require several hours or longer to complete depending on the amount of data to replicate.

To view the replication history for a specified database within a specified date range, query the REPLICATION_USAGE_HISTORY table function.

Example

Monitor the progress of the mydb1 secondary database refresh:

select *
  from table(information_schema.database_refresh_progress(mydb1));

Viewing the Database Refresh History

To view the history of secondary database refresh operations, query the DATABASE_REFRESH_HISTORY table function (in the Information Schema). This function returns database refresh activity within the last 14 days.

Example

View the history of the mydb1 secondary database refresh operation:

select *
  from table(information_schema.database_refresh_history(mydb1));

Comparing Data Sets in Primary and Secondary Databases

Optionally use the HASH_AGG function to compare the rows in a random set of tables in a primary and secondary database to verify data consistency. The HASH_AGG function returns an aggregate signed 64-bit hash value over the (unordered) set of input rows. Query this function on all or a random subset of tables in a secondary database and on the primary database (as of the timestamp for the primary database snapshot) and compare the output.

Example

  1. On the secondary database, query the DATABASE_REFRESH_HISTORY table function (in the Information Schema). Note the END_TIME column for the row indicating when the latest refresh operation completed. This is the timestamp for the latest snapshot of the primary database.

  2. Query the HASH_AGG function for a specified table. The following query returns a hash value for all rows in the mytable table:

    SELECT HASH_AGG( * ) FROM mytable;
    
  3. On the primary database, query the HASH_AGG function for the same table. Using Time Travel, specify the timestamp when the latest snapshot was taken for the secondary database:

    SELECT HASH_AGG( * ) FROM mytable AT(TIMESTAMP => <primary_snapshot_timestamp>);
    
  4. Compare the results from the two queries. The output should be identical.