Automatic Clustering

As DML operations (INSERT, UPDATE, DELETE, MERGE, COPY) are performed on a clustered table, the data in the table may become less clustered. Periodic/regular reclustering of the table is required to maintain optimal clustering.

Automatic Clustering, the Snowflake reclustering service, manages all reclustering of clustered tables (i.e. tables with clustering keys explicitly defined).

With Automatic Clustering enabled for your account:

  • All clustered tables created or defined after the feature is enabled are seamlessly and continually reclustered, as needed.
  • Clustered tables that already existed when the feature was enabled can be included (i.e. “opted in”) by explicitly enabling Automatic Clustering for these tables.

Note

After Automatic Clustering is enabled (for an account or individual table), reclustering does not start immediately:

  • For any existing clustered tables, Automatic Clustering must be explicitly enabled for the tables before any reclustering activity occurs.
  • Regardless of whether a table is new or existing, Snowflake only reclusters the table if it will benefit from the operation.

In this Topic:

Benefits of Automatic Clustering

Ease-of-maintenance

With Automatic Clustering, you do not need to perform the following maintenance tasks:

  • Monitoring the state of clustered tables and then manually running ALTER TABLE … RECLUSTER statements to ensure the tables maintain optimal clustering.

    Instead, as DML is performed on these tables, Snowflake monitors and evaluates the tables to determine whether they would benefit from reclustering, and automatically reclusters them, as needed.

  • Designating warehouses in your account to use for reclustering.

    Snowflake performs automatic reclustering in the background, and you do not need to specify a warehouse to use.

All you need to do is define clustering keys for your tables (as desired/appropriate) and Snowflake manages all future maintenance.

Full Control

Automatic Clustering can be enabled for each individual table:

  • You can choose whether to enable Automatic Clustering for any tables for which it is not currently enabled.

  • You can suspend and resume automatic clustering for a particular table at any time using ALTER TABLE … SUSPEND / RESUME RECLUSTER.

    While Automatic Clustering is suspended for a table, the table is never automatically reclustered, regardless of its clustering state and, therefore, does not incur any related credit charges.

  • If desired, you can manually recluster a table at any time, regardless of whether the table has Automatic Clustering enabled.

Non-blocking DML

Automatic Clustering is transparent and does not block DML statements issued against tables while they are being reclustered. In contrast, while a table is being manually reclustered, the table is locked.

Optimal Efficiency

With Automatic Clustering, Snowflake internally manages the state of clustered tables, as well as the resources (servers, memory, etc.) used for all automated clustering operations. This allows Snowflake to dynamically allocate resources as needed, resulting in more efficient and effective reclustering, in comparison to manual reclustering.

Also, Automatic Clustering does not perform any unnecessary reclustering. Reclustering is triggered only if/when the table would benefit from the operation.

Warehouses and Credit Usage for Automatic Clustering

Automatic Clustering consumes Snowflake credits, but does not require you to provide a virtual warehouse. Instead, Snowflake internally manages and achieves efficient resource utilization for reclustering the tables.

Your account is billed only for the actual credits consumed by automatic clustering operations on your clustered tables.

Important

After enabling or resuming Automatic Clustering on a clustered table, if it has been a while since the table was reclustered, you may experience reclustering activity and corresponding credit charges as Snowflake brings the table to an optimally-clustered state. Once the table is optimally-clustered, the reclustering activity will level off.

Likewise, defining clustering keys on an existing table or changing the clustering keys on a clustered table may trigger reclustering and corresponding credit charges.

To prevent any unexpected credit charges, we recommend starting with one or two selected clustered tables that have been manually reclustered in the past. This will help you establish a baseline for the number of credits consumed by past reclustering activity, which you can then compare to the credits consumed by Automatic Clustering.

Enabling or Resuming Automatic Clustering for a Table

To enable (i.e. “opt in”) or resume Automatic Clustering for a clustered table, use the ALTER TABLE command with a RESUME RECLUSTER clause. For example:

ALTER TABLE t1 RESUME RECLUSTER;

SHOW TABLES LIKE 't1';

+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+----------------------+
|           created_on            | name | database_name | schema_name | kind  | comment | cluster_by | rows | bytes |  owner   | retention_time | automatic_clustering |
+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+----------------------+
| Thu, 12 Apr 2018 13:29:01 -0700 | T1   | TESTDB        | MY_SCHEMA   | TABLE |         | LINEAR(C1) | 0    | 0     | SYSADMIN | 1              | ON                   |
+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+----------------------+

Tip

Before you enable or resume Automatic Clustering on a clustered table, consider the following, which may cause reclustering activity and corresponding credit charges:

  • The table is not optimally-clustered (e.g. significant DML has been performed on the table since it was last reclustered). For more details, see Understanding Snowflake Table Structures.
  • The clustering keys on the table have changed.

As such, we recommend starting with one or two selected clustered tables and assessing the impact of Automatic Clustering on these tables. Once you are comfortable/familiar with how Automatic Clustering performs reclustering, you can then enable/resume it for all your other clustered tables.

Suspending Automatic Clustering for a Table

To suspend Automatic Clustering for a table, use the ALTER TABLE command with a SUSPEND RECLUSTER clause. For example:

ALTER TABLE t1 SUSPEND RECLUSTER;

SHOW TABLES LIKE 't1';

+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+----------------------+
|           created_on            | name | database_name | schema_name | kind  | comment | cluster_by | rows | bytes |  owner   | retention_time | automatic_clustering |
+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+----------------------+
| Thu, 12 Apr 2018 13:29:01 -0700 | T1   | TESTDB        | MY_SCHEMA   | TABLE |         | LINEAR(C1) | 0    | 0     | SYSADMIN | 1              | OFF                  |
+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+----------------------+

Viewing the Automatic Clustering Status for a Table

With Automatic Clustering enabled for your account, you can view whether Automatic Clustering is enabled for an individual table using SQL:

The AUTO_CLUSTERING_ON column in the output for these commands/views displays the Automatic Clustering status for each table. Note that this column is not visible if Automatic Clustering is not enabled for your account.

In addition, the CLUSTERING_KEY column displays the column(s) defined as the clustering key(s) for each table.

Viewing Automatic Clustering Billing

Users with the ACCOUNTADMIN role can view the billing for Automatic Clustering using either the web interface or SQL:

Web Interface:

Click on Account Account tab » Billing & Usage

The billing for Automatic Clustering shows up as a separate warehouse named Snowflake logo in blue (no text) AUTOMATIC_CLUSTERING.

SQL:

Query the AUTOMATIC_CLUSTERING_HISTORY table function (in the Information Schema).