Automatic Clustering

Automatic Clustering is the Snowflake service that seamlessly and continually manages all reclustering, as needed, of clustered tables.

Note that, after a clustered table is defined, reclustering does not necessarily start immediately. Snowflake only reclusters a clustered table if it will benefit from the operation.

Note

If manual reclustering is still available in your account, Automatic Clustering may not be enabled yet for your account. For more details, see Manual Reclustering — Deprecated.

In this Topic:

Benefits of Automatic Clustering

Ease-of-maintenance

Automatic Clustering eliminates the need for performing any of the following tasks:

  • Monitoring the state of clustered tables.

    Instead, as DML is performed on these tables, Snowflake monitors and evaluates the tables to determine whether they would benefit from reclustering, and automatically reclusters them, as needed.

  • Designating warehouses in your account to use for reclustering.

    Snowflake performs automatic reclustering in the background, and you do not need to specify a warehouse to use.

All you need to do is define clustering keys for your tables (if appropriate) and Snowflake manages all future maintenance.

Full Control

You can suspend and resume Automatic Clustering for a clustered table at any time using ALTER TABLE … SUSPEND / RESUME RECLUSTER. While Automatic Clustering is suspended for a table, the table is never automatically reclustered, regardless of its clustering state and, therefore, does not incur any related credit charges.

You can also drop the clustering keys on a clustered table at any time, which prevents all future reclustering on the table.

Non-blocking DML

Automatic Clustering is transparent and does not block DML statements issued against tables while they are being reclustered.

Optimal Efficiency

With Automatic Clustering, Snowflake internally manages the state of clustered tables, as well as the resources (servers, memory, etc.) used for all automated clustering operations. This allows Snowflake to dynamically allocate resources as needed, resulting in the most efficient and effective reclustering.

Also, Automatic Clustering does not perform any unnecessary reclustering. Reclustering is triggered only if/when the table would benefit from the operation.

Credit Usage and Warehouses for Automatic Clustering

Automatic Clustering consumes Snowflake credits, but does not require you to provide a virtual warehouse. Instead, Snowflake internally manages and achieves efficient resource utilization for reclustering the tables.

Your account is billed only for the actual credits consumed by automatic clustering operations on your clustered tables.

Important

After enabling or resuming Automatic Clustering on a clustered table, if it has been a while since the table was reclustered, you may experience reclustering activity (and corresponding credit charges) as Snowflake brings the table to an optimally-clustered state. Once the table is optimally-clustered, the reclustering activity will drop off.

Likewise, defining clustering keys on an existing table or changing the clustering keys on a clustered table may trigger reclustering and credit charges.

To prevent any unexpected credit charges, we recommend starting with one or two selected tables and observing the credit charges associated with keeping the tables well-clustered as DML is performed. This will help you establish a baseline for the number of credits consumed by reclustering activity.

Enabling Automatic Clustering for a Table

No tasks are required to enable Automatic Clustering for a table. You simply define one or more clustering keys.

Tip

Before you define clustering keys for a table, consider the following conditions, which may cause reclustering activity (and corresponding credit charges):

As such, we recommend starting with one or two selected tables and assessing the impact of Automatic Clustering on these tables. Once you are comfortable/familiar with how Automatic Clustering performs reclustering, you can then define clustering keys for your other tables.

Viewing the Automatic Clustering Status for a Table

You can use SQL to view whether Automatic Clustering is enabled for a table:

The AUTO_CLUSTERING_ON column in the output displays the Automatic Clustering status for each table, which be used to determine whether to suspend or resume Automatic Clustering for a given table.

In addition, the CLUSTER_BY column (SHOW TABLES) or CLUSTERING_KEY column (TABLES view) displays the column(s) defined as the clustering key(s) for each table.

Note

Automatic Clustering status is not yet displayed in the TABLES view (in the Account Usage shared database). It will be added in a future release.

Suspending Automatic Clustering for a Table

To suspend Automatic Clustering for a table, use the ALTER TABLE command with a SUSPEND RECLUSTER clause. For example:

ALTER TABLE t1 SUSPEND RECLUSTER;

SHOW TABLES LIKE 't1';

+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+----------------------+
|           created_on            | name | database_name | schema_name | kind  | comment | cluster_by | rows | bytes |  owner   | retention_time | automatic_clustering |
+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+----------------------+
| Thu, 12 Apr 2018 13:29:01 -0700 | T1   | TESTDB        | MY_SCHEMA   | TABLE |         | LINEAR(C1) | 0    | 0     | SYSADMIN | 1              | OFF                  |
+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+----------------------+

Resuming Automatic Clustering for a Table

To resume Automatic Clustering for a clustered table, use the ALTER TABLE command with a RESUME RECLUSTER clause. For example:

ALTER TABLE t1 RESUME RECLUSTER;

SHOW TABLES LIKE 't1';

+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+----------------------+
|           created_on            | name | database_name | schema_name | kind  | comment | cluster_by | rows | bytes |  owner   | retention_time | automatic_clustering |
+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+----------------------+
| Thu, 12 Apr 2018 13:29:01 -0700 | T1   | TESTDB        | MY_SCHEMA   | TABLE |         | LINEAR(C1) | 0    | 0     | SYSADMIN | 1              | ON                   |
+---------------------------------+------+---------------+-------------+-------+---------+------------+------+-------+----------+----------------+----------------------+

Tip

Before you resume Automatic Clustering on a clustered table, consider the following conditions, which may cause reclustering activity (and corresponding credit charges):

  • The table is not optimally-clustered (e.g. significant DML has been performed on the table since it was last reclustered).
  • The clustering keys on the table have changed.

For more details, see Micro-partitions & Data Clustering and Clustering Keys & Clustered Tables.

Viewing Automatic Clustering Billing

Users with the ACCOUNTADMIN role can view the billing for Automatic Clustering using either the web interface or SQL:

Web Interface:

Click on Account Account tab » Billing & Usage

The billing for Automatic Clustering shows up as a separate warehouse named Snowflake logo in blue (no text) AUTOMATIC_CLUSTERING.

SQL:

Query the AUTOMATIC_CLUSTERING_HISTORY table function (in the Information Schema).