Categories:

System Functions (System Information)

SYSTEM$CLUSTERING_INFORMATION

Returns clustering information, including average clustering depth, for a table based on one or more columns in the table.

See also:

SYSTEM$CLUSTERING_DEPTH

Syntax

SYSTEM$CLUSTERING_INFORMATION( '<table_name>' , '( <col1> [ , <col2> ... ] )' )

Arguments

table_name

Table for which you want to return clustering information.

col1 [ , col2 ... ]

Column(s) in the table for which clustering information is returned:

  • For a table with no clustering key, this argument is required. If this argument is omitted, an error is returned.

  • For a table with a clustering key, this argument is optional; if the argument is omitted, Snowflake uses the defined clustering key to return clustering information.

Note

You can use this argument to return clustering information for any columns in the table, regardless of whether a clustering key is defined for the table.

In other words, you can use this to help you decide what clustering to use in the future.

Usage Notes

  • All arguments are strings (i.e. they must be enclosed in single quotes).

Output

The function returns a JSON object containing the following name/value pairs:

cluster_by_keys

Columns in table used to return clustering information; can be any columns in the table.

notes

This column can contain suggestions to make clustering more efficient. For example, this field might contain a warning if the cardinality of the clustering column is extremely high.

This column can be empty.

total_partition_count

Total number of micro-partitions that comprise the table.

total_constant_partition_count

Total number of micro-partitions for which the value of the specified columns have reached a constant state (i.e. the micro-partitions will not benefit significantly from reclustering). The number of constant micro-partitions in a table has an impact on pruning for queries. The higher the number, the more micro-partitions can be pruned from queries executed on the table, which has a corresponding impact on performance.

average_overlaps

Average number of overlapping micro-partitions for each micro-partition in the table. A high number indicates the table is not well-clustered.

average_depth

Average overlap depth of each micro-partition in the table. A high number indicates the table is not well-clustered.

This value is also returned by SYSTEM$CLUSTERING_DEPTH.

partition_depth_histogram

A histogram depicting the distribution of overlap depth for each micro-partition in the table. The histogram contains buckets with widths:

  • 0 to 16 with increments of 1.

  • For buckets larger than 16, increments of twice the width of the previous bucket (e.g. 32, 64, 128, …).

For more information about micro-partition overlap and depth and their impact on query pruning, see Understanding Snowflake Table Structures.

Examples

Return the clustering information for a table using two columns in the table:

SELECT SYSTEM$CLUSTERING_INFORMATION('test2', '(col1, col3)');

+--------------------------------------------------------------+
| SYSTEM$CLUSTERING_INFORMATION('TEST2', '(COL1, COL3)')       |
|--------------------------------------------------------------|
| {                                                            |
|   "cluster_by_keys" : "(COL1, COL3)",                        |
|   "total_partition_count" : 1156,                            |
|   "total_constant_partition_count" : 0,                      |
|   "average_overlaps" : 117.5484,                             |
|   "average_depth" : 64.0701,                                 |
|   "partition_depth_histogram" : {                            |
|     "00000" : 0,                                             |
|     "00001" : 0,                                             |
|     "00002" : 3,                                             |
|     "00003" : 3,                                             |
|     "00004" : 4,                                             |
|     "00005" : 6,                                             |
|     "00006" : 3,                                             |
|     "00007" : 5,                                             |
|     "00008" : 10,                                            |
|     "00009" : 5,                                             |
|     "00010" : 7,                                             |
|     "00011" : 6,                                             |
|     "00012" : 8,                                             |
|     "00013" : 8,                                             |
|     "00014" : 9,                                             |
|     "00015" : 8,                                             |
|     "00016" : 6,                                             |
|     "00032" : 98,                                            |
|     "00064" : 269,                                           |
|     "00128" : 698                                            |
|   }                                                          |
| }                                                            |
+--------------------------------------------------------------+

This example indicates that the test2 table is not well-clustered for the following reasons:

  • Zero (0) constant micro-partitions out of 1156 total micro-partitions.

  • High average of overlapping micro-partitions.

  • High average of overlap depth across micro-partitions.

  • Most of the micro-partitions are grouped at the lower-end of the histogram, with the majority of micro-partitions having an overlap depth between 64 and 128.