Estimating Approximate Percentile Values Using t-Digest

t-Digest is a space- and time-efficient way of estimating percentile values in data.

Snowflake provides an implementation of the t-Digest algorithm by Dunning and Ertl. It is exposed through the APPROX_PERCENTILE family of functions.

As documented in the t-Digest papers, the algorithm has a constant relative error. Note that this claim rests on substantial empirical support; there is no rigorous proof of any accuracy guarantee.
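
As a minimal sketch of the most direct usage, the query below estimates a single percentile in one pass. The table response_times and its column latency_ms are hypothetical names used only for illustration; the percentile argument is a value between 0.0 and 1.0.

```sql
-- Hypothetical table: response_times(latency_ms NUMBER)
-- Estimate the 95th percentile of latency_ms using t-Digest.
SELECT APPROX_PERCENTILE(latency_ms, 0.95) AS approx_p95
FROM response_times;
```

Because the result is an approximation, it can differ slightly from the exact value that a full sort of the column would produce.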

SQL Functions

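The following aggregate functions are provided for using t-Digest to approximate percentile values:

  • APPROX_PERCENTILE: Returns an approximation of the requested percentile value.
  • APPROX_PERCENTILE_ACCUMULATE: Skips the final estimation step and returns the t-Digest state at the end of an aggregation.
  • APPROX_PERCENTILE_COMBINE: Combines (merges) input states into a single output state.
  • APPROX_PERCENTILE_ESTIMATE: Returns the requested percentile value estimated from a t-Digest state.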

Implementation Notes

  • The estimation uses a constant amount of space regardless of the size of the input.
  • The t-Digest state is independent of the percentile value. This enables calculating the t-Digest state once, and then querying the state for multiple percentile values.
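
The notes above can be illustrated with a short sketch: build the state once, then query it for several percentiles. The tables response_times, daily_state, and the columns event_date and latency_ms are hypothetical names chosen for this example, and the per-day grouping is only one possible way to partition the data.

```sql
-- Hypothetical table: response_times(event_date DATE, latency_ms NUMBER)

-- Step 1: compute one t-Digest state per day and store it.
CREATE OR REPLACE TEMPORARY TABLE daily_state AS
SELECT event_date,
       APPROX_PERCENTILE_ACCUMULATE(latency_ms) AS state
FROM response_times
GROUP BY event_date;

-- Step 2: merge the stored per-day states into one state, then
-- query that single state for several percentiles without
-- rescanning response_times.
WITH combined AS (
    SELECT APPROX_PERCENTILE_COMBINE(state) AS state
    FROM daily_state
)
SELECT APPROX_PERCENTILE_ESTIMATE(state, 0.50) AS approx_p50,
       APPROX_PERCENTILE_ESTIMATE(state, 0.95) AS approx_p95,
       APPROX_PERCENTILE_ESTIMATE(state, 0.99) AS approx_p99
FROM combined;
```

Storing the per-day states also means that a new day of data only requires accumulating one more state, rather than reprocessing the full history.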