Data Generation Functions

Data Generation functions allow you to generate data. In many cases, they generate random data, which can be useful for testing. The SEQ1 / SEQ2 / SEQ4 / SEQ8 functions generate a sequence of numbers, which can be useful for providing unique ID numbers for records that do not already have unique identifiers.

In this Topic:

List of Functions

There are two sub-categories of data generation functions:

  • Random
  • Controlled Distribution

Random data generation functions produce a random value each time. Each value is independent of the other values generated by other calls to the function. (The underlying algorithm produces pseudo-random values, and thus the values are not truly random or independent, but without knowing the algorithm, the values are essentially unpredictable, usually evenly distributed if the sample size is large, and pseudo-independent of each other.)

Controlled-distribution data generation functions produce values that are not independent. For example, the NORMAL function returns values that have an approximately “normal” (bell-shaped) distribution based on a specified mean and standard deviation. Thus each new value generated is at least indirectly influenced by previously generated values as the function tries to maintain the specified distribution. As another example, the SEQ family of functions return a sequence of values.

Note

The UNIFORM function is listed as a controlled-distribution function, but is intended to generate evenly-distributed values. In other words, it acts as though it’s a “Random” function, but we refer to it as a controlled-distribution function because the distribution is explicitly specified and because you can choose a data-generation function that produces non-uniform values over a large sample size.

Sub-category Function Notes
Random RANDOM Returns a pseudo-random 64-bit integer.
RANDSTR Returns a random string of specified length.
UUID_STRING Returns a random RFC 4122-compliant UUID as a formatted string.
Controlled Distribution NORMAL Returns a normal-distributed floating point number, with specified mean and standard deviation.
UNIFORM Returns a uniformly random number within the specified range.
ZIPF Returns a Zipf-distributed integer.
SEQ1 / SEQ2 / SEQ4 / SEQ8 Returns a sequence of monotonically increasing integers.

Usage Notes

  • Random distribution functions are deterministic.

  • Each random distribution function takes a generator expression, gen, as its last argument. The generator expression, gen, can be constant or variable:

    • If constant, then the result of the random distribution function is constant (unless there are other, variable arguments, which is currently only supported for the RANDSTR function).
    • If variable, then the result of the random distribution function is variable.
  • Generator expressions must be a type 64-bit integer, although implicit conversions are allowed. Any expression that can be converted into a 64-bit integer can be used as a generator expression.

  • The randomness of any random distribution function is directly linked to the randomness of its generator expression. For most practical purposes, the RANDOM data generation function is the best choice for randomly-generated integer values.

  • Sequences generated by data generation functions are not guaranteed to be ordered and gap-free. This is because the numbers may be generated in parallel, in an unsynchronized fashion.

    For more details about sequences in Snowflake, see Using Sequences.