Data Generation Functions¶
Data Generation functions allow you to generate data. In many cases, they generate random data, which can be useful for testing. The SEQ1 / SEQ2 / SEQ4 / SEQ8 functions generate a sequence of numbers, which can be useful for providing unique ID numbers for records that do not already have unique identifiers.
In this Topic:
List of Functions¶
There are two sub-categories of data generation functions:
- Controlled Distribution
Random data generation functions produce a random value each time. Each value is independent of the other values generated by other calls to the function. (The underlying algorithm produces pseudo-random values, and thus the values are not truly random or independent, but without knowing the algorithm, the values are essentially unpredictable, usually evenly distributed if the sample size is large, and pseudo-independent of each other.)
Controlled-distribution data generation functions produce values that are not
independent. For example, the
NORMAL function returns values that have an
approximately “normal” (bell-shaped) distribution based on a specified mean
and standard deviation. Thus each new value generated is at least indirectly
influenced by previously generated values as the function tries to maintain
the specified distribution. As another example, the
SEQ family of functions
return a sequence of values.
UNIFORM function is listed as a controlled-distribution function,
but is intended to generate evenly-distributed values. In other words,
it acts as though it’s a “Random” function, but we refer to it as a
controlled-distribution function because the distribution is explicitly
specified and because you can choose a data-generation function that
produces non-uniform values over a large sample size.
|Random||RANDOM||Returns a pseudo-random 64-bit integer.|
|RANDSTR||Returns a random string of specified length.|
|UUID_STRING||Returns a random RFC 4122-compliant UUID as a formatted string.|
|Controlled Distribution||NORMAL||Returns a normal-distributed floating point number, with specified mean and standard deviation.|
|UNIFORM||Returns a uniformly random number within the specified range.|
|ZIPF||Returns a Zipf-distributed integer.|
|SEQ1 / SEQ2 / SEQ4 / SEQ8||Returns a sequence of monotonically increasing integers.|
Random distribution functions are deterministic.
Each random distribution function takes a generator expression,
gen, as its last argument. The generator expression,
gen, can be constant or variable:
- If constant, then the result of the random distribution function is constant (unless there are other, variable arguments, which is currently only supported for the RANDSTR function).
- If variable, then the result of the random distribution function is variable.
Generator expressions must be a type 64-bit integer, although implicit conversions are allowed. Any expression that can be converted into a 64-bit integer can be used as a generator expression.
The randomness of any random distribution function is directly linked to the randomness of its generator expression. For most practical purposes, the RANDOM data generation function is the best choice for randomly-generated integer values.
Sequences generated by data generation functions are not guaranteed to be ordered and gap-free. This is because the numbers may be generated in parallel, in an unsynchronized fashion.
For more details about sequences in Snowflake, see Using Sequences.