- Categories:
- Query Syntax

# SAMPLE / TABLESAMPLE¶

Returns a subset of rows sampled randomly from the specified table. The following sampling methods are supported:

- Sample a fraction of a table, with a specified probability for including a given row. The number of rows returned depends on the size of the table and the requested probability. A seed can be specified to make the sampling deterministic.
- Sample a fixed, specified number of rows. The exact number of specified rows is returned unless the table contains fewer rows.

SAMPLE and TABLESAMPLE are synonymous and can be used interchangeably.

## Syntax¶

```
SELECT ...
FROM ...
{ SAMPLE | TABLESAMPLE } [ samplingMethod ] ( { <probability> | <num> ROWS } ) [ { REPEATABLE | SEED } ( <seed> ) ]
[ ... ]
```

Where:

samplingMethod ::= { { BERNOULLI | ROW } | { SYSTEM | BLOCK } }

`BERNOULLI | ROW`

or .`SYSTEM | BLOCK`

Specifies the sampling method to use:

`BERNOULLI`

(or`ROW`

): Includes each row with a

of*probability*`p/100`

. Similar to flipping a weighted coin for each row.`SYSTEM`

(or`BLOCK`

): Includes each block of rows with a

of*probability*`p/100`

. Similar to flipping a weighted coin for each block of rows. This method does not support fixed-size sampling.

Sampling method is optional. If no method is specified, the default is

`BERNOULLI`

.

or .*probability**num*ROWSSpecifies whether to sample based on a fraction of the table or a fixed number of rows in the table, where:

specifies the percentage probability to use for selecting the sample. Can be any decimal number between*probability*`0`

(no rows selected) and`100`

(all rows selected) inclusive.

specifies the number of rows (up to 1,000,000) to sample from the table. Can be any integer between*num*`0`

(no rows selected) and`1000000`

inclusive.

`REPEATABLE | SEED (`

*seed*)- Specifies a seed value to make the sampling deterministic. Can be any integer between
`0`

and`2147483647`

inclusive.

## Usage Notes¶

The following keywords can be used interchangeably:

`SAMPLE | TABLESAMPLE`

`BERNOULLI | ROW`

`SYSTEM | BLOCK`

`REPEATABLE | SEED`

The number of rows returned depends on the sampling method specified:

Fraction-based: - For
`BERNOULLI | ROW`

sampling, the expected number of returned rows is`(p/100)*n`

. - For
`SYSTEM | BLOCK`

sampling, the sample might be biased, in particular for small tables.

Note

For very large tables, the difference between the two methods should be negligible.

Also, because sampling is a probabilistic process, the number of rows returned is not exactly equal to

`(p/100)*n`

rows, but will be close.- If no

is specified, SAMPLE generates different results when the same query is repeated.*seed* - If a table does not change, and the same

and*seed*

are specified, SAMPLE generates the same result. However, sampling on a copy of a table may not return the same result as sampling on the original table, even if the same*probability*

and*probability*

are specified.*seed*

Fixed-size: If the table is larger than the requested number of rows, the number of requested rows is always returned.

If the table is smaller than the requested number of rows, the entire table is returned.

`SYSTEM | BLOCK`

and

are not supported for fixed-size sampling. For example, the following queries produce errors:*seed*select * from example_table sample system (10 rows); select * from example_table sample row (10 rows) seed (99);

- For
Sampling with a

is not supported on views or subqueries. For example, the following query produces an error:*seed*select * from (select * from example_table) sample (1) seed (99);

In addition to using literals to specify

and*probability*|*num*ROWS

, session or bind variables can also be used.*seed*

## Performance Considerations¶

`SYSTEM | BLOCK`

sampling is often faster than`BERNOULLI | ROW`

sampling.- Sampling without a

is often faster than sampling with a*seed*

.*seed* - Fixed-size sampling can be slower than equivalent fraction-based sampling because fixed-size sampling prevents some query optimization.

## Examples¶

### Fraction-based Row Sampling¶

Return a sample of a table in which each row has a 10% probability of being included in the sample:

SELECT * FROM testtable SAMPLE (10);

Return a sample of a table in which each row has a 20.3% probability of being included in the sample:

SELECT * FROM testtable TABLESAMPLE BERNOULLI (20.3);

Return an entire table, including all rows in the table:

SELECT * FROM testtable TABLESAMPLE (100);

Return an empty sample:

SELECT * FROM testtable SAMPLE ROW (0);

### Fraction-based Block Sampling (with Seeds)¶

Return a sample of a table in which each block of rows has a 3% probability of being included in the sample, and set the seed to 82:

SELECT * FROM testtable SAMPLE SYSTEM (3) SEED (82);

Return a sample of a table in which each block of rows has a 0.012% probability of being included in the sample, and set the seed to 99992:

SELECT * FROM testtable SAMPLE BLOCK (0.012) REPEATABLE (99992);

Note

If either of these queries are run again without making any changes to the table, they return the same sample set.

### Fixed-size Row Sampling¶

Return a fixed-size sample of 10 rows in which each row has a `max(1, 10/n)`

probability of being included in the sample, where `n`

is the number of rows in the table:

SELECT * FROM testtable SAMPLE (10 ROWS);