Sampling

Learn to sample a subset of a table using SQL.

We'll cover the following...

Sampling with LIMIT
Using TABLESAMPLE
- The SYSTEM sampling method
- The BERNOULLI sampling method
Repeatable sampling
Performance comparison

Extracting a small subset of a table is often called sampling. There are various reasons to use sampling, for example:

- Performing estimations on large datasets: When working on large tables, we are sometimes willing to trade accuracy for speed. By sampling a portion of the table, we can produce approximate results more quickly.
- Producing a training set: When doing data analysis using machine learning models, it is often necessary to train the model on a portion of the data. This portion is known as a training set. The training set can be produced by sampling the table.

Sampling with `LIMIT`

A simple way to fetch a random portion of a table is to combine `random()` with `LIMIT`:
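As a minimal sketch of this idea, the snippet below orders rows by `random()` and keeps the first few with `LIMIT`. It runs the query through Python's built-in `sqlite3` module so that it is self-contained. The `users` table, its columns, and the `sample_rows` helper are hypothetical names chosen for illustration, and the in-memory SQLite database is an assumption. The same `ORDER BY random() LIMIT n` query shape is also valid in PostgreSQL.

```python
import sqlite3

# Hypothetical in-memory table, created only so the example can run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user_{i}") for i in range(1, 1001)],
)


def sample_rows(connection, n):
    """Return n rows picked at random: shuffle with random(), keep the first n."""
    query = "SELECT id, name FROM users ORDER BY random() LIMIT ?"
    return connection.execute(query, (n,)).fetchall()


# Fetch a random sample of 10 rows out of 1,000.
sample = sample_rows(conn, 10)
print(sample)
```

Because every row has to be assigned a random value and sorted before `LIMIT` is applied, this approach reads the whole table, which can be slow on large tables.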
