What is reservoir sampling?

Reservoir sampling is a randomized algorithm that is used to select kk out of nn samples; nn is usually very large or unknown. For example, reservoir sampling can be used to obtain a sample of size kk from a population of people with brown hair. This algorithm takes O(n)O(n) to select kk elements with uniform probability.

Reservoir sampling is used to randomly take k out n samples. Here k = 4.
Reservoir sampling is used to randomly take k out n samples. Here k = 4.

Algorithm

  1. Copy the first kk elements from the input array to the output array.

  2. Iterate from kk to n1n-1 (both inclusive). In each iteration jj:

    2.1 Generate a random number numnum from 0 to jj.

    2.2 If numnum is less than kk, replace the element at index numnum in the output array with the item at index jj in the input array.

Code

Copyright ©2024 Educative, Inc. All rights reserved