We are given an array of positive integers representing the weights of their respective indices. We need to implement an algorithm that selects indices randomly with respect to their weights.
Let:
Here, we need to implement the pickIndex()
function that returns an index with the probability
.
pickIndex()
can be called multiple times on the given distribution of weights.
For , pickIndex()
must have:
= = ≈ 16.6% chance of picking index 0
= ≈ 8.3% chance of picking index 1
= ≈ 33.3% chance of picking index 2
≈ 41.6% of picking index 3
First, we preprocess the array of weights and calculate the prefix sums. Let be an array of prefix sums, where .
For random selection with particular weights, the following technique can then be used:
We can use a binary search algorithm for the second step as the array of prefix sums is sorted (weights are positive integers).
Hence, we detect the th interval of prefix sums in which the randomly chosen value lies. The length of the th interval is . Therefore, the probability that index is to be selected is .
#include <iostream>#include <vector>#include <algorithm>#include <random>using namespace std;class WeightedRandomGenerator {public:WeightedRandomGenerator(const vector<int>& w):randomGen_{random_device{}()}{if(w.empty())return;//calculate prefix sums for given weightsprefixSums_.resize(w.size());prefixSums_[0] = w[0];for(size_t i = 1; i < w.size(); ++i){prefixSums_[i] = prefixSums_[i-1] + w[i];}}int pickIndex(){if(prefixSums_.empty())return -1;//get sum of all weightsint totalSum = prefixSums_.back();//get random value between 0 and totalSum - 1int value = getRandom(0, totalSum - 1);//get index of the first element in prefix sums greater than valueauto it = upper_bound(prefixSums_.begin(), prefixSums_.end(), value);return distance(prefixSums_.begin(), it);}private:int getRandom(int left, int right){//C++11's random number generation facilitiesstd::uniform_int_distribution<> distrib(left, right);return distrib(randomGen_);}private:vector<int> prefixSums_;std::mt19937 randomGen_;};int main(){//create vector with specific weights distributionvector<int> w = {2, 1, 4, 5};//create generator objectWeightedRandomGenerator generator(w);//test generator: calculate frequencies of choosing each index during 100 callsvector<int> counts(w.size());for(int i = 0; i < 100; ++i){++counts[generator.pickIndex()];}//output testing resultsfor(size_t j = 0; j < counts.size(); ++j){cout << "Index " << j << " is chosen " << counts[j] << " times." << endl;}return 0;}
Let us take as the number of weights. The preprocessing step
(construction of the WeightedRandomGenerator
object) has time complexity. This is the time complexity as we iterate over the weights and calculate prefix sums in linear time.
pickIndex()
searches a random number in a sorted array of prefix sums using a binary search. Therefore, the
time complexity of each pickIndex()
call is logarithmic – .
The space complexity is for storing prefix sums array.