Linear-Time Selection

Understand the efficiency and effectiveness of the linear-time selection algorithm.

We'll cover the following...

During our discussion of quick-sort, we claimed in passing that we can find the median of an unsorted array in linear time. The first such algorithm was discovered by Manuel Blum, Bob Floyd, Vaughan Pratt, Ron Rivest, and Bob Tarjan in the early 1970s. Their algorithm actually solves the more general problem of selecting the kk-th smallest element in an nn-element array, given the array and the integer kk as input, using a variant of an algorithm called quick-select (or one-armed quick-sort). Quick-select was first described by Tony Hoare in 1961, literally on the same page where he first published quick-sort.

Quick-select

The generic quick-select algorithm chooses a pivot element, partitions the array using the same PartitionPartition subroutine as QuickSortQuickSort, and then recursively searches only one of the two subarrays, specifically the one that contains the kk-th smallest element of the original input array. The pseudocode for quick-select is shown below.

Algorithm


QuickSelect(A[1..n],k):if n=1return A[i]elseChoose a pivot element A[p]rPartition(A[1..n],p)if k<rreturn QuickSelect(A[1..r1],k)else if k>rreturn QuickSelect(A[r+1..n],kr)elsereturn A[r]\underline{QuickSelect(A[1 .. n], k):} \\ \hspace{0.4cm} if\space n=1 \\ \hspace{1cm} return\space A[i] \\ \hspace{0.4cm} else \\ \hspace{1cm} Choose\space a\space pivot\space element\space A[p] \\ \hspace{1cm} r ← Partition(A[1 .. n], p) \\ \hspace{1cm} if\space k < r \\ \hspace{1.7cm} return\space QuickSelect(A[1 .. r − 1], k) \\ \hspace{1cm} else\space if\space k > r \\ \hspace{1.7cm} return\space QuickSelect(A[r + 1 .. n], k − r) \\ \hspace{1cm} else \\ \hspace{1.7cm} return\space A[r]


Implementation

Press + to interact
#include <iostream>
#include <vector>
using namespace std;
int Partition(vector<int> &A, int p)
{
int pivot = A[p];
swap(A[p], A[A.size() - 1]);
int i = 0;
for (int j = 0; j < A.size() - 1; j++)
{
if (A[j] < pivot)
{
swap(A[i], A[j]);
i++;
}
}
swap(A[i], A[A.size() - 1]);
return i;
}
int QuickSelect(vector<int> &A, int k)
{
if (A.size() == 1)
{
return A[0];
}
int p = A.size() / 2;
int r = Partition(A, p);
if (k < r)
{
vector<int> leftSubarray(A.begin(), A.begin() + r);
return QuickSelect(leftSubarray, k);
}
else if (k > r)
{
vector<int> rightSubarray(A.begin() + r + 1, A.end());
return QuickSelect(rightSubarray, k - r - 1);
}
else
{
return A[r];
}
}
int main()
{
vector<int> A = { 10, 5, 2, 0, 7, 6, 4 };
int k = 3;
int result = QuickSelect(A, k - 1); // k-1 because array indexing starts from 0
cout << "The " << k << "th smallest element is: " << result << endl;
return 0;
}

Explanation

  • Lines 5–21: The Partition function takes a vector A and an index p as input. It selects a pivot element A[p], moves it to the end of the array, and initializes two pointers i and j. It then scans the array from left to right with j, swapping any element less than the pivot with A[i] and incrementing i. Finally, it swaps the pivot element back to its sorted position, A[i]. The function then returns the index i of the pivot element after it has been sorted.

  • Lines 30–31: The code selects a pivot index p in the middle of the array and partitions the array using the Partition function to obtain the pivot element’s sorted index r.

  • Lines 32–36: If k is less than r, it recurses on the left subarray from the beginning of A to r.

  • Lines 37–41: If k is greater than r, it recurses on the right subarray from r + 1 to the end of A. The value of k is adjusted by subtracting r + 1 to account for the pivot element’s index in the left subarray.

  • Lines 42–45: If k is equal to r, it returns the pivot element at index r.

This algorithm has two important features. First, just like quick-sort, the correctness of quick-select doesn’t depend on how the pivot is chosen. Second, even if we really only care about selecting medians (the special case k=n/2k = n/2 ...