Top K Elements: Introduction

Let’s go over the Top K Elements pattern, its real-world applications, and some problems we can solve with it.

About the pattern

The top k elements pattern is an important coding technique for efficiently selecting a fixed number of elements, $k$, from a set of data. It is particularly useful when we’re tasked with identifying the $k$ largest, smallest, or most/least frequent elements within an unsorted collection.

To solve tasks like these, one might think to sort the entire collection first, which takes $O(n \log n)$ time, and then select the top $k$ elements, taking an additional $O(k)$ time. However, the top k elements pattern bypasses the need for full sorting: by keeping track of only $k$ candidate elements at any time, it reduces the time complexity to $O(n \log k)$.
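
For comparison, the sorting baseline described above might look like the following minimal sketch. The function name `top_k_largest_by_sorting` and the sample input are hypothetical, chosen only for illustration.

```python
def top_k_largest_by_sorting(nums, k):
    """Baseline: sort everything in descending order, then take the first k."""
    ordered = sorted(nums, reverse=True)  # full sort: O(n log n)
    return ordered[:k]                    # slice of the first k: O(k)


print(top_k_largest_by_sorting([5, 1, 9, 3, 7, 2], 3))  # [9, 7, 5]
```

The full sort orders all $n$ elements even though only $k$ of them are needed, which is the work the heap-based approach avoids.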

Which data structure can we use to solve such problems? A heap is well suited to keeping track of the smallest or largest $k$ elements. With this pattern, we use a max heap to find the $k$ smallest elements and a min heap to find the $k$ largest. The choice may look inverted, but the top of the heap is the element we may need to evict next: a min heap gives us quick access to the smallest of the large elements kept so far, and a max heap gives us quick access to the largest of the small elements kept so far.
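
As a small illustration of both heap types, here is a sketch using Python's standard `heapq` module. `heapq` only provides a min heap, so the max heap below is simulated by storing negated values; that is one common workaround, not the only option. The variable names and sample values are assumptions made for this example.

```python
import heapq

values = [5, 1, 9, 3]

# heapq implements a min heap: the smallest element is always at index 0.
min_heap = list(values)
heapq.heapify(min_heap)
smallest = min_heap[0]      # peek at the top without removing it

# Simulated max heap: store negated values so the largest original
# value becomes the smallest stored value.
max_heap = [-v for v in values]
heapq.heapify(max_heap)
largest = -max_heap[0]      # negate again to recover the original value

print(smallest, largest)    # 1 9
```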

For example, let’s look at how this pattern operates to solve the problem of finding the $k$ largest elements (using a min heap) or the $k$ smallest elements (using a max heap):

  1. Insert the first $k$ elements from the given set of elements into a heap. If we’re looking for the largest elements, use a min heap to keep the smallest of the large elements at the top. Conversely, for the smallest elements, use a max heap to keep the largest of the small elements at the top.

  2. Iterate through the remaining elements of the given set.

    1. For a min heap, if we find an element larger than the top, remove the top element (the smallest of the large elements) and insert the new, larger element. This ensures the heap always contains the $k$ largest elements seen so far.
    2. For a max heap, if we find an element smaller than the top, remove the top element (the largest of the small elements) and insert the new, smaller element. This ensures the heap always contains the $k$ smallest elements seen so far.
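
The steps above can be sketched in Python roughly as follows. This is an illustrative sketch rather than a definitive implementation: the function names `k_largest` and `k_smallest` are hypothetical, the max heap is simulated with negated values because `heapq` only offers a min heap, and `heapq.heapreplace` is used to combine "remove the top and insert the new element" into a single call.

```python
import heapq


def k_largest(nums, k):
    """Return the k largest values of nums, in no particular order."""
    if k <= 0:
        return []
    heap = list(nums[:k])          # step 1: start with the first k elements
    heapq.heapify(heap)            # min heap: smallest kept value on top
    for x in nums[k:]:             # step 2: scan the remaining elements
        if x > heap[0]:            # larger than the smallest kept value?
            heapq.heapreplace(heap, x)   # pop the top, then push x
    return heap


def k_smallest(nums, k):
    """Return the k smallest values of nums, in no particular order."""
    if k <= 0:
        return []
    heap = [-x for x in nums[:k]]  # negated values simulate a max heap
    heapq.heapify(heap)
    for x in nums[k:]:
        if x < -heap[0]:           # smaller than the largest kept value?
            heapq.heapreplace(heap, -x)
    return [-x for x in heap]


data = [5, 1, 9, 3, 7, 2]
print(sorted(k_largest(data, 3)))   # [5, 7, 9]
print(sorted(k_smallest(data, 3)))  # [1, 2, 3]
```

The results are sorted before printing only to make the output predictable; the heaps themselves do not store their elements in sorted order.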

The efficiency of this pattern comes from the ability of the heap to insert and remove elements in $O(\log k)$ time. Because we only maintain $k$ elements in the heap, these operations are quick, and we can process all $n$ elements in the given set in $O(n \log k)$ time.

It’s important to note that while accessing the top element of the heap can be done in $O(1)$ time, retrieving all $k$ elements in sorted order, if necessary, involves removing them one by one. This process takes $O(k \log k)$ time because each removal requires reorganizing the heap.
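
To make the difference between peeking and draining concrete, here is a short sketch using `heapq`. The helper name `drain_ascending` and the sample values are hypothetical; the helper works on a copy so that the caller's heap is left untouched.

```python
import heapq


def drain_ascending(heap_items):
    """Pop every element from a min heap, yielding them smallest first."""
    heap = list(heap_items)        # copy so the original is not modified
    heapq.heapify(heap)
    result = []
    while heap:
        # Each pop costs O(log k) because the heap is reorganized afterwards.
        result.append(heapq.heappop(heap))
    return result


top_three = [5, 9, 7]
heapq.heapify(top_three)

print(top_three[0])                # O(1) peek at the top: 5
print(drain_ascending(top_three))  # k pops in total: [5, 7, 9]
```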

Let’s look at the following illustration to understand how to use a min heap to find the three largest elements.