Solution: K Closest Points to Origin

Let's solve the K Closest Points to Origin problem using the Top K Elements pattern.

Statement

Given a list of points on a plane, where the plane is a 2-D array with (x, y) coordinates, find the kk closest points to the origin (0,0)(0, 0).

Note: Here, the distance between two points on a plane is the Euclidean distance: x2+y2 \sqrt{x^2 + y^2}

Constraints:

  • 11 \leq k \leq points.length 103\leq 10^3
  • 104<-10^4 < x[i], y[i]<104< 10^4

Solution

So far, you’ve probably brainstormed some approaches and have an idea of how to solve this problem. Let’s explore some of these approaches and figure out which one to follow based on considerations such as time complexity and any implementation constraints.

Naive approach

When thinking about how to solve this problem, it may help to solve a simpler problem—find the point closest to the origin. This would involve a linear scan through the unsorted list of points, with, at each step, a comparison between the closest point discovered so far and the current point from the list. The point closer to the origin would then continue as the candidate solution. This has a runtime complexity of O(n)O(n) as opposed to the naive solution of sorting the points by distance from the origin and picking the first one, which has a complexity of O(nlogn)O(n \log n).

For this reason, when extending the solution to the kk closest points to the origin, we’d ideally like to do one scan through the list of points. However, if we have to check the current set of kk closest points with the current point under consideration at each step, we’ll end up with a time complexity of O(n.k)O(n . k).

Optimized approach using top K elements:

The intuition behind this approach centers around efficiently managing comparisons and updates using a max-heap data structure. By maintaining a max-heap of the closest points sorted by their Euclidean distance from the origin, the algorithm ensures O(1)O(1) access to the point farthest from the origin among the top k closest points. This allows for a direct comparison between the farthest point in the heap and each new candidate point. If a candidate point is closer, it replaces the farthest point in the heap, keeping the heap filled with the closest points.

By organizing our set of kk closest points in a max-heap, we reduce the number of comparisons at each step. Instead of comparing all kk points with the next point from the list, we simply compare the farthest point in the max-heap with the next point. If the new point is closer to the origin, it replaces the farthest point in the heap. By the end of the process, the max-heap contains exactly the kk closest points to the origin.

In this way, at every step of the scan through the list, the max-heap acts like a sieve, picking out the top kk points in terms of their distance from the origin.

And as we’ll see, the time complexity is much better than O(n.k)O(n . k).

The Euclidean distance between a point P(x, y) and the origin can be calculated using the following formula:

x2+y2\sqrt{x^2 + y^2}

​​ Now that we can calculate the distance between (0,0)(0, 0) and all the points, how will we find the kk nearest points? As discussed above, the heap data structure is ideal for this purpose—we’ll use a custom comparison function to define the order of the elements in a heap. Since we plan to populate the heap with coordinate pairs, we’ll define a class Point and implement a custom less-than function in it (__lt__(self, other) in Python) for use by the heapify process. In this function, we compare the distances of the two points relative to the origin. The point closer to the origin will be considered less than the other point. We’ll iterate through the given list of points, and if we find one that is closer to the origin than the point at the root of the max-heap, we do the following two things:

  • Pop from the max-heap—that is, remove the point in the heap farthest from the origin.
  • Push the point that is closer to the origin onto the max-heap.

As we move through the given list of points, this will ensure that we always have the kk points in our heap that are the closest to the origin.

Below is an illustration of this process.