K Closest Points to Origin: Heap and Quickselect Solutions
LeetCode K Closest Points to Origin (Problem 973) asks you to find the k closest points to the origin (0, 0) from a list of points on the 2D plane. The distance is Euclidean, but since you only need relative ordering, you can compare squared distances to avoid computing square roots.
Example: points = [[3,3],[5,-1],[-2,4]], k = 2. The squared distances are 18, 26, and 20. The two closest points are [3,3] and [-2,4].
The problem
You are given an array of points where points[i] = [xi, yi] represents a point on the X-Y plane, and an integer k. Return the k closest points to the origin (0, 0). The answer is guaranteed to be unique (except for the order of the points).
The Euclidean distance from a point (x, y) to the origin is sqrt(x^2 + y^2). But since sqrt is a monotonically increasing function, you can skip it entirely and compare x^2 + y^2 directly. This is both faster and avoids floating-point issues.
The sorting approach
The simplest approach: compute the squared distance for every point, sort the array by distance, and return the first k elements.
def kClosest(points, k):
points.sort(key=lambda p: p[0] ** 2 + p[1] ** 2)
return points[:k]
This works and is clean, but it runs in O(n log n) time. You are sorting the entire array even though you only need k elements. When k is much smaller than n, that is wasted work.
The heap approach
A better idea: use a max-heap of size k. Scan through each point and push it onto the heap. If the heap grows larger than k, pop the element with the largest distance. After processing all points, the heap contains exactly the k closest.
Why a max-heap and not a min-heap? Because you want to quickly evict the farthest point. The max-heap keeps the farthest point at the root, so popping is O(log k).
Python's heapq is a min-heap, so you negate the distances to simulate a max-heap.
You can compare x*x + y*y directly instead of computing sqrt(x*x + y*y). This avoids floating-point precision issues and is faster. Since you only care about relative ordering, squared distances work perfectly.
Step-by-step walkthrough
Let's trace through the example with points = [[3,3],[5,-1],[-2,4]] and k = 2, using the max-heap approach.
Step 1: Compute squared distances for all points
No need for sqrt. Squared distances preserve relative ordering.
Step 2: Push (3,3) onto max-heap. Size 1, room for more (k=2).
Heap has one entry. The max-heap root tracks the farthest point in the heap.
Step 3: Push (5,-1) onto max-heap. Size 2 = k. Heap is full.
Both points fit. The max-heap root is (5,-1) with d²=26, the farthest in the heap.
Step 4: Push (-2,4) d²=20. Size exceeds k. Pop the farthest.
(5,-1) with d²=26 is popped. It was farther than the new point. The heap now holds the 2 closest.
Step 5: All points processed. The heap contains the k=2 closest.
Result: [[3,3], [-2,4]]. The max-heap kept the smallest distances by evicting the largest.
The code
import heapq
def kClosest(points, k):
heap = []
for x, y in points:
dist = x * x + y * y
heapq.heappush(heap, (-dist, x, y))
if len(heap) > k:
heapq.heappop(heap)
return [[x, y] for _, x, y in heap]
For each point, you compute the squared distance, negate it (so the max distance becomes the min in Python's min-heap), and push it. If the heap exceeds size k, you pop the root, which removes the point with the largest distance.
At the end, the heap holds exactly k points, and they are the k closest.
Time: O(n log k). Each push/pop is O(log k), and you do it for all n points. When k is small, this is much better than sorting.
Space: O(k) for the heap.
You could also do this in one line with heapq.nlargest by sorting on negative distance, but the explicit version above is better for interviews because it shows you understand the mechanics.
The quickselect alternative
Quickselect can solve this in O(n) average time. The idea is the same as finding the kth smallest element: partition the points array around a pivot distance, then recurse into whichever side contains the kth position. Once the pivot lands at index k, everything to its left is closer.
import random
def kClosest(points, k):
def dist(p):
return p[0] ** 2 + p[1] ** 2
def quickselect(left, right):
pivot_idx = random.randint(left, right)
pivot_dist = dist(points[pivot_idx])
points[pivot_idx], points[right] = points[right], points[pivot_idx]
store = left
for i in range(left, right):
if dist(points[i]) < pivot_dist:
points[store], points[i] = points[i], points[store]
store += 1
points[store], points[right] = points[right], points[store]
if store == k:
return
elif store < k:
quickselect(store + 1, right)
else:
quickselect(left, store - 1)
quickselect(0, len(points) - 1)
return points[:k]
Quickselect gives O(n) average time but O(n^2) in the worst case. The max-heap approach guarantees O(n log k) every time. For interviews, the heap version is safer and easier to get right. Quickselect is worth mentioning to show you know the tradeoff.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Sorting | O(n log n) | O(n) |
| Max-heap of size k | O(n log k) | O(k) |
| Quickselect | O(n) average | O(1) |
When k is small relative to n, the heap approach is a big win over sorting. When k is close to n, sorting and the heap converge in cost, and sorting might be simpler.
The building blocks
Max-heap for k smallest
Using a max-heap to maintain the k smallest elements is a fundamental pattern. You have seen it in Kth Largest Element (where you keep the k largest with a min-heap) and Top K Frequent Elements (where you keep the k most frequent). K Closest Points applies the same template, just with distance as the comparison key.
The template:
import heapq
heap = []
for item in items:
heapq.heappush(heap, (-score(item), item))
if len(heap) > k:
heapq.heappop(heap)
# heap contains the k items with the smallest score
Squared distance comparison
Whenever a problem asks about Euclidean distance and you only need relative ordering (closer vs. farther, not the actual distance value), compare squared distances. This is faster (no sqrt call) and avoids floating-point errors. It works because sqrt is monotonically increasing: if a^2 is less than b^2 for non-negative values, then a is less than b.
Edge cases
- k equals n. Return all points. Both sorting and heap work, though the heap does unnecessary push/pop cycles. You could short-circuit this check.
- All points at the same distance. Any k of them is valid. The algorithms handle this naturally since they do not rely on strict ordering.
- Points on axes. One coordinate is 0. No special handling needed,
0^2 + y^2 = y^2works fine. - Single point.
points = [[1,2]],k = 1. Return[[1,2]]. Both approaches handle this.
From understanding to recall
The max-heap approach for this problem is almost identical to Kth Largest Element. The only twist is that your comparison key is squared distance instead of the value itself. If you have drilled the "max-heap of size k" pattern, you can adapt it to K Closest Points in seconds.
The part that trips people up in interviews is remembering to negate the distance for Python's min-heap. It is a small detail, but under pressure, small details get forgotten. Spaced repetition locks in these mechanical steps so you can focus on the problem structure instead of fighting with the heap API.
Related posts
- Kth Largest Element in an Array - The foundational quickselect/heap pattern. K Closest Points is a direct application of this selection technique.
- Top K Frequent Elements - Another "find k best" problem using heaps, but with frequency as the metric instead of distance.
- Sort an Array - The sorting approach connects to general sorting strategies, and understanding when sorting is good enough versus when you need a more targeted approach.