Skip to content
← All posts

Find K Pairs with Smallest Sums: Min-Heap on Sorted Pairs

4 min read
leetcodeproblemmediumarraysheap

LeetCode 373 gives you two integer arrays sorted in ascending order (call them nums1 and nums2) and asks you to return the k pairs (u, v) with the smallest sums u + v, where u comes from nums1 and v comes from nums2. The answer can be in any order.

Example: nums1 = [1, 7, 11], nums2 = [2, 4, 6], k = 3. The three smallest pairs by sum are [1, 2] (sum 3), [1, 4] (sum 5), and [1, 6] (sum 7).

The brute force (enumerate all n * m pairs, sort by sum, take the first k) works but blows past the time limit for large inputs. To do better, you need to exploit the fact that both arrays are sorted.

The key insight

Because nums2 is sorted, fixing any index i into nums1 gives you a sequence of pairs in ascending order by sum:

(nums1[i], nums2[0]), (nums1[i], nums2[1]), (nums1[i], nums2[2]), ...

Each row i is its own sorted sequence. You have n such rows, and you need to pick the k globally smallest elements across all of them. That is exactly the "merge k sorted sequences" problem, and the classic tool for that is a min-heap.

nums2[0]=2nums2[1]=4nums2[2]=617113#1(0,0)5#2(0,1)7#3(0,2)9H(1,0)11(1,1)13(1,2)13H(2,0)15(2,1)17(2,2)
Extracted (green, #1–#3)In heap (H)Not yet reached
3x3 sum grid for nums1=[1,7,11], nums2=[2,4,6]. Green cells are the first 3 pairs extracted. H marks cells still in the heap after those extractions.

The approach

Seed the heap with the first pair from each row: push (nums1[i] + nums2[0], i, 0) for i in range(min(len(nums1), k)). Then run the standard heap merge loop:

  1. Pop the minimum entry (sum, i, j).
  2. Append [nums1[i], nums2[j]] to results.
  3. If j + 1 < len(nums2), push (nums1[i] + nums2[j + 1], i, j + 1), the next pair from the same row.
  4. Stop when you have k results or the heap empties.

The heap stores at most min(n, k) entries at any time. Each pop surfaces the globally smallest remaining pair.

The code

import heapq

def kSmallestPairs(nums1, nums2, k):
    if not nums1 or not nums2:
        return []
    heap = []
    for i in range(min(len(nums1), k)):
        heapq.heappush(heap, (nums1[i] + nums2[0], i, 0))
    result = []
    while heap and len(result) < k:
        _, i, j = heapq.heappop(heap)
        result.append([nums1[i], nums2[j]])
        if j + 1 < len(nums2):
            heapq.heappush(heap, (nums1[i] + nums2[j + 1], i, j + 1))
    return result

A few details worth noting:

  • The tuple stores the sum first so Python's heap orders by sum automatically.
  • The index i acts as a tiebreaker when two sums are equal, preventing Python from trying to compare the rest of the tuple (which would require a custom comparison).
  • The guard j + 1 < len(nums2) keeps you from pushing past the end of nums2.

Why only seed min(len(nums1), k) rows? You can never pop more than k times total, so seeding more than k rows wastes memory. If nums1 has 10,000 elements but k = 3, you only need three rows in the heap. Rows beyond that can never contribute to the answer.

Step 0: Initialize heap

Seed the heap with (nums1[i]+nums2[0], i, 0) for i in range(min(3, k)=3). Push (3,0,0), (9,1,0), (13,2,0).

Heap before pop

(empty)

Heap after push

  • (3, i=0, j=0)
  • (9, i=1, j=0)
  • (13, i=2, j=0)

Output so far

[]

We only seed min(len(nums1), k) = 3 rows. Each entry (sum, i, j) lets us recover the pair and advance to j+1.

Step 1: Pop min, push next in row

Pop (3, i=0, j=0) → output [nums1[0], nums2[0]] = [1, 2]. Push (nums1[0]+nums2[1], 0, 1) = (5, 0, 1).

Heap before pop

  • (3, i=0, j=0)← min
  • (9, i=1, j=0)
  • (13, i=2, j=0)

Heap after push

  • (5, i=0, j=1)
  • (9, i=1, j=0)
  • (13, i=2, j=0)

Output so far

[[1,2]]

Row 0 advances from j=0 to j=1. The new entry (5, 0, 1) becomes the heap minimum.

Step 2: Pop min, push next in row

Pop (5, i=0, j=1) → output [nums1[0], nums2[1]] = [1, 4]. Push (nums1[0]+nums2[2], 0, 2) = (7, 0, 2).

Heap before pop

  • (5, i=0, j=1)← min
  • (9, i=1, j=0)
  • (13, i=2, j=0)

Heap after push

  • (7, i=0, j=2)
  • (9, i=1, j=0)
  • (13, i=2, j=0)

Output so far

[[1,2], [1,4]]

Row 0 advances from j=1 to j=2. Still the cheapest row to pull from.

Step 3: Pop min — k=3 reached, stop

Pop (7, i=0, j=2) → output [nums1[0], nums2[2]] = [1, 6]. We have k=3 results. Done.

Heap before pop

  • (7, i=0, j=2)← min
  • (9, i=1, j=0)
  • (13, i=2, j=0)

Heap after push

  • (9, i=1, j=0)
  • (13, i=2, j=0)

Output so far

[[1,2], [1,4], [1,6]]

We would push (nums1[0]+nums2[3]) but j+1=3 is out of bounds for nums2. No push needed. Result complete.

Complexity

TimeSpace
Initial heapO(min(n,k) log min(n,k))O(min(n,k))
k popsO(k log min(n,k))
TotalO(k log min(n,k))O(min(n,k))

The dominant term is the k pops, each costing O(log min(n, k)). For most practical inputs where k is much smaller than n * m, this is a large win over the brute-force O(n * m * log(n * m)).

Building blocks

This problem is a direct application of the "merge k sorted sequences with a min-heap" pattern. Each row is one sorted sequence. The heap tracks the frontier: one entry per row, pointing to the cheapest unused pair in that row.

The same building block shows up in Merge K Sorted Lists, where each linked list is a sorted sequence and you maintain one heap entry per list head. The mechanics are identical: pop the min, advance the pointer, push the next element.

Find Median from Data Stream also uses heaps to maintain order statistics over a stream, which is a related idea: heaps excel at problems where you need to efficiently track the minimum (or maximum) across a changing set.

Edge cases

  • k larger than all pairs. If k > len(nums1) * len(nums2), the heap empties before you collect k results. The while heap and len(result) < k guard handles this naturally and you return all valid pairs.
  • Either array empty. The early return covers nums1 = [] or nums2 = [].
  • One array has length 1. Only one row gets seeded. The loop advances j through all of nums2 collecting pairs.
  • Duplicate sums. Two entries with the same sum and different (i, j) positions sit in the heap simultaneously. Python resolves the ordering by comparing i next. No special handling needed.

From understanding to recall

The mental model is a 2D grid where each row and each column is sorted. You want the k smallest cells by value. A diagonal sweep from the top-left corner would work conceptually, but the grid is too large to scan. The heap tracks just the frontier (one cell per row) and expands lazily, touching only what it needs.

Once that picture is clear, the seed step follows: one entry per row, all starting at column 0. The pop-and-push loop follows: after consuming a cell at (i, j), the next candidate from row i is (i, j+1). The bound on seeds follows: you will never need more than k pops, so you only need min(n, k) rows seeded.

If you can draw the grid, point to the initial heap entries, and trace one pop-and-push cycle, you can write the full solution from that picture.

Related posts