Kth Largest Element in a Stream: Min-Heap Design
You are given a stream of integers and need to find the kth largest element at any point. This is LeetCode 703: Kth Largest Element in a Stream. You design a class KthLargest that takes an integer k and an initial array nums, then supports an add(val) method that returns the kth largest element after inserting val into the stream.
For example, with k = 3 and nums = [4, 5, 8, 2], the 3rd largest is 4. If you then call add(3), the answer is still 4 (because 3 does not crack the top 3). But add(5) bumps the answer up to 5, and add(10) pushes it further. The kth largest can only stay the same or increase as new elements arrive.
Why this problem matters
This problem sits at the intersection of two important concepts: heap-based design and streaming algorithms. In real systems, you often cannot store every piece of data that flows through a pipeline. You need to maintain a rolling summary, like "what are the top k items right now?" This is exactly what LeetCode 703 asks you to build.
The streaming constraint is what makes this different from Kth Largest Element in an Array. In the array version, you have all the data upfront and can use quickselect or a one-pass heap. Here, new elements arrive one at a time, and you need to answer the query after each insertion. Quickselect would require re-scanning the entire dataset every time, which is too expensive. You need a data structure that supports efficient insertion and efficient access to the kth largest element.
That data structure is a min-heap of size k. It is one of the most practical patterns in algorithm interviews, and once you internalize it, you will recognize it in problems like Top K Frequent Elements and Find Median from Data Stream.
The naive approach
The simplest idea is to keep a sorted list of all elements seen so far. Each time you call add(val), insert val into the sorted list and return the element at position len(list) - k.
This works, but it is slow. Inserting into a sorted list takes O(n) time in the worst case (you need to shift elements). After m calls to add, you have done O(m * n) total work. For large streams, this is not going to cut it.
You could also sort the entire list from scratch on every add call, but that is O(n log n) per query, which is even worse. The problem is asking you to do better.
The min-heap insight
Here is the key insight: you do not need to track all the elements. You only care about the k largest. Everything smaller than the kth largest is irrelevant to your answer.
So you maintain a min-heap of size k. The smallest element in this heap is the kth largest overall. When a new element arrives, you compare it to the heap's root:
- If the new element is smaller than or equal to the root, it cannot be in the top k. Ignore it.
- If the new element is larger than the root, it belongs in the top k. Push it onto the heap, then pop the root (which is no longer in the top k).
After this operation, the root of the heap is the new kth largest element. Return it.
import heapq
class KthLargest:
def __init__(self, k: int, nums: list[int]):
self.k = k
self.heap = nums
heapq.heapify(self.heap)
while len(self.heap) > k:
heapq.heappop(self.heap)
def add(self, val: int) -> int:
heapq.heappush(self.heap, val)
if len(self.heap) > self.k:
heapq.heappop(self.heap)
return self.heap[0]
The constructor heapifies the initial array and trims it down to size k by popping the smallest elements. After that, each add call is just a push and a conditional pop. Clean and efficient.
Step-by-step walkthrough
Step 1: Constructor - KthLargest(k=3, nums=[4, 5, 8, 2])
Build a min-heap from nums, keeping only the k=3 largest. Heap = [4, 5, 8]. The root (4) is the 3rd largest.
Step 2: add(3) - Compare 3 with heap root (4)
3 is smaller than the root. It cannot be in the top 3. Discard it. Return heap[0] = 4.
Step 3: add(5) - Compare 5 with heap root (4)
5 is larger than the root. Push 5, then pop the smallest (4). Heap becomes [5, 5, 8]. Return heap[0] = 5.
Step 4: add(10) - Compare 10 with heap root (5)
10 is larger than the root. Push 10, then pop the smallest (5). Heap becomes [5, 10, 8]. Return heap[0] = 5.
Step 5: add(9) - Compare 9 with heap root (5)
9 is larger than the root. Push 9, pop the smallest (5). Heap becomes [8, 10, 9]. Return heap[0] = 8.
Step 6: add(4) - Compare 4 with heap root (8)
4 is smaller than the root. It cannot be in the top 3. Discard it. Return heap[0] = 8.
Why a min-heap and not a max-heap?
This trips people up. You want the kth largest, so your instinct might be to use a max-heap. But think about what happens with a max-heap of size k: the root would be the largest element, not the kth largest. You would need to peek all the way to the bottom of the heap to find the kth largest, which defeats the purpose.
With a min-heap of size k, the root is the smallest of the top k elements. That smallest-of-the-top-k is exactly the kth largest overall. You get O(1) access to the answer by simply reading heap[0].
Think of the min-heap as a "bouncer at the door." It keeps the k biggest VIPs inside and rejects anyone too small to get in. The bouncer (the root) is the weakest VIP. If a newcomer is bigger than the bouncer, they swap places: the newcomer gets in, and the old bouncer gets kicked out.
Complexity analysis
| Operation | Time | Space |
|---|---|---|
| Constructor | O(n log k) | O(k) |
| add() | O(log k) | O(k) |
The constructor does O(n) work to heapify, then pops n - k elements at O(log k) each, giving O(n log k) total. Each add call does at most one push and one pop on a heap of size k, both O(log k). Space is O(k) because the heap never grows beyond k elements.
When k is small relative to the total number of elements, this is very efficient. You are maintaining a tiny heap regardless of how many millions of elements flow through the stream.
Edge cases to watch for
- Initial array smaller than k. If
numshas fewer than k elements, the heap starts with fewer than k entries. The first fewaddcalls simply push without popping until the heap reaches size k. The code above handles this naturally because it only pops whenlen(self.heap) > self.k. - k = 1. You are tracking the maximum. The heap has one element, which is always the largest seen so far.
- Duplicate values.
nums = [3, 3, 3],k = 2. The answer is3. The heap does not require distinct elements. - Negative numbers. The heap works the same way. Python's
heapqhandles negative values correctly. - Very large k. If k equals the total number of elements, the heap holds everything, and the root is the overall minimum. The problem guarantees
1 <= k <= len(nums) + 1, so this is a valid scenario.
The building blocks
Min-heap of size k
This is the core pattern. Maintain a heap with at most k elements. Push new items in, pop when the heap exceeds size k. The root always gives you the kth largest. You will see this same structure in Top K Frequent Elements (where you push frequency-element pairs) and in any problem that asks for "the k largest" or "the k smallest" of a stream.
Streaming design class
LeetCode 703 is a design problem. You are building a class with state that persists across method calls. The constructor sets up the data structure, and each method call updates it incrementally. This pattern shows up in LRU Cache, Min Stack, and Find Median from Data Stream. The key skill is choosing the right data structure so that each operation is efficient.
From understanding to recall
The min-heap-of-size-k pattern is one of those ideas that feels obvious once you see it, but easy to fumble under pressure. The part that tends to slip is the constructor logic: heapify, then trim down to size k. If you forget to trim, your first few add calls will give wrong answers because the heap is too large.
Spaced repetition locks this in. You rebuild the class from scratch, typing the constructor and add method from understanding rather than memorization. A few reps at increasing intervals and the whole pattern becomes second nature. You will not need to re-derive it during an interview.
Related posts
- Kth Largest Element in an Array - The one-shot version of this problem, solvable with quickselect or a heap
- Top K Frequent Elements - Same min-heap-of-size-k pattern, combined with frequency counting
- Find Median from Data Stream - Another streaming design problem using two heaps