Kth Missing Positive Number: Binary Search on Missing Counts
Kth Missing Positive Number is LeetCode 1539. You are given a sorted array of positive integers arr in strictly increasing order and an integer k. Return the kth positive integer that is missing from the array.
Example: arr = [2, 3, 4, 7, 11], k = 5. The missing positive integers are 1, 5, 6, 8, 9, 10, 12, ... The 5th missing one is 9.
Why this problem matters
This is a great problem for learning binary search on a derived property rather than on the array values themselves. Instead of searching for a specific value in the array, you compute a "missing count" at each index and binary search on that. This technique shows up in many problems where you need to find a boundary defined by a computed quantity rather than a stored value. If you have worked through Koko Eating Bananas, you already know the "binary search on something other than an array element" pattern. Kth Missing Positive Number is a cleaner, simpler version of the same idea.
The key insight
For any index i, arr[i] - (i + 1) tells you how many positive integers are missing before arr[i]. Think about it: if no values were missing, the value at index i would be i + 1 (since the array starts from positive integers). The difference between the actual value and the expected value is exactly the number of values that got skipped.
Since the array is sorted and strictly increasing, this missing count is non-decreasing as you move right. That monotonic property is exactly what makes binary search possible. You can binary search for the first index where the missing count reaches or exceeds k.
The solution
def findKthPositive(arr, k):
lo, hi = 0, len(arr) - 1
while lo <= hi:
mid = lo + (hi - lo) // 2
if arr[mid] - (mid + 1) < k:
lo = mid + 1
else:
hi = mid - 1
return lo + k
When the loop ends, lo is the number of array elements that come before the kth missing value. Each of those elements "pushes" the answer up by one position from where it would be if the array were empty. So the answer is simply k + lo. If no array elements come before the kth missing value (e.g., if k = 1 and arr[0] = 5), then lo = 0 and the answer is k itself.
The "binary search on a derived monotonic property" pattern is powerful. Whenever you can compute a non-decreasing quantity from array indices and need to find where that quantity crosses a threshold, binary search applies, even if you are not searching for a value that exists in the array.
Visual walkthrough
Setup: arr = [2, 3, 4, 7, 11], k = 5
For each index i, the missing count is arr[i] - (i + 1). Index 0: 2 - 1 = 1. Index 1: 3 - 2 = 1. Index 2: 4 - 3 = 1. Index 3: 7 - 4 = 3. Index 4: 11 - 5 = 6. We binary search for the first index where the missing count is at least k = 5.
Step 1: lo=0, hi=4, mid=2. arr[2]=4, missing = 4 - 3 = 1. Since 1 < 5, the kth missing is after index 2.
Missing count at mid is 1, which is less than k=5. Move lo = mid + 1 = 3.
Step 2: lo=3, hi=4, mid=3. arr[3]=7, missing = 7 - 4 = 3. Since 3 < 5, the kth missing is after index 3.
Missing count at mid is 3, still less than k=5. Move lo = mid + 1 = 4.
Step 3: lo=4, hi=4, mid=4. arr[4]=11, missing = 11 - 5 = 6. Since 6 >= 5, the kth missing is before or at index 4.
lo == hi, search complete. lo = 4. The answer is k + lo = 5 + 4 = 9.
Result
The binary search converges at lo = 4. The formula k + lo = 5 + 4 = 9 gives us the 5th missing positive integer. Why does this work? At the end, lo equals the number of array elements that come before the answer. Those elements each "shift" the answer up by one, so the kth missing value is k + lo.
Three steps to find the answer on a 5-element array. Binary search makes this O(log n) regardless of the size of the array or the value of k.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Binary search | O(log n) | O(1) |
Time: O(log n). The binary search halves the range each iteration. For an array of length n, this takes at most log2(n) steps. The missing count at each index is computed in O(1) with a single subtraction.
Space: O(1). Only a few variables (lo, hi, mid) are used. No auxiliary data structures.
You could also solve this in O(n) by iterating through the array and counting missing values as you go, but the binary search approach is strictly better for large inputs.
The building blocks
1. Missing count formula: arr[i] - (i + 1)
missing_before = arr[mid] - (mid + 1)
This single expression is the heart of the solution. In a "perfect" sequence with no gaps, position i would hold value i + 1. The difference between the actual value and the expected value counts exactly how many values were skipped. You do not need to enumerate the missing values or store them anywhere.
2. Binary search on a derived property
while lo <= hi:
mid = lo + (hi - lo) // 2
if arr[mid] - (mid + 1) < k:
lo = mid + 1
else:
hi = mid - 1
This is the standard binary search template, but instead of comparing arr[mid] to a target, you compare a computed quantity (the missing count) to k. When the missing count at mid is less than k, the kth missing value must be to the right, so you move lo up. Otherwise, you move hi down. The result is that lo ends up at the first index where the missing count is at least k.
Edge cases
- All missing values come before the array starts:
arr = [5, 6, 7],k = 3. Every value from 1 to 4 is missing. The 3rd missing value is 3. The binary search never moveslopast 0, so the answer is0 + 3 = 3. - All missing values come after the array ends:
arr = [1, 2, 3, 4],k = 2. No values are missing within the array. The binary search pusheslotolen(arr) = 4, giving4 + 2 = 6. k = 1: the smallest missing positive integer. Ifarr[0] > 1, the answer is 1. Otherwise the binary search finds the first gap.- Single element array:
arr = [2],k = 1. Missing count at index 0 is2 - 1 = 1, which equalsk. Sohimoves to-1,lostays at 0, and the answer is0 + 1 = 1. - Large gaps:
arr = [1, 1000000],k = 999998. The missing count at index 1 is1000000 - 2 = 999998. The binary search handles this in two steps regardless of the gap size.
From understanding to recall
The logic here is compact: compute missing counts with a subtraction, binary search for the threshold, return lo + k. You probably see why it works right now. But under interview pressure, the details matter. Is the comparison < k or <= k? Do you return lo + k or lo + k - 1? Do you initialize hi to len(arr) - 1 or len(arr)?
Spaced repetition locks in these details. You write the solution from scratch today, again tomorrow, and again in a few days. After a few rounds, the formula and the boundary conditions are automatic. You spend your interview time thinking about the problem, not second-guessing the template.
Related posts
- Binary Search - The foundational binary search pattern this problem builds on
- Find First and Last Position of Element in Sorted Array - Binary search for boundary conditions
- Koko Eating Bananas - Another binary search on a derived property
CodeBricks breaks problems like Kth Missing Positive Number into atomic building blocks and schedules them for spaced repetition. Instead of re-solving the whole problem each time, you drill the missing count formula and the binary search template separately until both are automatic. Then you combine them effortlessly.