Maximum Gap: Bucket Sort for Linear Time
LeetCode Maximum Gap (problem 164) gives you an unsorted integer array nums. Your task: return the maximum difference between two successive elements in the array's sorted form. If the array has fewer than two elements, return 0. The catch is that your solution must run in O(n) time and O(n) space.
Example: given [3, 6, 9, 1], the sorted form is [1, 3, 6, 9]. The successive differences are 2, 3, and 3. The maximum gap is 3.
The naive approach is to sort the array in O(n log n) and scan for the largest gap. That works but violates the linear time constraint. To hit O(n), you need bucket sort combined with the pigeonhole principle.
The pigeonhole insight
Take n numbers with minimum value lo and maximum value hi. The total range is hi - lo. If you spread n numbers evenly across that range, the gap between consecutive numbers would be exactly (hi - lo) / (n - 1).
The pigeonhole principle tells you that the maximum gap must be at least ceil((hi - lo) / (n - 1)). Why? If every gap were smaller than that, the total span of all gaps would be less than hi - lo, which contradicts the actual range. So at least one gap is at least that large.
This means you can create buckets of size ceil((hi - lo) / (n - 1)). Any two numbers that fall in the same bucket are closer together than the guaranteed minimum gap. So the maximum gap can never occur between two numbers in the same bucket. It must occur between the maximum of one bucket and the minimum of the next non-empty bucket.
This is the key insight that makes the problem linear. You do not need to sort inside the buckets. You only need to track the min and max of each bucket, then scan across buckets to find the largest gap.
The approach
Here is the full algorithm:
- Find
lo(minimum) andhi(maximum) of the array - Compute
bucketSize = ceil((hi - lo) / (n - 1))wherenis the array length - Create
n - 1buckets. For each number, compute its bucket index as(num - lo) // bucketSize - For each bucket, track only the min and max value that landed in it
- Scan through the buckets in order. For each pair of consecutive non-empty buckets, compute the gap as
min(next bucket) - max(previous bucket). Track the largest such gap.
The maximum gap is the largest value found in step 5.
Python solution
def maximum_gap(nums: list[int]) -> int:
n = len(nums)
if n < 2:
return 0
lo, hi = min(nums), max(nums)
if lo == hi:
return 0
bucket_size = max(1, (hi - lo) // (n - 1))
bucket_count = (hi - lo) // bucket_size + 1
bucket_min = [float('inf')] * bucket_count
bucket_max = [float('-inf')] * bucket_count
for num in nums:
idx = (num - lo) // bucket_size
bucket_min[idx] = min(bucket_min[idx], num)
bucket_max[idx] = max(bucket_max[idx], num)
max_gap = 0
prev_max = bucket_max[0]
for i in range(1, bucket_count):
if bucket_min[i] == float('inf'):
continue
max_gap = max(max_gap, bucket_min[i] - prev_max)
prev_max = bucket_max[i]
return max_gap
Visual walkthrough
Let's trace through nums = [3, 6, 9, 1] step by step.
Step 1: Find min and max
Scan once to find min = 1, max = 9, n = 4. Compute bucketSize = ceil((9 - 1) / (4 - 1)) = 3.
Step 2: Place 3 into bucket (3 - 1) / 3 = 0
3 falls in bucket 0. Set min = max = 3.
Step 3: Place 6 into bucket (6 - 1) / 3 = 1
6 falls in bucket 1. Set min = max = 6.
Step 4: Place 9 into bucket (9 - 1) / 3 = 2
9 falls in bucket 2. Set min = max = 9.
Step 5: Place 1 into bucket (1 - 1) / 3 = 0
1 falls in bucket 0. Update min to 1. Bucket 0 now has min=1, max=3.
Step 6: Scan consecutive non-empty buckets for the maximum gap
Gap B0-B1: min(B1) - max(B0) = 6 - 3 = 3. Gap B1-B2: min(B2) - max(B1) = 9 - 6 = 3. Maximum gap = 3.
The answer is 3. Notice that the gap between bucket 0 (max=3) and bucket 1 (min=6) equals 3, and the gap between bucket 1 (max=6) and bucket 2 (min=9) also equals 3. Both are tied for the maximum.
Why you only need min and max per bucket
This is the part that trips people up. Why not store all elements?
Because the bucket size is chosen so that the maximum gap cannot occur within a single bucket. The bucket size equals ceil((hi - lo) / (n - 1)), which is the minimum possible maximum gap. Any two numbers in the same bucket differ by less than bucketSize. So the largest gap must span at least one empty bucket or cross a bucket boundary. That means you only need to compare the max of one non-empty bucket against the min of the next non-empty bucket.
Storing all elements would still be O(n) space, but tracking only min and max makes the logic cleaner and the constant factor smaller.
Think of the buckets as bins on a number line. The pigeonhole principle guarantees that at least one bin-to-bin gap is large enough to be the answer. No intra-bin gap can compete.
Complexity
| Time | Space | |
|---|---|---|
| Sort and scan | O(n log n) | O(1) or O(n) |
| Bucket sort | O(n) | O(n) |
The bucket sort approach makes three linear passes: one to find min/max, one to place elements in buckets, and one to scan buckets. Each is O(n). The space is O(n) for the bucket arrays.
The building blocks
This problem combines two fundamental ideas:
Bucket sort
The core pattern: divide a range into equal-sized intervals, place each element in the matching interval, then process intervals in order. Unlike comparison-based sorting (which has an O(n log n) lower bound), bucket sort exploits the numeric range of the data to achieve linear time. You will see this same idea in Top K Frequent Elements (frequency buckets) and radix sort.
bucket_size = max(1, (hi - lo) // (n - 1))
for num in nums:
idx = (num - lo) // bucket_size
# process bucket idx
Pigeonhole principle
If you have more items than containers, at least one container must hold more than one item. Here the reasoning runs in reverse: if the range is hi - lo and there are n - 1 gaps between n sorted numbers, at least one gap is at least (hi - lo) / (n - 1). This lets you set the bucket size so that the answer is guaranteed to cross a bucket boundary.
The pigeonhole principle shows up in many problems: finding duplicates in a bounded range, proving existence of collisions in hash functions, and bounding worst-case behavior in scheduling problems.
Edge cases
Fewer than 2 elements. Return 0 immediately. There are no successive pairs.
All values are the same. lo == hi, so the gap is 0. Handle this before computing bucket size to avoid division by zero.
Two elements. Only one gap exists: max(nums) - min(nums). The bucket approach still works, but you can also return this directly.
Evenly spaced values. If nums = [1, 4, 7, 10], every gap is 3. The bucket approach produces three buckets, each with one element, and correctly reports 3 as the maximum gap.
Watch out for the bucket_size = 0 edge case when lo == hi. Always guard against this before dividing, or use max(1, ...) when computing bucket size.
From understanding to recall
Maximum Gap is one of those problems where the key insight (pigeonhole principle applied to bucket ranges) does all the work. Once you see why the maximum gap must cross a bucket boundary, the implementation follows naturally. But seeing it once is not enough.
The real challenge is rebuilding this reasoning under time pressure. You need to recall that the bucket size should be ceil((hi - lo) / (n - 1)), that you only track min and max per bucket, and that you scan consecutive non-empty buckets for the answer. These details slip away if you do not practice reconstructing them.
Spaced repetition turns this from "a clever trick I read about" into a pattern you can reconstruct from first principles. You drill the bucket size formula, the placement logic, and the gap scan as separate building blocks. After a few cycles, you do not memorize the solution. You derive it because the underlying reasoning is automatic.
Related posts
- Sort Colors - Another linear-time sorting problem that avoids comparison-based sorting by exploiting constraints on the input values.
- Contains Duplicate - Uses hash-based lookups for O(n) detection, the same "trade space for time" philosophy that bucket sort relies on.