Set Intersection Size At Least Two: Greedy Interval Covering
You are given a list of closed intervals, and you need to find the smallest set of integers that shares at least two elements with every interval. This is LeetCode 757: Set Intersection Size At Least Two, a hard greedy problem that becomes surprisingly clean once you see the right sorting strategy.
The key insight is to sort intervals by their right endpoint ascending (breaking ties by left endpoint descending), then greedily pick points from the right end of each interval. By processing the tightest intervals first, the points you choose have the best chance of overlapping with future intervals.
Problem statement
Given a 2D integer array intervals where intervals[i] = [start_i, end_i] represents all integers from start_i to end_i inclusive, find the minimum size of a set S such that for every interval, the intersection of S with the integers in that interval contains at least 2 elements.
In other words, for every [a, b] in the input, the set S must contain at least two integers x and y where a <= x <= b and a <= y <= b.
Approach: greedy with endpoint sorting
The algorithm has four parts:
-
Sort by right endpoint ascending. For ties (same right endpoint), sort by left endpoint descending. This ensures you process the tightest (shortest) intervals first. When two intervals share the same right endpoint, the one with the larger left endpoint is more constrained, so you want to handle it first.
-
Track the two largest chosen points. Maintain two variables,
p1andp2, representing the two largest points currently in your set (withp1 < p2). These are the only points that matter when checking coverage of the next interval, because the intervals are sorted by right endpoint. -
For each interval, count how many of
p1andp2fall inside it. Sincep2 >= p1and the interval's right endpoint is at least as large as all previous right endpoints, you only need to check whetherp1andp2are>= startof the current interval. -
Add new points as needed:
- If 0 points are covered, add
end - 1andend. - If 1 point is covered, add
end. - If 2 are already covered, skip the interval.
- If 0 points are covered, add
def intersectionSizeTwo(intervals):
intervals.sort(key=lambda x: (x[1], -x[0]))
res = []
p1, p2 = -1, -1
for start, end in intervals:
if start > p2:
p1 = end - 1
p2 = end
res.append(p1)
res.append(p2)
elif start > p1:
p1 = p2
p2 = end
res.append(p2)
return len(res)
Visual walkthrough
Step 1: Sort intervals by end ascending, then by start descending for ties.
No points chosen yet, so 0 are covered. Add end - 1 = 2 and end = 3. Sorted order: [1,3], [1,4], [2,5], [3,5].
Step 2: Process [1, 4].
Both 2 and 3 fall within [1, 4]. Already 2 points covered, so skip this interval.
Step 3: Process [2, 5].
Both 2 and 3 fall within [2, 5]. Already 2 points covered, so skip this interval.
Step 4: Process [3, 5].
Only 3 falls within [3, 5] (2 is out of range). 1 point covered, need 1 more. Add end = 5. Final set size = 3.
After processing all four intervals, the set contains 3 points: (2, 3, 5). Every interval has at least 2 points from S inside it. Notice how sorting by right endpoint meant the first interval [1, 3] forced us to pick 2 and 3, which happened to cover [1, 4] and [2, 5] for free. Only [3, 5] needed an extra point.
Why does sorting by right endpoint work? By processing the interval that ends earliest first, you are forced to pick points near that tight boundary. Those points are as far left as possible on the number line, which maximizes the chance they also fall inside later intervals (which all end at the same position or further right). This is the same principle behind the classic activity selection problem.
Complexity analysis
| Metric | Value |
|---|---|
| Time | O(n log n) for sorting, plus O(n) for the greedy pass |
| Space | O(n) for the result set in the worst case, O(1) extra if you only track the two largest points |
Building blocks
1. Interval sorting by endpoint
Sorting intervals by their right endpoint is the foundation of many greedy interval problems. The idea is that the interval ending soonest is the most constrained, so handling it first produces the best global outcome.
intervals.sort(key=lambda x: (x[1], -x[0]))
The secondary sort by left endpoint descending handles ties: among intervals ending at the same point, the narrower one is more constrained and should be processed first. If you skip the tie-breaker, you might pick points that satisfy a wider interval but miss the narrower one.
2. Greedy point selection with coverage tracking
Once intervals are sorted, you walk through them one by one and maintain just enough state to decide whether new points are needed. Here, that state is the two largest chosen points p1 and p2.
p1, p2 = -1, -1
for start, end in intervals:
if start > p2:
p1 = end - 1
p2 = end
elif start > p1:
p1 = p2
p2 = end
The reason you only need to track the two largest is that all future intervals have end >= current end. So the only question is whether the current interval's start is small enough to capture p1 and p2. Points smaller than p1 are irrelevant because if p1 is not in range, nothing smaller would be either.
Edge cases
- Single interval
[[1, 5]]: you must pick at least 2 points. The answer is always 2 (pickend - 1andend). - All intervals identical
[[2, 4], [2, 4], [2, 4]]: the first interval forces you to pick 3 and 4. All others are already covered. Answer is 2. - Non-overlapping intervals
[[1, 2], [4, 5], [7, 8]]: no overlap at all, so each interval needs its own 2 points. Answer is 6. - Intervals of length 1 (containing only 2 integers) like
[[3, 4]]: you must pick both 3 and 4. There is no room to choose, so every such interval adds exactly 2 points unless they overlap with existing ones. - Large overlap
[[1, 10], [2, 9], [3, 8], [4, 7]]: the innermost interval [4, 7] is processed first (ends at 7, starts at 4). Picking 6 and 7 covers all four intervals since they all contain 6 and 7. Answer is 2.
From understanding to recall
You have read through the greedy sorting approach and it makes sense. Sort by end, track two points, update as needed. But the details matter under pressure: the tie-breaking rule (-x[0]), checking start > p2 versus start > p1, and remembering to shift p1 = p2 before assigning p2 = end. These are the kind of details that slip away after a few days.
Spaced repetition fixes that. You write the algorithm from scratch at increasing intervals until the pattern is automatic. After a few rounds, you see "minimum covering set with intersection constraints" and the sorting plus greedy template flows out without hesitation.
The greedy interval covering pattern is one of roughly 60 reusable building blocks that cover hundreds of LeetCode problems. Learning them individually and drilling them with spaced repetition is far more effective than grinding random problems and hoping they stick.
Related posts
- Merge Intervals - Sorting intervals by start point and merging overlaps, the foundational interval processing technique
- Non-overlapping Intervals - Greedy interval scheduling to minimize removals, using the same sort-by-endpoint strategy
- Minimum Number of Arrows to Burst Balloons - Greedy point covering where each interval needs at least 1 point instead of 2