Longest Consecutive Sequence: The Set Trick
LeetCode Longest Consecutive Sequence asks you to find the length of the longest consecutive elements sequence in an unsorted array. The twist: you have to do it in O(n) time.
That "O(n)" requirement is the whole point of this problem. Without it, you would just sort and scan. With it, you need a smarter approach. And the trick that makes it work is surprisingly simple.
The problem
Given an unsorted array of integers nums, return the length of the longest consecutive elements sequence.
For example, given [100, 4, 200, 1, 3, 2], the longest consecutive sequence is [1, 2, 3, 4], so you return 4.
The elements do not need to be adjacent in the array. They just need to form a consecutive sequence when put in order. The array can have up to 10^5 elements, and the values can range from -10^9 to 10^9.
The sorting approach (and why it is not enough)
The obvious first instinct: sort the array, then walk through it counting consecutive runs.
def longest_consecutive(nums: list[int]) -> int:
if not nums:
return 0
nums.sort()
longest = 1
current = 1
for i in range(1, len(nums)):
if nums[i] == nums[i - 1]:
continue # skip duplicates
elif nums[i] == nums[i - 1] + 1:
current += 1
else:
current = 1
longest = max(longest, current)
return longest
This works and it is clean. But sorting costs O(n log n), which violates the O(n) requirement. In a real interview, if the problem says "O(n) time," they expect you to get there. The sorting solution is a fine starting point to mention, but it is not the answer they want.
The O(n) constraint is not decoration. It is the signal that you need a hash-based approach instead of sorting.
The set trick: O(n) time
Here is the key insight. If you put all the numbers in a set, you can do O(1) lookups. And you only need to count forward from sequence starts, not from every element.
How do you know if a number is the start of a sequence? Simple: check if n - 1 is in the set. If it is not, then n is the beginning of a new sequence. If it is, then n is somewhere in the middle and some earlier number will kick off this sequence.
def longest_consecutive(nums: list[int]) -> int:
num_set = set(nums)
longest = 0
for n in num_set:
# Only start counting from sequence beginnings
if n - 1 not in num_set:
length = 1
while n + length in num_set:
length += 1
longest = max(longest, length)
return longest
That is the whole thing. Build a set, find starts, count forward.
Visual walkthrough
Let's trace through [100, 4, 200, 1, 3, 2] step by step.
Step 1: Convert array to a set for O(1) lookups.
num_set = {1, 2, 3, 4, 100, 200}. Now we scan for sequence starts.
Step 2: Check 100. Is 99 in the set? No. So 100 is a sequence start.
Count forward: 101 in set? No. Sequence length = 1.
Step 3: Check 4. Is 3 in the set? Yes. Skip it (not a start).
4 is not the start of its sequence. We will find it when we start from 1.
Step 4: Check 200. Is 199 in the set? No. So 200 is a sequence start.
Count forward: 201 in set? No. Sequence length = 1.
Step 5: Check 1. Is 0 in the set? No. So 1 is a sequence start!
This is a sequence start. Now count forward from 1...
Step 6: Count forward from 1: is 2 in set? Yes. 3? Yes. 4? Yes. 5? No.
Sequence 1,2,3,4 has length 4. That is the longest. Done!
Notice how we skip 4, 3, and 2 because they have a predecessor in the set. We only do the expensive "count forward" work starting from actual sequence beginnings. This is what keeps the total work at O(n).
Why this is O(n) and not O(n^2)
This is the part that trips people up. There is a while loop inside a for loop. That looks like O(n^2). But it is not.
The for loop iterates over every number in the set. That is n iterations. But the inner while loop only fires for numbers that are sequence starts, and it counts forward only through elements in that sequence.
Here is the critical observation: every element in the set participates in at most one "count forward" pass. The number 3 is counted when we start from 1, but we never start a separate counting pass from 3 because 2 is in the set and we skip it.
Total work across all iterations of both loops combined: each of the n elements is visited a constant number of times. Once in the outer for loop, and at most once by some inner while loop. That gives us O(n) total.
When you see nested loops and wonder about the complexity, ask: "How many times is each element touched across ALL iterations?" If each element is only touched O(1) times total, the overall complexity is O(n) regardless of the nesting.
This same amortized analysis shows up in many problems. Sliding window has two nested loops but is O(n) because the left pointer only moves forward. Stack problems iterate and push/pop but each element is pushed and popped at most once. The pattern is the same: total work across all iterations matters more than the loop nesting depth.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Sort then scan | O(n log n) | O(1)* |
| Hash set trick | O(n) | O(n) |
*In-place sort. Python's sorted() would use O(n) space.
The set approach trades O(n) extra memory for O(n) time. That is almost always a worthwhile tradeoff in interviews. If an interviewer asks about the space usage, mention the sort approach as a fallback and explain why you prefer the set solution.
Edge cases
A few things to handle:
- Empty array: return 0. The
forloop simply never executes. - All duplicates:
[1, 1, 1, 1]should return 1. Converting to a set handles this automatically since duplicates collapse. - Negative numbers:
[-3, -2, -1, 0]is a valid consecutive sequence of length 4. The algorithm handles this without modification because it only checksn - 1andn + length, which work the same for negative values. - Single element:
[7]returns 1. The number 7 has no predecessor (6 is not in the set), so it is a sequence start, and counting forward finds nothing. Length = 1.
The building blocks
This problem breaks down into two reusable pieces that CodeBricks drills independently:
1. Check-set membership
The O(1) lookup at the heart of this solution. You ask "is this value in my set?" before deciding what to do. This is the same operation whether you are checking for duplicates, looking for a complement in Two Sum, or detecting sequence boundaries here.
if n - 1 not in num_set:
# n is a sequence start
This single check is what separates the O(n) solution from an O(n^2) one. Without it, you would try to count forward from every element, doing redundant work. With it, you only count from sequence starts.
2. Count-forward scan
Once you identify a start point, you count consecutive elements forward using set lookups:
length = 1
while n + length in num_set:
length += 1
This micro-pattern of "start from a known position and extend while a condition holds" shows up in many forms. In sliding window problems, you extend the right pointer. In interval problems, you merge overlapping ranges. The core idea is the same: start from a boundary and grow.
These two pieces combine to make the O(n) solution possible. The check-set membership identifies where to start work. The count-forward scan does the work. Together, they ensure you never do redundant computation.
Common mistakes
1. Starting a count from every element. If you skip the n - 1 not in num_set check and count forward from every number, your solution degrades to O(n^2) in the worst case. Imagine the array [1, 2, 3, ..., 100000]. Without the start check, you count forward 100000 times from element 1, 99999 times from element 2, and so on. That is quadratic.
2. Iterating over the original array instead of the set. If the array has duplicates like [1, 1, 2, 2, 3], iterating over the array means you process duplicates multiple times. Iterate over num_set instead.
3. Forgetting the empty array check. If nums is empty, returning 0 early avoids an unnecessary set construction. Some solutions skip this and it still works (the for loop produces nothing), but it is good practice to handle it explicitly.
The n - 1 not in num_set check is the entire trick. It is what turns a brute-force O(n^2) idea into an O(n) algorithm. If you remember nothing else from this problem, remember that check.
When to use this pattern
Look for these signals:
- The problem involves consecutive or sequential elements in an unsorted collection
- You need to find connected components in a 1D value space
- The brute force involves sorting, but the problem demands better than O(n log n)
- You need to group elements that are "neighbors" by value rather than by position
This exact pattern also shows up in problems like Longest Consecutive Path in a matrix (where you use a set of coordinates instead of values) and in union-find problems where you need to group adjacent values.
Related posts
- Contains Duplicate - The simplest "put things in a set and check membership" problem. Master this first.
- Hash Map Patterns - The broader family of hash-based techniques that this problem belongs to.
- The Sliding Window Pattern - Another pattern where nested loops are actually O(n) thanks to amortized analysis.
The set trick for Longest Consecutive Sequence is one of those things that feels obvious after you see it. But "obvious after seeing it" and "obvious during an interview" are very different things. The gap between them is practice.
CodeBricks breaks this problem into the check-set-membership and count-forward-scan building blocks, then drills them independently with spaced repetition. After a few rounds, the pattern fires automatically when you see "consecutive" and "O(n)" in the same sentence.