Contains Duplicate: Three Solutions Compared
LeetCode Contains Duplicate is one of the first problems most people solve when they start grinding. It looks almost too simple: given an integer array nums, return True if any value appears at least twice. Return False if every element is distinct.
But "simple" does not mean "nothing to learn." This problem is a perfect case study in how the same question can be solved three completely different ways, with wildly different performance. And the optimal solution introduces a building block that shows up in dozens of harder problems later.
Let's walk through all three approaches.
Approach 1: Brute force (check every pair)
The most obvious approach. For each element, compare it against every other element. If any two match, return True.
def contains_duplicate(nums: list[int]) -> bool:
for i in range(len(nums)):
for j in range(i + 1, len(nums)):
if nums[i] == nums[j]:
return True
return False
This works, but it is painfully slow. Two nested loops means O(n^2) time. For an array of 100,000 elements, that is up to 5 billion comparisons. LeetCode will time out on this.
Brute force is fine for understanding the problem, but never submit an O(n^2) solution when the input can be up to 10^5 elements. Interviewers want to see that you can do better.
Approach 2: Sort first, then scan
If you sort the array, duplicates will be adjacent. So you just walk through and check if any two neighbors are equal.
def contains_duplicate(nums: list[int]) -> bool:
nums.sort()
for i in range(1, len(nums)):
if nums[i] == nums[i - 1]:
return True
return False
Sorting costs O(n log n) and the scan is O(n), so the total is O(n log n). That is a big improvement over brute force. But there is a catch: sorting modifies the input array. Some interviewers care about that. And we can still do better on time.
Approach 3: The set approach (optimal)
Here is the key insight. Instead of comparing elements to each other, you keep a set of values you have already seen. For each new element, check if it is already in the set. If yes, you found a duplicate. If no, add it and move on.
def contains_duplicate(nums: list[int]) -> bool:
seen = set()
for num in nums:
if num in seen:
return True
seen.add(num)
return False
One pass through the array. Each set lookup is O(1) on average. Total time: O(n). Total space: O(n) for the set.
That is the winner.
There is an even shorter version using Python's built-in set:
def contains_duplicate(nums: list[int]) -> bool:
return len(nums) != len(set(nums))
This one-liner builds a set from the entire array and checks if the size changed. Same O(n) time and space. Both are valid in interviews, but the explicit loop version is better for showing your thought process.
Visual walkthrough: the set approach step by step
Let's trace through [1, 3, 5, 3, 7] to see exactly how the set solution works.
Step 1: Look at nums[0] = 1. Is 1 in our set?
Set is empty. Add 1 to the set.
Step 2: Look at nums[1] = 3. Is 3 in our set?
3 is not in {1}. Add 3 to the set.
Step 3: Look at nums[2] = 5. Is 5 in our set?
5 is not in {1, 3}. Add 5 to the set.
Step 4: Look at nums[3] = 3. Is 3 in our set?
3 IS in the set! Return True. Duplicate found.
That is the entire algorithm. At each index, you do one O(1) lookup and one O(1) insert. The moment you find a match, you stop early. In the worst case (no duplicates), you scan the whole array once.
Complexity comparison
| Approach | Time | Space | Modifies input? |
|---|---|---|---|
| Brute force (nested loops) | O(n^2) | O(1) | No |
| Sort then scan | O(n log n) | O(1)* | Yes |
| Hash set lookup | O(n) | O(n) | No |
*Sorting can be O(n) space if you use a non-in-place sort, but Python's sort() is in-place.
The set approach wins on time. It uses more memory, but for interview purposes, O(n) space is almost always acceptable when it buys you a linear time solution. This is the classic time-space tradeoff, and interviewers expect you to make it.
If an interviewer asks "can you do it in O(1) space?", the sort approach is your answer. But always mention the O(n) set solution first since it is the one they want to see you reach.
Why the set approach matters beyond this problem
Contains Duplicate is easy, but the pattern it teaches is not trivial. The "seen set" pattern is a building block that recurs constantly:
- Two Sum: instead of a set, use a dict to store values and indices. Same "have I seen this before?" check.
- Longest Substring Without Repeating Characters: use a set to track characters in the current window. When you hit a duplicate, shrink.
- Happy Number: use a set to track numbers you have computed. If you see one again, you are in a cycle.
- Valid Sudoku: use sets to track digits seen in each row, column, and 3x3 box.
The core idea is always the same: trade O(n) memory for O(1) lookups, and check membership before inserting.
The building blocks
This problem breaks down into two reusable pieces that CodeBricks drills independently:
1. Seen-set tracking
The act of maintaining a set that grows as you process elements. You initialize it empty, and on each iteration you add the current value after checking it. This micro-pattern is the backbone of duplicate detection, cycle detection, and membership tracking across dozens of problems.
seen = set()
for item in collection:
# ... do something with item ...
seen.add(item)
2. Check-set membership
The O(1) lookup that makes the whole thing work. Before adding a new element, you ask "is this already in my set?" In Python, that is if item in seen. This is the same operation whether you are checking for duplicates, looking for a complement in two-sum, or detecting cycles.
if item in seen:
# found a match / duplicate / cycle
return True
These two pieces combine here in the simplest possible way. But they also combine with sliding windows, hash map frequency counting, and two-pointer techniques in harder problems. Learning them in isolation means you will recognize them instantly when they show up in a more complex context.
Common mistakes
A few things to watch out for, even on a problem this straightforward:
1. Using a list instead of a set for lookups. Checking if num in my_list is O(n), not O(1). This turns your "optimal" solution into O(n^2) in disguise. Always use a set (or dict) for membership checks.
2. Forgetting that the sort approach modifies the input. If the problem says "do not modify the array" or if other code depends on the original order, sorting is not an option. The set approach avoids this entirely.
3. Overcomplicating it. Some people reach for Counter or defaultdict when a plain set is all you need. You do not need to count how many times each element appears. You just need to know if you have seen it before. A set is the right tool.
A set in Python is backed by a hash table, the same data structure behind dicts. The in operator on a set is O(1) average-case. That single fact is what makes this entire solution work. If you only remember one thing about this problem, remember that set lookup is O(1).
When to use this pattern in interviews
Look for these signals:
- The problem asks about duplicates, repeats, or uniqueness
- You need to detect cycles (repeated states)
- The brute force involves an inner loop that is just searching for something you have already processed
- The constraint mentions large input sizes (10^4 or more), ruling out O(n^2)
If you see any of these, the "seen set" pattern should be your first instinct.
Related posts
- Hash Map Patterns - Contains Duplicate is the simplest hash set problem. Hash maps extend the same idea to store values alongside keys.
- The Sliding Window Pattern - Many sliding window problems use a set to track what is currently in the window
- Stack-Based Patterns - Another fundamental pattern family worth drilling alongside set-based problems
Recognizing that Contains Duplicate is a "seen set" problem takes about two seconds once you have internalized the pattern. But getting to that point requires practice, not just reading. The first time you solve it, the set approach feels obvious. A week later, when you see a harder problem that uses the same building block, you want the pattern to fire automatically.
That is what spaced repetition is for. You drill the seen-set tracking and set membership check as individual building blocks, typing them from scratch at increasing intervals. After a few cycles, the code is in your fingers, not just your head.
New to algorithms? Understand why the hash set approach is O(n) with our Big O Notation Guide.