Find All Duplicates in an Array: Index as Hash Map
LeetCode Find All Duplicates in an Array (problem 442) gives you an integer array of length n where 1 <= a[i] <= n. Each element appears either once or twice. Your job is to find all the elements that appear twice, and you need to do it in O(n) time without using extra space beyond the output list.
The constraint 1 <= a[i] <= n is not just a detail. It is the entire solution.
Why this problem matters
Most duplicate-detection problems let you throw everything into a hash set and call it a day. This one explicitly asks for O(1) extra space. That rules out sets, maps, and sorting (which would need O(log n) stack space at minimum). You need to encode "I have seen this value before" directly inside the input array.
The critical observation: since every value falls between 1 and n, each value can serve as a valid index into the array (after subtracting 1). That means you can use the array itself as a hash map where the sign of each element acts as a visited flag.
The key insight
For each value v you encounter, look at position v - 1 in the array. If nums[v - 1] is positive, flip it to negative. This marks that you have seen value v. If nums[v - 1] is already negative, then you have seen v before, which means v is a duplicate.
This works because:
- Values range from 1 to
n, sov - 1is always a valid index (0 ton - 1) - You never lose information, because the absolute value of any element still tells you what was originally there
- Sign acts as a one-bit visited marker at each index
The solution
def find_duplicates(nums):
result = []
for i in range(len(nums)):
val = abs(nums[i])
idx = val - 1
if nums[idx] < 0:
result.append(val)
else:
nums[idx] = -nums[idx]
return result
Let's walk through each line:
val = abs(nums[i]): Take the absolute value because previous iterations may have negated this cell. The original value is what matters.idx = val - 1: Map the value to an index. Value 4 maps to index 3, value 1 maps to index 0, and so on.if nums[idx] < 0: If the element at the target index is already negative, we have visited this value before. It is a duplicate, so add it to the result.else: nums[idx] = -nums[idx]: Otherwise, negate the element at the target index to mark this value as seen.
This technique only works when values are bounded by the array length. The constraint 1 <= a[i] <= n guarantees every value maps to a valid index. Without that guarantee, you could index out of bounds or collide with unrelated positions.
Visual walkthrough
Let's trace through nums = [4, 3, 2, 7, 8, 2, 3, 1] step by step. Watch how each visit negates an element at the corresponding index, and how a second visit to the same index reveals a duplicate.
Start: [4, 3, 2, 7, 8, 2, 3, 1]
Visit nums[0] = 4. Go to index 4-1 = 3. nums[3] = 7 (positive), so negate it.
After marking index 3: nums[3] negated
Visit nums[1] = 3. Go to index 3-1 = 2. nums[2] = 2 (positive), so negate it.
After marking index 2: nums[2] negated
Visit nums[2] = |-2| = 2. Go to index 2-1 = 1. nums[1] = 3 (positive), so negate it.
After marking index 1: nums[1] negated
Visit nums[3] = |-7| = 7. Go to index 7-1 = 6. nums[6] = 3 (positive), so negate it.
After marking index 6: nums[6] negated
Visit nums[4] = 8. Go to index 8-1 = 7. nums[7] = 1 (positive), so negate it.
After marking index 7: nums[7] negated
Visit nums[5] = 2. Go to index 2-1 = 1. nums[1] = -3 (already negative). Duplicate found: 2!
Duplicate found! nums[1] was already negative
Visit nums[6] = |-3| = 3. Go to index 3-1 = 2. nums[2] = -2 (already negative). Duplicate found: 3!
Duplicate found! nums[2] was already negative
Visit nums[7] = |-1| = 1. Go to index 1-1 = 0. nums[0] = 4 (positive), negate it. Done. Result: [2, 3].
After processing all eight elements, we found two duplicates: 2 and 3. Each was detected the moment we tried to negate an index that was already negative.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Index negation | O(n) | O(1) |
Time: One pass through the array. Each element is visited exactly once. Looking up and negating at the target index is O(1). Total: O(n).
Space: The result list holds at most n/2 elements, but the problem statement says the output does not count as extra space. The only additional storage is a few variables (val, idx, i). That is O(1) auxiliary space.
Edge cases
No duplicates: nums = [1, 2, 3, 4]. Every negation hits a positive cell. The result list stays empty.
All duplicates: nums = [2, 2, 1, 1]. Both 2 and 1 are detected on their second occurrence. Result: [2, 1].
Single element: nums = [1]. Only one element, so no duplicate is possible. Result: [].
Maximum duplicates: nums = [1, 1, 2, 2, 3, 3, ..., n/2, n/2]. Every value appears exactly twice. The algorithm detects each one on the second pass and returns n/2 results.
Already negative after earlier marking: When you reach index 2 and nums[2] has been negated by an earlier step, you take abs(nums[2]) to recover the original value. The absolute value call is what makes the entire scheme work, because it separates "what the original value was" from "whether this index has been visited."
Do not forget the abs() call when reading values. Without it, a negated cell would give you a negative index, causing incorrect behavior or an out-of-bounds error.
The building blocks
Index as hash map
When array values are bounded by the array size, you can use the values themselves as keys into the array. Instead of allocating a separate hash map, you encode presence or absence directly in the array, typically using sign, placement, or value modification. This pattern shows up in:
- First Missing Positive (LeetCode 41) uses cyclic sort to place each value at its "home" index, then scans for the first gap.
- Find the Duplicate Number (LeetCode 287) treats array values as linked list pointers and uses Floyd's cycle detection to find the duplicate.
- Missing Number (LeetCode 268) uses XOR or index summation to identify which value is absent.
In-place marking
The sign-negation trick is a specific form of in-place marking. You modify the array to record state (visited or not visited) without allocating extra memory. The same idea appears whenever you need to track membership across a bounded set of values and the problem forbids extra space.
From understanding to recall
The algorithm is short, but recalling it under pressure requires practice. The core loop is four lines: take the absolute value, compute the index, check the sign, and either mark or record. Write it from scratch. Then do it again in two days. Then again next week.
The pattern to internalize is: "value maps to index, sign encodes visited." Once that principle is second nature, you can reconstruct the solution for this problem and adapt it to related problems without memorizing code line by line.
Related posts
- Find the Duplicate Number - Same family of problems where values map to indices, but this one uses Floyd's cycle detection instead of sign negation.
- First Missing Positive - Uses cyclic sort (another index-as-hash technique) to solve the harder variant where you find what is missing rather than what is duplicated.
- Missing Number - A simpler version where exactly one value is absent, solvable with XOR or Gauss summation.