Set Mismatch: Finding the Duplicate and Missing Number

May 21, 20265 min read

leetcodeproblemeasyarrayshash-mapbit-manipulation

LeetCode 645: Set Mismatch gives you an array nums of length n that was supposed to contain the numbers 1 through n. But something went wrong: one number got duplicated, which means another number went missing. Your job is to find both the duplicated number and the missing number, and return them as a list [duplicate, missing].

This is a small twist on the classic "find the missing number" and "find the duplicate" problems. Instead of just one task, you need to do both at the same time.

The array should contain [1, 2, 3, 4]. The number 2 appears twice (duplicate) and the number 3 is absent (missing). Return [2, 3].

Why this problem matters

Set Mismatch sits at the intersection of several fundamental patterns. You can solve it with a hash map, with math, with bit manipulation, or with in-place array marking. That makes it an excellent problem for building your toolkit, because you get to practice multiple approaches on a single question. The skills you develop here transfer directly to harder problems like Find All Duplicates in an Array and First Missing Positive.

The core question is always the same: "given a bounded range of integers, how do I detect what is present, what is missing, and what is repeated?" Once you internalize that framing, a whole family of problems opens up.

The approach

The cleanest solution uses a frequency count. Allocate an array of size n + 1, scan through nums, and count how many times each value appears. Then scan the count array: the value with count 2 is the duplicate, and the value with count 0 is the missing number.

def find_error_nums(nums):
    n = len(nums)
    count = [0] * (n + 1)
    for num in nums:
        count[num] += 1
    duplicate = missing = 0
    for i in range(1, n + 1):
        if count[i] == 2:
            duplicate = i
        elif count[i] == 0:
            missing = i
    return [duplicate, missing]

You can also solve this with pure math. The expected sum of 1 to n is n * (n + 1) // 2. The expected sum of squares is n * (n + 1) * (2 * n + 1) // 6. Comparing these with the actual sum and sum of squares gives you two equations and two unknowns. This runs in O(1) space, but the frequency count version is easier to get right in an interview.

Practice this problem with spaced repetition

Step-by-step walkthrough

Let's trace through nums = [1, 2, 2, 4]. The array has length 4, so it should contain [1, 2, 3, 4].

Build the count array: scan each element and increment its count. After processing all four elements, count = [0, 1, 2, 0, 1] (index 0 is unused).
Scan from index 1 to 4. At index 2, the count is 2, so the duplicate is 2. At index 3, the count is 0, so the missing number is 3.
Return [2, 3].

That is the whole algorithm. Two linear scans, no sorting, no nested loops.

Step 1: Build a frequency count

Create an array of size n+1 initialized to zero. Walk through nums, incrementing count[num] for each element.

Step 2: Scan count for the duplicate (count = 2)

The value whose count equals 2 is the duplicate number.

Step 3: Scan count for the missing (count = 0)

The index (from 1 to n) whose count is still 0 is the missing number.

Step 4: Return the result

The answer is [duplicate, missing] = [2, 3]. Two passes through the data, O(n) time total.

Alternative: In-place marking (O(1) space)

By negating array values in-place, you avoid allocating a separate count array. The index that stays positive reveals the missing number.

Complexity analysis

Approach	Time	Space
Frequency count	O(n)	O(n)
In-place marking	O(n)	O(1)
Math (sum + sum of squares)	O(n)	O(1)

The frequency count approach uses O(n) extra space for the count array. The in-place marking approach avoids that by negating elements in the original array. The math approach uses constant space but involves more careful arithmetic.

Build your pattern library with CodeBricks

Edge cases to watch for

Duplicate is 1 or n. The duplicate can be at either boundary of the range. Make sure your loop checks all values from 1 through n, not 0 through n - 1.
Missing is 1 or n. Similarly, the missing number can be the smallest or largest in the expected range. The frequency count handles both naturally.
Array of length 2. The smallest valid input is nums = [1, 1] (duplicate 1, missing 2) or nums = [2, 2] (duplicate 2, missing 1). Both work without special handling.
Large n. With n up to 10^4, the O(n) solution runs easily within time limits. An O(n^2) brute force approach would be too slow.

The building blocks

Frequency counting

Allocate a count array indexed by value, scan the input, and increment. This is the same pattern used in counting sort, anagram detection, and bucket-based problems. The key requirement is that the value range is bounded and small enough to fit in memory.

count = [0] * (n + 1)
for num in nums:
    count[num] += 1

Index-as-hash (in-place marking)

When values range from 1 to n, each value maps to a valid array index. You can use sign negation to mark "I have seen this value" without allocating extra space. This pattern is the backbone of Find All Duplicates in an Array and related problems.

for i in range(len(nums)):
    val = abs(nums[i])
    if nums[val - 1] < 0:
        duplicate = val
    else:
        nums[val - 1] = -nums[val - 1]

After the scan, the index that is still positive tells you which value never appeared, which is the missing number.

From understanding to recall

Set Mismatch combines two sub-problems you have likely seen before: finding a duplicate and finding a missing number. The real skill is recognizing that they can be solved simultaneously with a single scan (or two). In an interview, you want the frequency count approach to come to mind immediately when you see "numbers from 1 to n with one duplicate and one missing."

Practice writing the solution from scratch at increasing intervals. After a few rounds, the pattern becomes automatic. You will not need to re-derive the logic every time you encounter a bounded-range detection problem.

Missing Number - The simpler variant where one number is absent from the range. Solvable with XOR or Gauss summation. Set Mismatch adds the duplicate detection on top.
Find All Duplicates in an Array - Uses the same index-as-hash and sign-negation technique to find all duplicated values in O(1) space.
Single Number - The classic XOR cancellation problem. A different bit manipulation approach for detecting the "odd one out."

Try CodeBricks free