Skip to content
← All posts

Find the Duplicate Number: Floyd's Cycle Detection

6 min read
leetcodeproblemmediumarraystwo-pointersbinary-search

Find the Duplicate Number is LeetCode #287, and it is one of the most elegant problems on the platform. You are given an array of n + 1 integers where each value is in the range [1, n]. There is exactly one duplicate. Your job is to find it. The catch: you cannot modify the array and you must use only O(1) extra space. That rules out sorting and hash sets. What remains is a beautiful application of Floyd's cycle detection algorithm.

Array1[0]3[1]4[2]2[3]2[4]Implicit linked list (index chain)0val=11val=33val=22val=44val=2cycleentry = 2
Array [1,3,4,2,2] treated as a linked list. Following index pointers creates a cycle. The cycle entrance (index 2) is the duplicate value.

The insight: array as linked list

The trick that unlocks this problem is a change in perspective. Instead of looking at the array as a list of numbers, treat each value as a pointer. Index i points to index nums[i]. Since every value is between 1 and n, and the array has n + 1 slots (indices 0 through n), every pointer is valid. Index 0 is the entry point because no value points back to it (values range from 1 to n, never 0).

Now think about what a duplicate means in this model. If two indices both contain the value 2, then two arrows point to index 2. That is exactly what happens in a linked list with a cycle: the cycle entrance node has two predecessors. The duplicate value is the cycle entrance.

This reframing turns "find the duplicate" into "find the cycle entrance in an implicit linked list." And that is a problem you already know how to solve.

Floyd's tortoise and hare

Floyd's algorithm has two phases:

Phase 1 uses two pointers. The slow pointer follows one link at a time (slow = nums[slow]). The fast pointer follows two links at a time (fast = nums[nums[fast]]). Both start at index 0. Since a cycle exists, they are guaranteed to meet somewhere inside the cycle. The gap between them shrinks by exactly 1 each step, so fast always catches slow.

Phase 2 finds the cycle entrance. Reset one pointer back to 0. Now advance both pointers one step at a time. The index where they meet is the cycle entrance, which is the duplicate value.

def findDuplicate(nums: list[int]) -> int:
    slow = fast = 0
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break
    slow = 0
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]
    return slow

Step-by-step walkthrough

Let's trace through the array [1, 3, 4, 2, 2]. Following the index pointers creates the chain 0 -> 1 -> 3 -> 2 -> 4 -> 2 -> 4 ... with a cycle between indices 2 and 4. Watch slow (green) and fast (amber) as they move through the implicit linked list.

Phase 1: Find where slow and fast meet (fast moves 2x)

Start: Both pointers at index 0

0val=11val=33val=22val=44val=2slowfast

slow moves 1 step per turn (follows one pointer). fast moves 2 steps per turn (follows two pointers).

Step 1: slow to index 1, fast to index 3

0val=11val=33val=22val=44val=2slowfast

slow = nums[0] = 1. fast = nums[nums[0]] = nums[1] = 3. No match yet.

Step 2: slow to index 3, fast to index 4

0val=11val=33val=22val=44val=2slowfast

slow = nums[1] = 3. fast = nums[nums[3]] = nums[2] = 4. Still no match.

Step 3: slow to index 2, fast to index 4

0val=11val=33val=22val=44val=2slowfast

slow = nums[3] = 2. fast = nums[nums[4]] = nums[2] = 4. Not yet.

Step 4: slow to index 4, fast to index 4. They meet!

0val=11val=33val=22val=44val=2slowfast

slow = nums[2] = 4. fast = nums[nums[4]] = nums[2] = 4. Both at index 4. Phase 1 complete.

Phase 2: Find the cycle entrance (both move 1 step)

Reset: slow back to index 0, fast stays at index 4

0val=11val=33val=22val=44val=2slowfast

Reset slow to the start (index 0). fast stays at the meeting point (index 4). Now both move 1 step at a time.

Step 1: slow to index 1, fast to index 2

0val=11val=33val=22val=44val=2slowfast

slow = nums[0] = 1. fast = nums[4] = 2. No match.

Step 2: slow to index 3, fast to index 4

0val=11val=33val=22val=44val=2slowfast

slow = nums[1] = 3. fast = nums[2] = 4. No match.

Step 3: slow to index 2, fast to index 2. They meet at the entrance!

0val=11val=33val=22val=44val=2slowfast

slow = nums[3] = 2. fast = nums[4] = 2. Both land on index 2. The duplicate is 2!

Phase 1 took 4 steps to find the meeting point at index 4. Phase 2 took 3 steps: slow walked from index 0 through the chain, fast walked from index 4 through the cycle, and they both landed on index 2. That is the cycle entrance, and nums[2] = 4... wait, index 2 is where both pointers meet, and since the function returns slow (which equals 2), the duplicate value is 2. The algorithm returns the index that is the cycle entrance, and that index value is the duplicate.

Complexity analysis

ApproachTimeSpace
Floyd's cycle detectionO(n)O(1)
Hash set (not allowed)O(n)O(n)
Sorting (not allowed)O(n log n)O(1) or O(n)

Floyd's algorithm is optimal for this problem. Phase 1 runs in O(n) because the fast pointer closes the gap by 1 each step, so they meet within one full cycle. Phase 2 also runs in O(n) because the distance from the start to the cycle entrance is at most n. Together, that is O(n) total with no extra memory beyond two pointer variables.

The hash set approach is correct but violates the O(1) space constraint. Sorting would work but modifies the array, which is also not allowed. Floyd's is the only approach that satisfies both constraints.

The building blocks

Array-as-linked-list mapping

The core insight is treating array values as next pointers. When nums[i] tells you the next index to visit, the array becomes an implicit linked list. Index 0 is a natural starting point because no value in the range [1, n] points to it, making it the "head" of the list. A duplicate value means two indices share the same target, which creates a node with two incoming edges, and that creates a cycle.

This mapping technique is not common, but it appears in a handful of problems. Recognizing it requires noticing that the value range matches the index range, which is your signal that values can serve as pointers.

Floyd's two-phase cycle detection

Phase 1 detects the cycle using the speed difference between slow and fast pointers. Phase 2 exploits a mathematical property: the distance from the start of the list to the cycle entrance equals the distance from the meeting point back around to the cycle entrance (modulo the cycle length). By resetting one pointer to the start and advancing both at the same speed, they converge at exactly the cycle entrance.

If you have solved Linked List Cycle II, you already know this algorithm. The only difference here is that the linked list is implicit (stored as an array) rather than explicit (stored as node objects with next pointers).

Edge cases

  • Duplicate at position 1 and n. The cycle might be very short or very long. Floyd's algorithm handles both because the speed difference guarantees convergence regardless of cycle length.
  • All elements the same. For example, [2, 2, 2, 2, 2]. Every index points to index 2, which points to itself. Phase 1 detects the self-loop quickly. Phase 2 finds index 2 as the entrance. Works correctly.
  • Duplicate appears more than twice. The problem guarantees exactly one number is repeated, but it may appear more than twice. Multiple edges pointing to the same node still create a cycle entrance at that node.
  • Large arrays. The algorithm visits each node at most a constant number of times per phase, so performance scales linearly even for arrays with millions of elements.

The problem says there is exactly one duplicate number, but it may be repeated more than once. Floyd's algorithm does not care how many times it appears. It only needs the cycle to exist, and any duplicate in the pointer structure guarantees one.

From understanding to recall

Floyd's cycle detection applied to an array feels clever the first time you see it. The hard part is not the algorithm itself, but the insight that the array can be treated as a linked list. That reframing is what separates people who solve this in five minutes from people who stare at it for an hour.

The fix is repetition. Type the solution from memory, then wait a day and do it again. By the third or fourth rep, the "array as linked list" mapping becomes automatic, and you will recognize it instantly when you see value ranges matching index ranges.

Related posts

  • Linked List Cycle - The foundation: detecting whether a cycle exists using Floyd's fast-slow pointer technique
  • Linked List Cycle II - Finding the cycle entrance node, which is the exact same phase 2 used in this problem
  • Missing Number - Another problem where the relationship between values and indices is the key insight