Skip to content
← All posts

Random Pick Index: Reservoir Sampling

6 min read
leetcodeproblemmediummathhash-map

LeetCode 398, Random Pick Index, asks you to pick a random index from an array where the value equals a given target, and each valid index must have an equal probability of being chosen. The trick is that duplicates are allowed, so you might have multiple indices that qualify. This is a classic application of reservoir sampling, and once you see the pattern, it becomes one of the cleanest solutions in the randomized algorithms toolkit.

The problem

Given an integer array nums with possible duplicates, implement a class that picks a random index i where nums[i] == target. Each valid index should be returned with equal probability.

nums = [1, 2, 3, 3, 3]

pick(3)  # returns 2, 3, or 4 with equal probability (1/3 each)
pick(1)  # always returns 0
pick(2)  # always returns 1

You will call pick multiple times, potentially with different targets.

nums = [1, 2, 3, 3, 3] — target = 31i=02i=13i=23i=33i=4pick(3) returns 2, 3, or 4 with equal probability
Indices 2, 3, and 4 all hold the target value 3. Reservoir sampling picks one uniformly at random in a single pass.

Why reservoir sampling works

The idea behind reservoir sampling is beautifully simple. You scan through the array once. Every time you encounter the target value, you increment a counter count. Then you generate a random number between 1 and count (inclusive). If the random number equals 1, you replace your current pick with the current index. Otherwise, you keep your existing pick.

Why does this give equal probability? Consider the k-th occurrence of the target. The probability of picking it is 1/k (from the random roll). But it also needs to survive all subsequent rolls without being replaced. The probability it survives the (k+1)-th roll is k/(k+1), the (k+2)-th roll is (k+1)/(k+2), and so on until the last (n-th) roll where the survival probability is (n-1)/n. Multiply them all together:

1/k * k/(k+1) * (k+1)/(k+2) * ... * (n-1)/n = 1/n

Everything cancels in a telescoping product, leaving exactly 1/n for each valid index.

There is an alternative approach: precompute a hash map that maps each value to a list of its indices. Then pick(target) just selects a random element from the list. This works great too, but it uses O(n) extra space. Reservoir sampling uses O(1) extra space beyond storing the input array, making it the better choice when memory is tight or when the array is too large to duplicate index data.

The key insight is the telescoping product. Each index's probability of being chosen and surviving all future rolls simplifies to exactly 1/n, where n is the total count of the target. You do not need to know n in advance.

Step-by-step walkthrough

Let's trace through reservoir sampling on nums = [1, 2, 3, 3, 3] with target = 3. We scan left to right, tracking count (number of target occurrences seen so far) and pick (the currently selected index).

Step 1: Scan index 0 (value = 1)

1i=02i=13i=23i=33i=4scancount = 0pick = none

Value 1 does not match target 3. Skip it. count stays 0, no pick yet.

Step 2: Scan index 1 (value = 2)

1i=02i=13i=23i=33i=4scancount = 0pick = none

Value 2 does not match target 3. Skip it. count stays 0, still no pick.

Step 3: Scan index 2 (value = 3, first match!)

1i=02i=13i=23i=33i=4scancount = 1rand(1, 1) = 1pick = index 2

First occurrence of target 3. count becomes 1. rand(1, 1) = 1, so pick = index 2. The first match is always chosen.

Step 4: Scan index 3 (value = 3, second match)

1i=02i=13i=23i=33i=4scancount = 2rand(1, 2) = 2pick = index 2

Second occurrence. count becomes 2. rand(1, 2) = 2, which is not 1, so we keep pick = index 2. Probability of replacing was 1/2.

Step 5: Scan index 4 (value = 3, third match)

1i=02i=13i=23i=33i=4scancount = 3rand(1, 3) = 1pick = index 4

Third occurrence. count becomes 3. rand(1, 3) = 1, so we replace! pick changes to index 4. Probability of replacing was 1/3.

Notice how each target index had exactly a 1/3 chance of being the final pick. Index 2 was chosen first, then had to survive two replacement attempts. Index 3 could have replaced it with probability 1/2. Index 4 replaced with probability 1/3. The math works out so every valid index gets the same shot.

The code

Here is the Python solution for LeetCode 398, Random Pick Index:

import random

class Solution:
    def __init__(self, nums):
        self.nums = nums

    def pick(self, target):
        count = 0
        result = -1
        for i, num in enumerate(self.nums):
            if num == target:
                count += 1
                if random.randint(1, count) == 1:
                    result = i
        return result

The constructor simply stores the array. All the work happens in pick. We iterate through every element, and whenever we find a match, we increment count and roll the dice. random.randint(1, count) returns 1 with probability 1/count, which is exactly what reservoir sampling needs. If we roll a 1, we update our result to the current index.

A common mistake is using random.randint(0, count) instead of random.randint(1, count). The range matters: randint(0, count) gives count + 1 possible outcomes, skewing the probability. Always use a 1-based range matching the current count.

Complexity analysis

ApproachTimeSpace
Hash map precomputationO(n) init, O(1) pickO(n)
Reservoir samplingO(1) init, O(n) pickO(1) extra

The hash map approach front-loads the work: build the index lists once, then look up and randomly select in O(1). Reservoir sampling defers the work: initialization is instant, but each pick call scans the entire array. Choose based on your constraints. If pick is called many times with the same target, the hash map wins. If memory is the bottleneck or the array changes, reservoir sampling is more flexible.

The building blocks

Reservoir sampling

Reservoir sampling is a family of algorithms for randomly choosing k items from a stream of unknown (or very large) length. The variant used here is the simplest case: k = 1. You maintain a single candidate and replace it with decreasing probability as you encounter more items.

This same technique appears in several related problems:

  • Linked List Random Node (LeetCode 382): Pick a random node from a linked list without knowing its length. Same reservoir sampling, just on a linked list instead of an array.
  • Shuffle an Array (LeetCode 384): Fisher-Yates shuffle uses a similar "swap with random index" idea, but for permuting an entire array rather than selecting one element.
  • Random Point in Non-overlapping Rectangles (LeetCode 497): Uses weighted random selection, which is a cousin of reservoir sampling.

Once you internalize reservoir sampling for k = 1, extending to k items is natural. For each new element at position i, you include it with probability k/i and randomly evict one of the k current picks. This generalization powers many real-world streaming algorithms.

Edge cases

  • Single occurrence: If the target appears exactly once, the algorithm always returns that index. The random roll is randint(1, 1), which is always 1.
  • All elements are the target: Every index is valid. The algorithm degrades gracefully, giving each index an equal 1/n probability.
  • Target at the very end: The last element always has a 1/count chance of being picked, same as every other occurrence. Position in the array does not bias the result.
  • Large arrays with rare targets: Reservoir sampling still scans the full array each time. If the target is rare and you call pick frequently, the hash map approach may be more practical despite its space cost.
  • Multiple calls to pick: Each call is independent. The algorithm does not cache or remember previous results, so consecutive calls to pick(3) may return different indices.

From understanding to recall

Understanding reservoir sampling is one thing. Being able to reproduce it cleanly under interview pressure is another. The algorithm has a small number of moving parts (a counter, a random roll, a conditional replacement), but getting the probability boundaries right demands practice. Spaced repetition helps you move this pattern from "I've seen it before" to "I can write it in my sleep." Revisit this problem at increasing intervals, and you will find the logic becomes automatic.

Related posts

Reservoir sampling is one of those patterns that feels like magic the first time you see it, then feels obvious once the telescoping proof clicks. LeetCode 398 is the perfect entry point. Nail this problem, and you will be ready for any reservoir sampling variant that comes your way.