Linked List Random Node: Reservoir Sampling in One Pass
Linked List Random Node is LeetCode problem 382. You are given a singly linked list and need to return a random node's value such that every node has an equal probability of being chosen. The twist: you do not know the length of the list ahead of time.
You could walk the list once to count nodes, then walk again to pick a random index. That works, but it requires two passes. Reservoir sampling does it in a single pass, and it generalizes to streams where you truly cannot know the size upfront. That is the technique interviewers want to see.
The problem
You are given the head of a singly linked list. Implement a class with:
__init__(self, head)- stores the head of the linked listgetRandom()- returns a random node's value where each node must be equally likely to be chosen
You must solve this without converting the list to an array first (or assume you cannot for the follow-up about very large or streaming data).
Why not just count and index?
The simplest approach: walk the list to find its length n, generate a random number between 0 and n - 1, then walk again to that index. Two passes, O(n) time, O(1) space.
def getRandom(self):
n = 0
node = self.head
while node:
n += 1
node = node.next
idx = random.randint(0, n - 1)
node = self.head
for _ in range(idx):
node = node.next
return node.val
This is fine for a fixed list. But consider the follow-up: what if the list is extremely large, or the data is streaming and you only get one pass? That is where reservoir sampling shines.
Reservoir sampling: Algorithm R
Reservoir sampling is a family of algorithms for choosing a random sample from a stream of unknown length. The simplest version, Algorithm R, keeps a "reservoir" of size 1 (one chosen element). As you scan through items:
- For the i-th item (1-indexed), replace the current choice with probability
1/i
After scanning all n items, every item has exactly 1/n probability of being the final choice.
The intuition: the first item always gets picked. The second item replaces it with probability 1/2. The third replaces whatever is there with probability 1/3. And so on. The probabilities compound in exactly the right way so that every item ends up with equal weight.
Why does every node get 1/n?
Let's prove it for a list of length 5. For node k to be the final result, two things must happen:
- Node k must be picked when we visit it (probability
1/k) - No later node must replace it (nodes
k+1through 5 must all fail to replace)
The probability of not being replaced at step j is (j - 1)/j. So the combined probability for node k is:
(1/k) * (k / (k+1)) * ((k+1) / (k+2)) * ... * (4/5)
This is a telescoping product. Everything cancels, leaving 1/5. The same argument works for any list length n - every node ends up with probability 1/n.
The telescoping cancellation is the heart of reservoir sampling. If an interviewer asks you to prove correctness, this is the argument. Write out the product for one specific node and watch the terms cancel.
Visual walkthrough
Let's trace through the list 1 -> 2 -> 3 -> 4 -> 5 with one specific run. At each node, we roll a random number from 1 to i. If we roll 1, we replace our current result. Watch how the result evolves.
Step 1: Visit node 1 (i = 1)
rand(1, 1) = 1. Only one option, so result = 1. This always happens for the first node.
Step 2: Visit node 2 (i = 2)
rand(1, 2) = 1. We rolled 1, so replace result with 2. Probability of replacing: 1/2.
Step 3: Visit node 3 (i = 3)
rand(1, 3) = 2. Not 1, so we keep result = 2. Probability of replacing was 1/3.
Step 4: Visit node 4 (i = 4)
rand(1, 4) = 1. We rolled 1, so replace result with 4. Probability of replacing: 1/4.
Step 5: Visit node 5 (i = 5)
rand(1, 5) = 3. Not 1, so we keep result = 4. Final answer: 4.
In this particular run, the final result was 4. A different set of random rolls would give a different result, but over many trials each node appears exactly 1/5 of the time.
The code
import random
class Solution:
def __init__(self, head):
self.head = head
def getRandom(self):
result = 0
node = self.head
i = 1
while node:
if random.randint(1, i) == 1:
result = node.val
node = node.next
i += 1
return result
Four lines of real logic inside getRandom. Let's break them down:
- Initialize: Start at the head with
i = 1andresult = 0. - Loop: For each node, generate a random integer from 1 to i.
- Replace: If the roll equals 1 (probability
1/i), update result to the current node's value. - Advance: Move to the next node and increment i.
When the loop ends, result holds a uniformly random node value.
Why check random.randint(1, i) == 1 rather than == i or some other value? It does not matter which specific value you check for, as long as the probability is 1/i. Checking for 1 is the convention, but checking for i works identically.
Complexity
| Time | Space | |
|---|---|---|
__init__ | O(1) | O(1) |
getRandom | O(n) | O(1) |
| Per call | O(n) | O(1) |
Each call to getRandom walks the entire list once. There is no way around this without storing extra data, because you cannot index into a linked list. The space is O(1) since we only keep a single variable for the result.
Building blocks
This problem is built from one core technique:
Reservoir sampling (Algorithm R). Maintain a reservoir of size k (here k = 1). For each new element at position i, include it in the reservoir with probability k/i. For k = 1, this simplifies to replacing the current choice with probability 1/i.
This technique appears in more contexts than you might expect:
- Random node from a stream. If data arrives one item at a time and you cannot store it all, reservoir sampling is the standard approach. This is the canonical use case.
- Random sampling from a database. When you need k random rows from a table of unknown size without counting first, Algorithm R (generalized to k > 1) gives you a uniform sample in one pass.
- Shuffle an Array (LeetCode 384) also relies on randomization guarantees. Fisher-Yates shuffle and reservoir sampling share the same underlying probability argument about each element having equal likelihood of landing in any position.
- Weighted random selection. Reservoir sampling extends to weighted variants where each item has a different probability of being chosen, useful in load balancing and recommendation systems.
Once you understand why the telescoping product yields uniform probability, the rest is just applying the pattern to different container sizes.
Edge cases
- Single node.
istarts at 1,randint(1, 1)always returns 1, so the single node is always chosen. Correct. - Two nodes. The first node is picked, then the second replaces it with probability 1/2. Each has 50% chance. Correct.
- Very long list. No issue. The algorithm is O(n) time, O(1) space regardless of length. The random number generator handles large i values fine.
- Multiple calls to getRandom. Each call is independent. The list is traversed fresh each time, and a new sequence of random numbers is generated.
From understanding to recall
Reservoir sampling is one of those algorithms that is easy to understand but tricky to reproduce from memory. The core loop is only a few lines, but getting the indexing right (1-indexed counter, checking for 1, incrementing after the replacement) requires practice.
The mental anchor: "for the i-th element, roll a die with i faces. If you roll a 1, swap it in." That single sentence encodes the entire algorithm. Practice typing it from that prompt until the code flows automatically.
Related posts
- Reverse Linked List - fundamental linked list traversal
- Linked List Cycle - another single-pass linked list algorithm
- Shuffle an Array - another randomization problem using Fisher-Yates