Skip to content
← All posts

Find Elements in a Contaminated Binary Tree: DFS Recovery with Hash Set

6 min read
leetcodeproblemmediumtreeshash-map

You are given a binary tree where every node's value has been set to -1. The tree structure itself is still intact, but all the data is gone. Your job is to recover the values using a simple rule: the root is 0, every left child has value 2 * parent + 1, and every right child has value 2 * parent + 2. Once the tree is recovered, you need to support find(target) queries that return whether a given value exists in the tree.

This is LeetCode 1261: Find Elements in a Contaminated Binary Tree, and it combines two ideas you have probably seen separately: DFS tree traversal and hash set lookup. The recovery phase walks the entire tree once, and the query phase answers each find() call in constant time.

The reason this problem is worth studying is that it teaches you to think about tree problems in two phases: a preprocessing step (recover the tree) and a query step (answer lookups). That two-phase pattern shows up constantly in interview problems involving trees, graphs, and custom data structures.

contaminated (all -1)-1-1-1-1-1-1-1recovered0root12*0+122*0+232*1+142*1+252*2+162*2+2
Every node starts as -1. Root becomes 0, then each child is computed from its parent: left = 2*x+1, right = 2*x+2.

Approach

The approach has two parts. First, during initialization, you run a DFS starting from the root. You set the root's value to 0, then for every node you visit, you compute the left child's value as 2 * x + 1 and the right child's value as 2 * x + 2. As you visit each node, you add its recovered value to a hash set.

Second, when find(target) is called, you simply check whether target is in the hash set. That gives you O(1) lookup per query.

The DFS recovery is the same preorder traversal pattern you see in dozens of tree problems: process the current node, then recurse into left and right children. The only difference here is that you are computing each node's value from its parent rather than reading it from the node itself.

The solution

class FindElements:
    def __init__(self, root):
        self.values = set()
        root.val = 0
        self._recover(root)

    def _recover(self, node):
        if node is None:
            return
        self.values.add(node.val)
        if node.left:
            node.left.val = 2 * node.val + 1
            self._recover(node.left)
        if node.right:
            node.right.val = 2 * node.val + 2
            self._recover(node.right)

    def find(self, target):
        return target in self.values

The constructor sets the root value to 0 and kicks off the DFS. The _recover method does the actual work: it adds the current node's value to the set, computes the left and right child values using the formula, and recurses. The find method is a one-liner that checks set membership.

Notice that we compute and assign the child's value before recursing into it. This is important because when we recurse into node.left, we need node.left.val to already be set so the next level of recursion can use it to compute its own children's values.

The formulas 2 * x + 1 and 2 * x + 2 are the same ones used in array-based binary heaps. If a parent is at index x in a zero-indexed array, its left child is at 2x + 1 and its right child is at 2x + 2. This is not a coincidence. The recovery rule assigns each node the same index it would have in a level-order array representation of the tree.

Visual walkthrough

Step 1: All values are contaminated (-1). The tree structure is intact but every value is lost.

-1-1-1-1-1-1-1

The original values are gone. We need to reconstruct them using the parent-child formula.

Step 2: Set the root value to 0. This is our starting point for recovery.

0root = 0-1-1-1-1-1-1

The problem guarantees root.val = 0 after recovery. This anchors the entire reconstruction.

Step 3: DFS from root. Left child = 2*0+1 = 1. Right child = 2*0+2 = 2.

012*0+122*0+2-1-1-1-1

Each child value is computed from its parent. Left uses 2*x+1, right uses 2*x+2.

Step 4: Continue DFS. Node 1's children: 2*1+1 = 3 and 2*1+2 = 4. Node 2's children: 2*2+1 = 5 and 2*2+2 = 6.

01232*1+142*1+252*2+162*2+2

The formula works recursively. Every node computes its children's values and recurses deeper.

Step 5: All values recovered. Store {0, 1, 2, 3, 4, 5, 6} in a hash set for O(1) find queries.

0123find(3)?456set: {0, 1, 2, 3, 4, 5, 6}

find(3) checks the set and returns true. find(7) checks the set and returns false. Each query is O(1).

Each step recovers one level of the tree. The DFS visits every node exactly once, computing its value from the parent and adding it to the hash set. Once the traversal is complete, any find() query becomes a simple set lookup.

Complexity analysis

ApproachTimeSpace
DFS recovery + hash setO(n) init, O(1) findO(n)

Time: The constructor runs DFS over all n nodes, doing O(1) work at each node (set the value, add to set, recurse). So initialization is O(n). Each find() call is O(1) because hash set lookup is constant time on average.

Space: The hash set stores one entry per node, so it uses O(n) space. The DFS call stack adds O(h) space where h is the tree height. For a balanced tree h = log n, and for a skewed tree h = n. The set dominates either way, so total space is O(n).

The building blocks

1. Binary tree node indexing formula

The mapping from parent value x to child values 2x + 1 (left) and 2x + 2 (right) is the foundation of this problem. It gives every node a unique non-negative integer based on its position in the tree.

left_child_val = 2 * parent_val + 1
right_child_val = 2 * parent_val + 2

This formula guarantees uniqueness: no two nodes in a binary tree can share the same positional index. That is why a set works perfectly for the find() queries. There are no collisions to worry about.

2. DFS tree traversal for recovery

The recovery uses a standard preorder DFS pattern. You process the current node (assign its value and add it to the set), then recurse into the left and right subtrees. The key detail is that you must assign the child's value before recursing into it.

def recover(node):
    if node is None:
        return
    values.add(node.val)
    if node.left:
        node.left.val = 2 * node.val + 1
        recover(node.left)
    if node.right:
        node.right.val = 2 * node.val + 2
        recover(node.right)

This is the same DFS skeleton you see in Invert Binary Tree, Maximum Depth, and many other tree problems. The only thing that changes is what you do at each node.

Edge cases

  • Single node tree (root only): Root gets value 0. The set contains just {0}. find(0) returns true, everything else returns false.
  • Skewed tree (all left children or all right children): The DFS still works. A left-only chain produces values 0, 1, 3, 7, 15... and a right-only chain produces 0, 2, 6, 14, 30... The set correctly stores whatever nodes exist.
  • find(target) where target is negative: The formula only produces non-negative values (root is 0, and 2x + 1 and 2x + 2 are always positive for non-negative x). So any negative target returns false.
  • Large target values: If the tree has n nodes, the maximum possible value depends on the tree shape, not n directly. A right-skewed tree of depth d has a leaf with value 2^(d+1) - 2. The set handles this fine since it stores the actual values.
  • Multiple find() calls: Each call is independent and O(1). The set does not change after initialization.

From understanding to recall

You just traced through a DFS recovery that assigns positional indices to every node in a contaminated tree, then saw how a hash set turns find() into a constant-time lookup. The two-phase design (preprocess with DFS, query with a set) is a pattern that appears in many tree and graph problems.

Understanding the solution now is one thing. Being able to write it from scratch in an interview is another. Spaced repetition bridges that gap. You practice reconstructing the DFS recovery and the set-based lookup at increasing intervals until the pattern becomes automatic. The building blocks here (preorder DFS and hash set storage) are the same ones you will use in dozens of other problems.

Related posts

CodeBricks breaks problems like this into the underlying building blocks (DFS traversal, hash set lookup, node indexing) and drills them individually with spaced repetition. Instead of memorizing full solutions, you internalize the patterns so you can assemble them on the fly.