Skip to content
← All posts

Find Duplicate Subtrees: Serialization with Hash Map

6 min read
leetcodeproblemmediumtreeshash-map

Given the root of a binary tree, return all duplicate subtrees. For each kind of duplicate subtree, you only need to return the root node of any one of them. Two trees are duplicates if they have the same structure with the same node values.

This is LeetCode 652: Find Duplicate Subtrees. It combines two skills you have probably already practiced separately: tree traversal and hash map counting. The challenge is figuring out how to represent a subtree in a way that lets you compare it against every other subtree efficiently.

Example: given root = [1, 2, 3, 4, null, 2, 4, null, null, 4], the duplicate subtrees are the leaf node 4 (appears three times) and the subtree 2 -> 4 (appears twice).

12342441,2,4,#,#,#,3,2,4,#,#,#,4,#,#full tree serial2,4,#,#,#duplicate (count: 2)4,#,#duplicate (count: 3)
Tree [1,2,3,4,null,2,4,null,null,4]. The subtree "4" appears three times and subtree "2->4" appears twice. Both are duplicates.

Why this problem matters

At first glance, comparing every subtree against every other subtree sounds expensive. A brute force approach that checks structural equality for each pair of nodes would be O(n^2) in the number of comparisons, and each comparison could itself be O(n). That is way too slow.

The insight is that you do not need to compare trees directly at all. Instead, you can serialize each subtree into a string during a single post-order traversal. If two subtrees produce the same string, they are duplicates. A hash map tracks how many times each serialization has appeared. When the count reaches exactly 2, you record that subtree as a duplicate.

This reduces the problem to: traverse once, serialize each subtree, count serializations. One pass. One hash map.

The approach

Here is the plan:

  1. Run a post-order traversal (left, right, root). This guarantees that by the time you process a node, both its children have already been serialized.
  2. At each node, build a serialization string from the node value and the serializations of its left and right children.
  3. Store each serialization in a hash map with a count. When the count hits 2, add that node to the result list.
  4. Return the result.

The key choice is your serialization format. You need to distinguish between different tree shapes. Including explicit null markers (like #) ensures that [1, null, 2] and [1, 2, null] produce different strings.

Use a delimiter between values in your serialization. A format like "node,left_serial,right_serial" with "#" for null children works well. Without delimiters, you could get collisions: node values 1, 23 and 12, 3 might produce the same concatenated string.

The code

def find_duplicate_subtrees(root):
    count = {}
    result = []

    def serialize(node):
        if node is None:
            return "#"
        left = serialize(node.left)
        right = serialize(node.right)
        key = f"{node.val},{left},{right}"
        count[key] = count.get(key, 0) + 1
        if count[key] == 2:
            result.append(node)
        return key

    serialize(root)
    return result

Line by line walkthrough

count = {} creates the hash map that tracks how many times each serialization has been seen.

result = [] will hold one root node from each group of duplicate subtrees.

def serialize(node) is a recursive helper that returns the serialization string for the subtree rooted at node.

if node is None: return "#" is the base case. Null nodes serialize to the marker "#". This is what encodes tree structure into the string.

left = serialize(node.left) recursively serializes the entire left subtree first. Because this is post-order, the left child is fully processed before we use its result.

right = serialize(node.right) does the same for the right subtree.

key = f"{node.val},{left},{right}" builds the serialization for the current subtree. It combines the node value with the left and right serializations, separated by commas.

count[key] = count.get(key, 0) + 1 increments the count for this serialization in the hash map.

if count[key] == 2: result.append(node) adds the node to the result the first time a duplicate is detected. We check for exactly 2 (not >= 2) so that we only add each duplicate subtree once, not on every subsequent occurrence.

return key passes the serialization back up to the parent node so it can build its own serialization string.

Visual walkthrough

Let's trace through the tree [1, 2, 3, 4, null, 2, 4, null, null, 4] step by step. Watch how each subtree gets serialized during post-order traversal and how the hash map catches duplicates.

Step 1: Post-order hits node 4 (left child of 2). It is a leaf.

1234visiting244hash map"4,#,#"1

Serialize leaf 4 as "4,#,#". Store in map with count 1. First time seeing this serialization.

Step 2: Back at node 2 (left child of root). Serialize as "2,4,#,#,#".

12visiting34244hash map"4,#,#"1"2,4,#,#,#"1

Node 2 has left child 4 and no right child. Its serialization is "2,4,#,#,#". Count is 1 in the map.

Step 3: Post-order visits node 4 (left child of inner 2). Serialize: "4,#,#".

1234244duplicate!hash map"4,#,#"2"2,4,#,#,#"1result: [4]

This serialization already exists in the map. Count goes to 2, so we add this subtree to results!

Step 4: Back at inner node 2 (left child of 3). Serialize: "2,4,#,#,#".

12342duplicate!44hash map"4,#,#"2"2,4,#,#,#"2result: [4, 2]

This matches the earlier subtree rooted at the other 2. Count hits 2, so we add it to results!

Step 5: Post-order visits node 4 (right child of 3). Serialize: "4,#,#".

123424count: 34hash map"4,#,#"3"2,4,#,#,#"2count > 2 = no new entry

Count for "4,#,#" goes to 3 but we only add on count 2, so no new entry. We already captured this duplicate.

Step 6: Traversal complete. Return the collected duplicates.

1234244final hash map"4,#,#"3"2,4,#,#,#"2"3,2,4,#,#,#,4,#,#"1"1,2,4,#,#,#,..."1result: [4, 2]

Post-order finished. Two unique duplicate subtrees found: the leaf 4 and the subtree 2->4.

The pattern is consistent: serialize bottom-up, count in the map, collect on the second occurrence. Post-order guarantees children are serialized before their parent, so each node's key is built from already-finalized child keys.

Complexity analysis

ApproachTimeSpace
Brute force (compare every pair)O(n^2) per pair, O(n) per comparisonO(n)
Serialization + hash mapO(n^2) worst case (string concatenation)O(n^2) for stored strings
Serialization with tuple keysO(n)O(n)

The string-based approach has O(n^2) worst-case time because building the serialization strings involves concatenation that can be O(n) per node in a skewed tree. For most practical inputs this is fine, but if you want true O(n) performance, you can assign integer IDs to each unique serialization using a tuple-to-ID mapping instead of building long strings.

For interviews, the string serialization approach shown above is the standard answer. It is clean, correct, and easy to explain under time pressure.

The building blocks

This problem uses two key building blocks:

Post-order subtree serialization. Post-order traversal (left, right, root) ensures that by the time you process a node, both children have already returned their serialized forms. You combine them into a single string that uniquely represents the subtree. This same technique appears in tree hashing, canonical form problems, and any situation where you need to compare subtree structures.

def serialize(node):
    if node is None:
        return "#"
    left = serialize(node.left)
    right = serialize(node.right)
    return f"{node.val},{left},{right}"

Hash map frequency counting. You maintain a dictionary mapping each serialization to the number of times it has appeared. When the count reaches a threshold (here, 2), you take action. This is the same counting pattern used in problems like Group Anagrams and Contains Duplicate, just applied to tree serializations instead of strings or numbers.

count[key] = count.get(key, 0) + 1
if count[key] == 2:
    result.append(item)

Edge cases to watch for

  • Empty tree (root is None): the serialize function returns "#" immediately. The result list stays empty. No duplicates to find.
  • All nodes have the same value: every leaf produces the same serialization, and larger subtrees may also collide. The algorithm handles this correctly because it only adds to the result when count equals exactly 2.
  • Single node (no children): one node has no duplicates. The serialization is unique and count stays at 1.
  • Deeply skewed tree (linked list shape): serialization strings grow linearly with depth. The algorithm still works correctly, but be aware of the O(n^2) total string length in this case.

From understanding to recall

You now see how post-order serialization turns a structural comparison problem into a string counting problem. The two building blocks (serialize bottom-up, count with a hash map) are individually simple. But combining them under pressure requires practice.

Could you write the serialize helper from scratch without looking at it? The recursive shape (base case returns "#", recurse left, recurse right, combine into key, update map, return key) is the kind of pattern that spaced repetition locks into memory. After a few reps at increasing intervals, you will not need to think about the structure. You will just write it.

Related posts