
Clone Graph: DFS with a Hash Map

9 min read
leetcode · problem · medium · graphs

You are given a reference to a node in a connected undirected graph. Return a deep copy (clone) of the entire graph. Each node has a value and a list of neighbors.

This is LeetCode 133: Clone Graph, and it teaches one of the most useful graph patterns: traversing a structure while building a parallel copy of it. The trick is a hash map that serves double duty as both a "visited" set and a lookup table for already-cloned nodes.

Figure: the original graph (blue) and its deep copy (green); the hash map (orange arrows) tracks which original node maps to which clone.

Why this problem matters

Deep copying a graph is harder than it looks. Unlike a linked list (where you just walk forward), a graph has cycles. If node 1 points to node 2 and node 2 points back to node 1, a naive recursive copy will loop forever.

The clone graph LeetCode problem forces you to solve this with a clean pattern: use a hash map that maps each original node to its clone. Before you recurse into any neighbor, check the map. If the neighbor is already there, reuse the existing clone instead of creating a new one. That single check prevents infinite loops and avoids duplicate nodes.

This pattern appears any time you need to deep copy a graph, duplicate a complex object structure, or traverse a graph while building something in parallel. It is also a great test of whether you truly understand how DFS and hash maps work together.

The key insight

The hash map {original_node: cloned_node} does two jobs at once:

  1. Visited check: If a node is in the map, we have already cloned it. Do not clone it again.
  2. Lookup: When wiring up neighbors, we need the clone of each neighbor. The map gives us O(1) access to it.

The algorithm:

  1. Start at the given node. Create a clone. Store the mapping original -> clone in the hash map.
  2. For each neighbor of the original node, check the hash map.
  3. If the neighbor is NOT in the map, recursively clone it (this adds it to the map).
  4. If the neighbor IS in the map, just grab the existing clone.
  5. Either way, append the neighbor's clone to the current clone's neighbor list.
  6. Return the clone of the starting node.

The DFS solution

class Node:
    def __init__(self, val=0, neighbors=None):
        self.val = val
        self.neighbors = neighbors if neighbors is not None else []

def clone_graph(node: 'Node | None') -> 'Node | None':
    if not node:
        return None

    clones: dict[int, Node] = {}  # original node's value -> cloned node

    def dfs(original: Node) -> Node:
        if original.val in clones:
            return clones[original.val]

        # Create the clone and register it BEFORE recursing
        clone = Node(original.val)
        clones[original.val] = clone

        for neighbor in original.neighbors:
            clone.neighbors.append(dfs(neighbor))

        return clone

    return dfs(node)

Let's break this down.

The clones dictionary maps each node's value to its clone. When dfs is called on a node, it first checks: "Have I already cloned this node?" If yes, return the existing clone immediately. This is the cycle breaker. Without this check, the recursion would bounce back and forth between connected nodes forever.

If the node has not been cloned yet, we create a new Node, register it in the map right away, and then recurse into each neighbor. Registering before recursing is critical. If we waited until after the neighbor loop to register, a cycle would hit the node again before it was in the map, causing infinite recursion.

For each neighbor, the recursive call either creates a new clone (if unvisited) or returns an existing one (if visited). Either way, we append the result to the current clone's neighbor list.

The order matters: create the clone and add it to the map BEFORE recursing into neighbors. If you recurse first and register later, cycles in the graph will cause infinite recursion because the node will not be in the map when a neighbor loops back to it.
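To make the failure mode concrete, here is a small sketch (with a hypothetical bad_clone helper that registers the clone only after the neighbor loop); on the simplest two-node cycle it blows the call stack:

```python
class Node:
    def __init__(self, val=0, neighbors=None):
        self.val = val
        self.neighbors = neighbors if neighbors is not None else []

def bad_clone(node):
    clones = {}

    def bad_dfs(original):
        if original.val in clones:
            return clones[original.val]
        clone = Node(original.val)
        # BUG: recursing into neighbors before registering the clone
        for neighbor in original.neighbors:
            clone.neighbors.append(bad_dfs(neighbor))
        clones[original.val] = clone  # too late: cycles never see this
        return clone

    return bad_dfs(node)

# Two nodes pointing at each other: the simplest cycle
a, b = Node(1), Node(2)
a.neighbors.append(b)
b.neighbors.append(a)

try:
    bad_clone(a)
    print("no error")
except RecursionError:
    print("RecursionError: the cycle was never broken")
```

Moving the `clones[original.val] = clone` line back above the neighbor loop turns this into the correct solution.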

Visual walkthrough

Let's trace the DFS through a 4-node graph: 1 connects to 2 and 4, 2 connects to 1 and 3, 3 connects to 2 and 4, 4 connects to 3 and 1. Watch how the hash map grows and prevents duplicate cloning.

Step 1: Visit node 1. Create clone of node 1. Add to hash map.

hash map: {1: clone(1)}

DFS starts at node 1. We create a new node and store the mapping.

Step 2: Recurse to neighbor 2. Create clone of node 2. Connect clone(1) to clone(2).

hash map: {1: clone(1), 2: clone(2)}

Node 2 is not in the hash map, so we clone it and recurse.

Step 3: Recurse to neighbor 3. Create clone of node 3. Connect clone(2) to clone(3).

hash map: {1: clone(1), 2: clone(2), 3: clone(3)}

Node 3 is not in the hash map. Clone it and keep going.

Step 4: Recurse to neighbor 4. Create clone of node 4. Connect clone(3) to clone(4).

hash map: {1: clone(1), 2: clone(2), 3: clone(3), 4: clone(4)}

Node 4 is not in the hash map. Clone it.

Step 5: Node 4's neighbor is node 1. Already in hash map! Connect clone(4) to clone(1). DFS backtracks.

hash map: {1: clone(1), 2: clone(2), 3: clone(3), 4: clone(4)}

The hash map prevents us from cloning node 1 again. We reuse the existing clone. All edges connected. Done!

The key moment is Step 5. When DFS reaches node 4's neighbor (node 1), node 1 is already in the hash map. Instead of cloning it again, we grab the existing clone and wire up the connection. That is what closes the cycle in the cloned graph.

The BFS solution

If you prefer iterative traversal, BFS works just as well. Same hash map, same logic, different traversal order.

from collections import deque

def clone_graph(node: 'Node | None') -> 'Node | None':
    if not node:
        return None

    clones: dict[int, Node] = {node.val: Node(node.val)}
    queue = deque([node])

    while queue:
        original = queue.popleft()

        for neighbor in original.neighbors:
            if neighbor.val not in clones:
                clones[neighbor.val] = Node(neighbor.val)
                queue.append(neighbor)
            clones[original.val].neighbors.append(clones[neighbor.val])

    return clones[node.val]

The BFS version creates the starting node's clone upfront and adds the original to the queue. For each dequeued node, it iterates through neighbors. If a neighbor has no clone yet, create one and enqueue the original neighbor for processing. Then connect the clones.

The hash map plays the same role here. It tracks which nodes have been cloned (so we do not enqueue them twice) and gives us O(1) access to any node's clone when wiring up neighbors.
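As a quick sanity check, this sketch rebuilds the four-node cycle from the walkthrough and verifies that the BFS clone is a genuinely separate structure (the Node class and clone_graph are repeated so the snippet runs standalone):

```python
from collections import deque

class Node:
    def __init__(self, val=0, neighbors=None):
        self.val = val
        self.neighbors = neighbors if neighbors is not None else []

def clone_graph(node):
    if not node:
        return None
    clones = {node.val: Node(node.val)}
    queue = deque([node])
    while queue:
        original = queue.popleft()
        for neighbor in original.neighbors:
            if neighbor.val not in clones:
                clones[neighbor.val] = Node(neighbor.val)
                queue.append(neighbor)
            clones[original.val].neighbors.append(clones[neighbor.val])
    return clones[node.val]

# Build the 4-node cycle from the walkthrough: 1-2, 2-3, 3-4, 4-1
n1, n2, n3, n4 = Node(1), Node(2), Node(3), Node(4)
n1.neighbors = [n2, n4]
n2.neighbors = [n1, n3]
n3.neighbors = [n2, n4]
n4.neighbors = [n3, n1]

c1 = clone_graph(n1)
assert c1 is not n1                                   # a brand-new object
assert sorted(n.val for n in c1.neighbors) == [2, 4]  # same adjacency
assert c1.neighbors[0] is not n2                      # neighbors are clones too
```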

DFS and BFS are interchangeable for this problem. DFS is slightly more concise because the recursion handles the stack for you. BFS avoids deep recursion, which matters if the graph has long chains (Python's default recursion limit is 1,000). Pick whichever you are more comfortable writing under pressure.

Complexity analysis

ApproachTimeSpace
DFS (recursive)O(N + E)O(N)
BFS (iterative)O(N + E)O(N)

N is the number of nodes, E is the number of edges.

Time: We visit every node exactly once and process every edge exactly once. The hash map lookup is O(1). Total: O(N + E).

Space: The hash map stores N entries (one per node). DFS also uses O(N) call stack space in the worst case (a straight-line graph). BFS uses O(N) for the queue. Total: O(N).

Edge cases

  • Null input: the graph is empty. Return None. Both solutions handle this with the if not node check.
  • Single node, no neighbors: one node with an empty neighbor list. Clone it and return. No recursion needed.
  • Single node, self-loop: a node whose neighbor list contains itself. The hash map catches this on the second visit, preventing infinite recursion.
  • Fully connected graph: every node connects to every other node. The hash map ensures each node is cloned exactly once regardless of how many edges point to it.
  • Two nodes pointing to each other: the simplest cycle. Node A's neighbor is B, and B's neighbor is A. DFS clones A, recurses to B, clones B, tries to recurse back to A, finds A in the map, returns the existing clone.
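These cases are cheap to verify directly; this sketch runs each one through the DFS solution (definitions repeated so it is self-contained):

```python
class Node:
    def __init__(self, val=0, neighbors=None):
        self.val = val
        self.neighbors = neighbors if neighbors is not None else []

def clone_graph(node):
    if not node:
        return None
    clones = {}
    def dfs(original):
        if original.val in clones:
            return clones[original.val]
        clone = Node(original.val)
        clones[original.val] = clone
        for neighbor in original.neighbors:
            clone.neighbors.append(dfs(neighbor))
        return clone
    return dfs(node)

# Null input: nothing to clone
assert clone_graph(None) is None

# Single node, no neighbors
lone = Node(7)
lone_clone = clone_graph(lone)
assert lone_clone is not lone and lone_clone.neighbors == []

# Self-loop: the clone's only neighbor is the clone itself
loop = Node(1)
loop.neighbors.append(loop)
loop_clone = clone_graph(loop)
assert loop_clone is not loop and loop_clone.neighbors == [loop_clone]

# Two-node cycle: following two edges from the clone lands back on it
a, b = Node(1), Node(2)
a.neighbors, b.neighbors = [b], [a]
ca = clone_graph(a)
assert ca is not a and ca.neighbors[0].neighbors[0] is ca
```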

The building blocks

This problem is built on two reusable pieces that CodeBricks drills independently:

1. DFS/BFS with a visited hash map

The pattern of traversing a graph while using a hash map as the visited set:

visited = {}

def dfs(node):
    if node in visited:
        return visited[node]

    visited[node] = some_value  # register BEFORE recursing
    for neighbor in node.neighbors:
        dfs(neighbor)
    return visited[node]

This is different from the simple visited set you use in Number of Islands or Course Schedule. Here the map stores a computed value (the clone), not just a boolean. The same structure appears in memoization, where you cache function results to avoid recomputation. The core idea is identical: "Have I seen this before? If yes, reuse the cached result."
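For comparison, here is the same "check the map, compute, register" shape in memoization form, sketched with a hypothetical fib helper and a plain dict cache:

```python
cache: dict[int, int] = {}

def fib(n: int) -> int:
    if n in cache:          # the "visited" check doubles as a lookup
        return cache[n]
    result = n if n < 2 else fib(n - 1) + fib(n - 2)
    cache[n] = result       # register the computed value for reuse
    return result

print(fib(10))  # 55
```

One difference worth noting: memoized recursion over plain values has no cycles, so registering after computing is safe here; a graph with cycles forces you to register before recursing.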

2. Building a parallel structure during traversal

The pattern of constructing a new data structure that mirrors an existing one as you traverse it:

def build_copy(original):
    copy = create_new(original)
    register(original, copy)
    for child in original.children:
        copy.children.append(build_copy(child))
    return copy

You see this pattern in deep copy operations, serialization/deserialization, tree cloning, and any problem where you need to "mirror" a structure. The key discipline is always registering the new object before recursing into children, so that back-references and cycles resolve correctly.

A common bug is creating the clone after processing neighbors instead of before. In a graph with cycles, this causes infinite recursion because the node is not in the hash map when a neighbor loops back to it. Always register the clone immediately after creating it, before doing anything else with it.

Common mistakes

1. Registering the clone too late. If you create the clone but do not add it to the hash map until after processing all neighbors, cycles will cause infinite recursion. The fix is simple: add to the map right after clone = Node(val), before the neighbor loop.

2. Using the node object as the key instead of the value. In LeetCode's version, each node has a unique integer value from 1 to 100. You can use node.val as the key. In a more general graph, you would use the node's identity (the object reference itself) as the key.

3. Forgetting to handle the null case. The problem states the graph can be empty (null node). If you skip the null check, you will get a null pointer error.

4. Confusing shallow copy with deep copy. A shallow copy would share the same neighbor references. A deep copy means every node in the clone is a brand new object. The hash map ensures this: every original node gets its own fresh clone.
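The distinction is easy to demonstrate; this sketch contrasts Python's copy.copy (shallow) with the hash-map deep clone on a two-node cycle:

```python
import copy

class Node:
    def __init__(self, val=0, neighbors=None):
        self.val = val
        self.neighbors = neighbors if neighbors is not None else []

def clone_graph(node):
    if not node:
        return None
    clones = {}
    def dfs(original):
        if original.val in clones:
            return clones[original.val]
        clone = Node(original.val)
        clones[original.val] = clone
        for neighbor in original.neighbors:
            clone.neighbors.append(dfs(neighbor))
        return clone
    return dfs(node)

a, b = Node(1), Node(2)
a.neighbors, b.neighbors = [b], [a]

# A shallow copy of the node still shares the ORIGINAL neighbor objects
shallow = copy.copy(a)
assert shallow.neighbors[0] is b        # same object as the original's neighbor

# The hash-map clone gives every reachable node a fresh object
deep = clone_graph(a)
assert deep.neighbors[0] is not b       # brand-new neighbor object
assert deep.neighbors[0].val == b.val   # with the same value
```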

When to use this pattern

Look for these signals:

  • The problem asks you to deep copy or clone a structure with cycles or shared references
  • You need to traverse a graph while building a parallel result
  • The structure has back-references or cycles that would cause infinite loops without tracking
  • You need a visited check that also stores a computed value (not just true/false)

Problems that use the same "hash map as visited + lookup" pattern:

  • Copy List with Random Pointer (LeetCode 138): same idea but with a linked list that has random pointers (another form of cycles)
  • Course Schedule (LeetCode 207): uses a hash map to track node states during DFS
  • Number of Islands (LeetCode 200): uses visited tracking during graph traversal (simpler version without the lookup component)

From understanding to recall

You have read through both DFS and BFS solutions. You see how the hash map prevents cycles and provides O(1) clone lookup. The logic makes sense. But can you write it from scratch in 5 minutes during an interview?

The details matter. Registering the clone before recursing. Using node.val as the key. Appending dfs(neighbor) to the clone's neighbor list inside the loop. These are small things, but under pressure, any of them can trip you up.

Spaced repetition makes these details automatic. You practice writing the hash map DFS traversal and the clone construction loop from memory at increasing intervals. After a few rounds, the pattern flows out without hesitation. You do not need to re-derive it. You just write it.

Related posts

  • Number of Islands - The essential grid DFS problem, teaching the flood-fill traversal that is the simpler cousin of graph cloning
  • Course Schedule - Directed graph DFS with state tracking, showing how hash maps serve as visited sets in graph traversal

CodeBricks breaks Clone Graph into its hash-map-visited-DFS and parallel-structure-building building blocks, then drills them independently with spaced repetition. You type each piece from scratch until the pattern is automatic. When a deep copy graph problem shows up in your interview, you do not think about it. You just write it.