Word Ladder II: All Shortest Transformation Paths

May 13, 20266 min read

leetcodeproblemhardgraphbacktracking

Given beginWord, endWord, and a wordList, find all the shortest transformation sequences from beginWord to endWord. Each transformation changes exactly one letter, and every intermediate word must exist in wordList. Return all shortest paths, or an empty list if no path exists.

This is LeetCode 126: Word Ladder II, the harder follow-up to Word Ladder (LeetCode 127). Instead of returning just the length of the shortest path, you need to return every shortest path. That changes the algorithm significantly.

Example: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]

Output: [["hit","hot","dot","dog","cog"], ["hit","hot","lot","log","cog"]]

Both paths have length 5, and both are valid shortest transformation sequences.

Word graph with BFS distances. Two shortest paths from "hit" to "cog", each of length 5. The d= labels show BFS distance from "hit".

Why this is harder than Word Ladder

In Word Ladder, a single BFS finds the shortest path length. You mark words as visited on enqueue and never revisit them. That approach finds one shortest path (implicitly), but it throws away all other shortest paths along the way.

For Word Ladder II, you need to keep every shortest path. If "dot" and "lot" are both reachable at distance 2, you cannot discard either. If both lead to "cog" through different routes of the same total length, both paths must appear in the output.

A pure BFS that tracks full paths in the queue would work but blows up in memory. Each queue entry would store an entire path, and the number of paths can be exponential. The better approach splits the work into two phases.

The approach: BFS distances, then DFS backtracking

Phase 1: BFS forward. Run BFS from beginWord. For every word reachable from beginWord, record its shortest distance. This builds a distance map: dist[word] = shortest number of steps from beginWord.

Phase 2: DFS backward. Starting from endWord, run DFS. At each step, only move to neighbors whose distance is exactly one less than the current word's distance. This ensures you only follow edges that are part of some shortest path. When you reach beginWord, you have found a complete shortest path. Collect all such paths.

The key insight is that a neighbor next_word is on a shortest path from beginWord through word only if dist[next_word] == dist[word] - 1. This condition prunes the DFS to only explore shortest-path edges, preventing exponential blowup in most cases.

Practice BFS + DFS backtracking with spaced repetition

The Python solution

from collections import deque, defaultdict

def find_ladders(begin_word: str, end_word: str, word_list: list[str]) -> list[list[str]]:
    word_set = set(word_list)
    if end_word not in word_set:
        return []

    dist = {begin_word: 0}
    queue = deque([begin_word])

    while queue:
        word = queue.popleft()
        for i in range(len(word)):
            for c in "abcdefghijklmnopqrstuvwxyz":
                next_word = word[:i] + c + word[i + 1:]
                if next_word in word_set and next_word not in dist:
                    dist[next_word] = dist[word] + 1
                    queue.append(next_word)

    if end_word not in dist:
        return []

    result = []

    def backtrack(word, path):
        if word == begin_word:
            result.append(list(path))
            return
        for i in range(len(word)):
            for c in "abcdefghijklmnopqrstuvwxyz":
                prev_word = word[:i] + c + word[i + 1:]
                if prev_word in dist and dist[prev_word] == dist[word] - 1:
                    path.appendleft(prev_word)
                    backtrack(prev_word, path)
                    path.popleft()

    backtrack(end_word, deque([end_word]))
    return result

Visual walkthrough

Let's trace both phases through the example: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"].

Phase 1, Step 1: BFS from beginWord to build distance map

queue: [hit]

BFS layer by layer. Record the shortest distance from "hit" to every reachable word.

Phase 1, Step 2: Process "hit" at distance 0

hit:0hot:1

queue: [hot]

Generate neighbors of "hit". Only "hot" is in the word set. Record dist["hot"] = 1.

Phase 1, Step 3: Process "hot" at distance 1

hit:0hot:1dot:2lot:2

queue: [dot, lot]

"hot" connects to "dot" and "lot". Both are new. Record distance 2 for each.

Phase 1, Step 4: Process "dot" and "lot" at distance 2

hit:0hot:1dot:2lot:2dog:3log:3

queue: [dog, log]

"dot" connects to "dog". "lot" connects to "log". Both are new at distance 3. BFS continues.

Phase 1, Step 5: Process "dog" and "log" at distance 3. Found endWord.

hit:0hot:1dot:2lot:2dog:3log:3cog:4

BFS complete. Shortest path length: 5 words.

Both "dog" and "log" connect to "cog" at distance 4. BFS distance map is complete.

Phase 2: DFS backtrack from endWord to beginWord

cog (d=4) → neighbors with d=3: dog, log

dog (d=3) → neighbors with d=2: dot

dot (d=2) → neighbors with d=1: hot

hot (d=1) → neighbors with d=0: hit ✓

log (d=3) → neighbors with d=2: lot

lot (d=2) → neighbors with d=1: hot

hot (d=1) → neighbors with d=0: hit ✓

Recovered paths (reversed):

hit→hot→dot→dog→cog

hit→hot→lot→log→cog

Start at "cog" (d=4). At each step, only move to neighbors whose distance is exactly one less. This guarantees every path found is a shortest path.

Phase 1 builds the distance map via BFS. Phase 2 reconstructs all shortest paths by walking backward from "cog", only following edges where the distance decreases by exactly one. This recovers both ["hit","hot","dot","dog","cog"] and ["hit","hot","lot","log","cog"].

Complexity analysis

Metric	Value
Time	O(M^2 * N + P * L)
Space	O(M * N + P * L)

M is the word length. N is the number of words in the dictionary. P is the number of shortest paths. L is the length of each shortest path.

Time: The BFS phase visits each word once and generates neighbors in O(26 * M) per word, with O(M) cost per string slice. That gives O(M^2 * N) for BFS. The DFS phase explores only shortest-path edges, and in the worst case it outputs P paths of length L each. Total: O(M^2 * N + P * L).

Space: The distance map stores up to N entries of length M, costing O(M * N). The output stores P paths of L words each, costing O(P * L). The DFS recursion stack is at most L deep. Total: O(M * N + P * L).

Drill BFS distance maps from memory

The building blocks

This problem combines two reusable patterns that CodeBricks drills independently.

BFS for shortest distances

The first building block is standard BFS on an implicit graph, but instead of stopping at the target, you run BFS to completion and store the distance to every reachable word:

dist = {start: 0}
queue = deque([start])

while queue:
    node = queue.popleft()
    for neighbor in get_neighbors(node):
        if neighbor not in dist:
            dist[neighbor] = dist[node] + 1
            queue.append(neighbor)

This is the same BFS template from Word Ladder, but you record distances in a dictionary instead of returning early. The distance map becomes the guide for Phase 2.

DFS backtracking for path reconstruction

The second building block is DFS with backtracking, constrained by the distance map. Starting from the target, you only recurse into neighbors whose distance is one less:

def backtrack(node, path):
    if node == start:
        result.append(list(path))
        return
    for neighbor in get_neighbors(node):
        if dist.get(neighbor) == dist[node] - 1:
            path.appendleft(neighbor)
            backtrack(neighbor, path)
            path.popleft()

The dist[neighbor] == dist[node] - 1 check is what makes this work. It guarantees that every edge you traverse is part of a shortest path. Without this constraint, DFS would explore non-shortest-path edges and produce incorrect results.

This pattern of "BFS to build a level structure, then DFS to enumerate paths within that structure" is reusable. It appears in any problem that asks you to find all shortest paths in an unweighted graph.

Edge cases

No path exists: If endWord is not in the word list, or BFS never reaches endWord, return an empty list. The end_word not in dist check after BFS handles this.
Single transformation: If beginWord and endWord differ by one letter and endWord is in the word list, the only path is [beginWord, endWord]. BFS sets dist[endWord] = 1, and backtracking immediately finds the two-word path.
Multiple paths of the same length: This is the core challenge. The BFS + DFS approach naturally handles it because BFS records all words at their shortest distance, and DFS explores all valid predecessors at each step.
beginWord in wordList: No issue. BFS marks it at distance 0, so it will not be re-added at a later distance.
Large word lists with many shortest paths: The number of shortest paths can be exponential in theory. The algorithm's time is proportional to the output size, so it is as efficient as possible for this problem.

From understanding to recall

Reading through the BFS + DFS approach makes sense. The distance map guides the backtracking, the dist[neighbor] == dist[word] - 1 check prunes non-shortest edges, and the recursion naturally collects all valid paths. But writing this from scratch in an interview is a different challenge.

You need to remember two distinct phases. In Phase 1, you run BFS to completion (not stopping early like in Word Ladder). In Phase 2, you backtrack from endWord, not beginWord, and build paths in reverse. The distance check dist[prev] == dist[current] - 1 is the critical detail that is easy to forget under pressure.

CodeBricks breaks this into its two building blocks and drills each one separately. You practice BFS distance maps until the "record dist and keep going" pattern is automatic. You practice DFS backtracking with constraints until the "check the distance, recurse, undo" rhythm is in your fingers. When Word Ladder II or a similar "find all shortest paths" problem comes up, you assemble the two blocks and write it cleanly.

Word Ladder - The simpler version (LeetCode 127) that finds just the shortest path length using BFS
Course Schedule - Another graph BFS/DFS problem where building the graph from an edge list and traversing it are separate skills

Try CodeBricks free