Skip to content
← All posts

Word Ladder: Shortest Transformation with BFS

6 min read
leetcodeproblemhardgraph

Given two words beginWord and endWord, and a dictionary wordList, find the length of the shortest transformation sequence from beginWord to endWord. Each step changes exactly one letter, and every intermediate word must be in wordList. If no such sequence exists, return 0.

This is LeetCode 127: Word Ladder, a classic BFS shortest path problem disguised as a string puzzle. The trick is recognizing that words form a graph.

Example: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]. The answer is 5: hit -> hot -> dot -> dog -> cog.

hithotdotlotdoglogcogshortest path (length 5)
Words as graph nodes. Edges connect words that differ by one letter. The green path is the shortest route from "hit" to "cog".

Why this is a graph problem

At first glance, this looks like a string manipulation exercise. But think about it differently. Every word is a node. Two nodes share an edge if they differ by exactly one letter. Now "find the shortest transformation sequence" is just "find the shortest path in an unweighted graph," and BFS solves that directly.

This reframing is the key insight. Once you see words as graph nodes and single-letter changes as edges, the problem collapses into standard BFS shortest path on an implicit graph. You never need to build the full graph upfront. You generate neighbors on the fly.

Generating neighbors

For a word like "hot", you need to find all words in the dictionary that differ by exactly one letter. The brute force approach would compare "hot" to every word in the list, but that is O(N * M) per word where N is the dictionary size and M is word length.

A faster approach: for each position in the word, try all 26 letters and check if the result exists in a set.

For "hot" (3 positions, 26 letters each):

  • Position 0: aot, bot, cot, dot (in set!), eot, ... lot (in set!), ...
  • Position 1: hat, hbt, ... hit (but already visited), ...
  • Position 2: hoa, hob, ... hot (same word, skip), ...

Using a set for the word list makes each lookup O(1). Generating neighbors for one word costs O(26 * M) = O(M), where M is the word length.

The BFS solution

BFS explores nodes level by level. Each level corresponds to one transformation step. When BFS first reaches endWord, the current level count is the answer. This works because BFS always finds the shortest path in an unweighted graph.

from collections import deque

def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
    word_set = set(word_list)
    if end_word not in word_set:
        return 0

    queue = deque([(begin_word, 1)])
    visited = {begin_word}

    while queue:
        word, level = queue.popleft()

        for i in range(len(word)):
            for c in "abcdefghijklmnopqrstuvwxyz":
                next_word = word[:i] + c + word[i + 1:]
                if next_word == end_word:
                    return level + 1
                if next_word in word_set and next_word not in visited:
                    visited.add(next_word)
                    queue.append((next_word, level + 1))

    return 0

Let's break this down.

First, convert wordList to a set for O(1) lookups. If endWord is not in the set, return 0 immediately since no valid path exists.

Initialize BFS with beginWord at level 1. The level count includes the starting word (the problem asks for the number of words in the sequence, not the number of steps).

For each word in the queue, generate all possible one-letter transformations. If a transformation equals endWord, return level + 1. If the transformation is in the word set and has not been visited, add it to the queue and mark it visited.

Mark words as visited when you add them to the queue, not when you pop them. If you wait until popping, multiple paths might enqueue the same word at the same level, wasting time and potentially causing incorrect results.

Visual walkthrough

Let's trace BFS through the example: beginWord = "hit", endWord = "cog".

Step 1: Initialize BFS with beginWord

level:1
queue:
hit
visited:
hit

Start BFS from "hit". Level count begins at 1.

Step 2: Process level 1. Explore "hit", find neighbor "hot".

level:2
queue:
hot
visited:
hithot

"hit" differs by one letter from "hot" (h_t). Add "hot" to the queue. Level 2 starts.

Step 3: Process level 2. Explore "hot", find neighbors "dot" and "lot".

level:3
queue:
dotlot
visited:
hithotdotlot

"hot" connects to "dot" (_ot) and "lot" (_ot). Both are new. Add them. Level 3 starts.

Step 4: Process level 3. Explore "dot" and "lot". Find "dog" and "log".

level:4
queue:
doglog
visited:
hithotdotlotdoglog

"dot" connects to "dog" (do_). "lot" connects to "log" (lo_). Both are new. Level 4 starts.

Step 5: Process level 4. Explore "dog" and "log". Both connect to "cog". Found endWord!

level:5
queue:
cog
visited:
hithotdotlotdoglogcog

Both "dog" and "log" connect to "cog" (_og). "cog" is the endWord. Return level 5.

BFS finds the shortest path in 5 levels. It explores words layer by layer, guaranteeing that the first time it reaches "cog" is via the shortest route.

Complexity analysis

MetricValue
TimeO(M^2 * N)
SpaceO(M * N)

M is the length of each word. N is the number of words in the dictionary.

Time: For each word, you generate neighbors by trying 26 letters at each of M positions. Building each neighbor string costs O(M) due to string slicing. In the worst case, BFS visits all N words. Total: O(M * 26 * M * N) = O(M^2 * N). The factor of 26 is constant and drops out.

Space: The visited set and queue can hold up to O(N) words, each of length M. Total: O(M * N).

Edge cases

  • endWord not in wordList: Return 0 immediately. No valid transformation can reach a word outside the dictionary.
  • beginWord equals endWord: The problem guarantees they differ, but if they were the same, the answer would be 1 (just the word itself). Some variants treat this differently, so check the constraints.
  • No path exists: BFS exhausts the queue without finding endWord. Return 0.
  • beginWord is in wordList: This does not cause issues. BFS marks it as visited at the start, so it will not be revisited.
  • Large dictionary, short words: The neighbor generation approach (26 * M per word) is more efficient than comparing every pair of words (N * M per word) when the dictionary is large.

The building blocks

This problem is built on one core reusable pattern that CodeBricks drills independently:

BFS shortest path on an implicit graph

The implicit graph pattern means you do not build an adjacency list upfront. Instead, you generate neighbors dynamically during BFS:

queue = deque([(start, 0)])
visited = {start}

while queue:
    node, dist = queue.popleft()
    for neighbor in get_neighbors(node):
        if neighbor == target:
            return dist + 1
        if neighbor not in visited:
            visited.add(neighbor)
            queue.append((neighbor, dist + 1))

This template appears across many problems. In Word Ladder, get_neighbors generates all one-letter transformations. In other problems, get_neighbors might return adjacent grid cells, valid knight moves, or states reachable by some operation. The BFS wrapper stays the same.

The key details to remember:

  • Use a set for O(1) visited checks
  • Mark visited on enqueue, not on dequeue
  • Track distance (level count) alongside each node in the queue
  • Return as soon as the target is found, since BFS guarantees shortest path

An optimization for this problem is bidirectional BFS. You run BFS from both beginWord and endWord simultaneously and stop when the two frontiers meet. This can reduce the search space dramatically, from O(26^d) to O(2 * 26^(d/2)) where d is the path length. It is a great follow-up to discuss in interviews, but the standard single-direction BFS is the expected solution.

From understanding to recall

You now understand the word ladder BFS approach. The graph modeling, the neighbor generation with 26 letters, the level tracking. It all makes sense when you read through it. But in an interview, you need to write the set conversion, the nested loop for neighbor generation, the string slicing, and the visited-on-enqueue pattern, all from memory and under time pressure.

That is the gap spaced repetition closes. CodeBricks breaks this problem into its building block (BFS shortest path on an implicit graph) and drills it at increasing intervals. You type the neighbor generation loop, the queue initialization, and the early return from scratch. After a few sessions, the pattern is in your fingers. When a word ladder or similar BFS shortest path problem appears in your interview, you do not stare at the screen trying to remember the details. You just write it.

Related posts

  • Number of Islands - Grid BFS/DFS traversal, another problem where recognizing the implicit graph structure is the key insight
  • Course Schedule - Graph BFS (Kahn's algorithm) applied to dependency ordering, showing how BFS generalizes beyond shortest path