Skip to content
← All posts

Word Break II: All Valid Sentences with Backtracking

6 min read
leetcodeproblemhardstringsdynamic-programmingbacktracking

Given a string s and a dictionary of words wordDict, add spaces in s to construct sentences where every word belongs to the dictionary. Return all such possible sentences in any order.

This is LeetCode 140: Word Break II, and it builds directly on top of Word Break (LeetCode 139). Where Word Break asks "can the string be segmented?", Word Break II asks "give me every valid segmentation." The answer changes from a boolean to a list of sentences, and the algorithm shifts from pure DP to backtracking with memoization.

Examples:

  • s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"] returns ["cats and dog", "cat sand dog"]
  • s = "pineapplepenapple", wordDict = ["apple","pen","applepen","pine","pineapple"] returns ["pine apple pen apple", "pineapple pen apple", "pine applepen apple"]
"cat""cats""c","ca"..."sand""s","sa"..."and""a","an"..."dog""dog""catsanddog"rem="sanddog"rem="anddog"not in dictrem="dog"not in dictrem="dog"not in dict"cat sand dog""cats and dog"startexploringvalid sentencenot in dictionary (pruned)
Recursion tree for splitting "catsanddog". At each node, the algorithm tries every prefix of the remaining string. If the prefix is in the dictionary, it recurses on the suffix. Dashed red lines are prefixes not found in the dictionary. Green leaves are complete valid sentences.

Why this problem matters

Word Break II tests whether you can move from "does a solution exist?" to "enumerate all solutions." This transition appears constantly in interview problems. You already know Word Break proves feasibility with DP. Now you need backtracking to collect every valid path, plus memoization to avoid recomputing results for the same suffix.

The problem also forces you to think carefully about output size. Unlike Word Break, where the answer is a single boolean, the number of valid sentences can be exponential. A string like "aaa...a" with wordDict = ["a", "aa"] has 2^(n-1) valid segmentations. That means you cannot do better than exponential in the worst case, and your goal is to avoid redundant work rather than achieve polynomial time.

The approach

The algorithm uses backtracking with memoization:

  1. At position start, try every prefix s[start:end] for increasing values of end.
  2. If the prefix is in the dictionary, recurse on the suffix starting at end.
  3. The recursive call returns a list of all valid sentences that can be formed from the suffix.
  4. For each sentence returned, prepend the current prefix and add a space.
  5. When start reaches the end of the string, return a list containing an empty string (the base case indicating a successful split).
  6. Cache the result for each start index so that the same suffix is never solved twice.

The key insight is that memoization on the start index stores the list of all valid sentences for a given suffix. If two different prefix choices lead to the same suffix, you reuse the cached result instead of recursing again.

The solution

def wordBreak(s, wordDict):
    word_set = set(wordDict)
    memo = {}

    def backtrack(start):
        if start in memo:
            return memo[start]
        if start == len(s):
            return [""]

        sentences = []
        for end in range(start + 1, len(s) + 1):
            prefix = s[start:end]
            if prefix in word_set:
                suffixes = backtrack(end)
                for suffix in suffixes:
                    if suffix:
                        sentences.append(prefix + " " + suffix)
                    else:
                        sentences.append(prefix)

        memo[start] = sentences
        return sentences

    return backtrack(0)

Walkthrough

Let's trace through s = "catsanddog" with wordDict = ["cat", "cats", "and", "sand", "dog"].

Step 1: Start at index 0 with an empty path. The full string is "catsanddog".

path: []
remaining: "catsanddog"

Try every prefix: "c", "ca", "cat", "cats", ... Only prefixes found in the dictionary lead to recursive calls.

Step 2: Prefix "cat" is in the dictionary. Recurse on "sanddog".

path: ["cat"]
remaining: "sanddog"

"c" and "ca" are not in the dictionary, so those prefixes are skipped. "cat" matches. Add it to the path and recurse.

Step 3: Prefix "sand" is in the dictionary. Recurse on "dog".

path: ["cat", "sand"]
remaining: "dog"

"s", "sa", "san" are not in the dictionary. "sand" matches. Continue recursing.

Step 4: Prefix "dog" is in the dictionary. Remaining string is empty. Record result.

path: ["cat", "sand", "dog"]
remaining: ""
results so far:"cat sand dog"

The entire string has been consumed. Join the path with spaces to form "cat sand dog". Backtrack.

Step 5: Backtrack to index 0. Try prefix "cats" (also in the dictionary). Recurse on "anddog".

path: ["cats"]
remaining: "anddog"
results so far:"cat sand dog"

"cats" is also in the dictionary. This is a different valid first word. Recurse on the remainder.

Step 6: Prefix "and" is in the dictionary. Recurse on "dog".

path: ["cats", "and"]
remaining: "dog"
results so far:"cat sand dog"

"a" and "an" are not in the dictionary. "and" matches. Continue.

Step 7: Prefix "dog" matches again. Record second result.

path: ["cats", "and", "dog"]
remaining: ""
results so far:"cat sand dog""cats and dog"

The full string is consumed again via a different split. Record "cats and dog". Backtrack.

Step 8: All branches explored. Two valid sentences found.

path: []
remaining: "done"
results so far:"cat sand dog""cats and dog"

No more dictionary prefixes remain to try at any level. Final output: ["cat sand dog", "cats and dog"]. Memoization ensures that if a suffix like "dog" appears multiple times, its result is cached and reused.

The recursion starts at index 0 and tries every prefix. When it finds "cat" in the dictionary, it recurses on "sanddog". Inside that call, "sand" leads to "dog", which leads to an empty remainder. The sentence "cat sand dog" is built bottom-up as the recursion unwinds.

Back at index 0, the next dictionary prefix is "cats". Recursing on "anddog" finds "and" then "dog", producing "cats and dog". No other dictionary prefixes start "catsanddog", so the algorithm finishes with two sentences.

Notice how memoization helps: if the suffix "dog" were reached from multiple paths, its result (just ["dog"]) would be computed once and reused. In this small example the savings are minimal, but for strings with many overlapping dictionary words the cache prevents exponential blowup.

Complexity

AspectComplexity
TimeO(n * 2^n) worst case
SpaceO(n * 2^n) worst case

Time: In the worst case, every possible segmentation is valid (e.g., s = "aaa" with wordDict = ["a", "aa"]), and there are up to 2^(n-1) valid sentences, each up to length n. Memoization ensures each suffix is solved only once, but the total number of sentences stored across all cache entries can still be exponential.

Space: The memo dictionary can store up to 2^(n-1) total sentence strings across all entries. The recursion stack uses O(n). In practice, most inputs have far fewer valid segmentations.

The exponential worst case is unavoidable because the output itself can be exponential. The goal of memoization is not to make the algorithm polynomial, but to avoid recomputing the same suffix multiple times. Without memoization, the algorithm can re-solve the same suffix an exponential number of times on top of the already exponential output.

Building blocks

Word Break II is built from two reusable pieces.

Prefix matching backtracking

The recursive structure of trying every prefix at each position, then recursing on the remainder, is the same skeleton used in Palindrome Partitioning and Restore IP Addresses. The template is:

def backtrack(start):
    if start == n:
        return base_case
    results = []
    for end in range(start + 1, n + 1):
        prefix = s[start:end]
        if valid(prefix):
            for suffix_result in backtrack(end):
                results.append(combine(prefix, suffix_result))
    return results

In Word Break II, valid checks dictionary membership. In Palindrome Partitioning, it checks whether the prefix is a palindrome. The skeleton is identical. Once you can write this template from memory, you can solve any prefix-partitioning problem by plugging in the right valid function.

Memoized recursion

The memoization pattern caches results by start index. Each unique suffix is solved exactly once, and the cached list of sentences is reused whenever the same start index appears again. This is the same pattern from Word Break (memoized top-down version), except the cache stores lists of strings instead of booleans.

Edge cases

Before submitting, check these:

  • No valid segmentation: s = "catsandog" with wordDict = ["cats", "dog", "sand", "and", "cat"]. The suffix "og" cannot be segmented, so the function returns an empty list. The memo will store [] for the index where "og" starts, preventing repeated dead-end exploration.
  • Entire string is one word: s = "apple" with wordDict = ["apple"]. The prefix "apple" matches at index 0, the recursive call hits the base case, and the result is ["apple"]. Only one sentence, no spaces needed.
  • Overlapping dictionary words: s = "pineapplepenapple" with wordDict = ["apple", "pen", "applepen", "pine", "pineapple"]. Multiple valid first words ("pine" and "pineapple") lead to different suffixes, and some suffixes like "penapple" can themselves be split in multiple ways ("pen apple" via "pen" + "apple"). The algorithm explores all branches and memoization prevents re-solving shared suffixes.

A common mistake is running a Word Break I check first as an optimization, then running the backtracking only if the string is breakable. This works, but be careful: in the worst case (all segmentations valid), the DP check adds O(n^2) time without reducing the backtracking work. It helps most when the string is not breakable at all, short-circuiting an otherwise expensive recursion.

From understanding to recall

You see how Word Break II extends Word Break from "yes/no" to "give me everything." The backtracking skeleton is the same prefix-partitioning template from Palindrome Partitioning, and the memoization layer is the same cache-by-index idea from Word Break I. Two building blocks you already know, combined in one problem.

But can you write the memoized backtracking function, the prefix loop, the base case, and the sentence construction from scratch under time pressure? The prefix matching backtracking template and the memoized recursion pattern are two of roughly 60 reusable building blocks in the CodeBricks system. Drilling them individually with spaced repetition is far more effective than re-solving random problems and hoping the details stick.

Related posts

  • Word Break - The boolean DP version of this problem. Same dictionary prefix matching, but only asks whether segmentation is possible, not for all segmentations.
  • Palindrome Partitioning - Same prefix-partitioning backtracking skeleton, but the validity check is "is this prefix a palindrome?" instead of "is this prefix in the dictionary?"