Word Break II: All Valid Sentences with Backtracking
Given a string s and a dictionary of words wordDict, add spaces in s to construct sentences where every word belongs to the dictionary. Return all such possible sentences in any order.
This is LeetCode 140: Word Break II, and it builds directly on top of Word Break (LeetCode 139). Where Word Break asks "can the string be segmented?", Word Break II asks "give me every valid segmentation." The answer changes from a boolean to a list of sentences, and the algorithm shifts from pure DP to backtracking with memoization.
Examples:
s = "catsanddog",wordDict = ["cat","cats","and","sand","dog"]returns["cats and dog", "cat sand dog"]s = "pineapplepenapple",wordDict = ["apple","pen","applepen","pine","pineapple"]returns["pine apple pen apple", "pineapple pen apple", "pine applepen apple"]
Why this problem matters
Word Break II tests whether you can move from "does a solution exist?" to "enumerate all solutions." This transition appears constantly in interview problems. You already know Word Break proves feasibility with DP. Now you need backtracking to collect every valid path, plus memoization to avoid recomputing results for the same suffix.
The problem also forces you to think carefully about output size. Unlike Word Break, where the answer is a single boolean, the number of valid sentences can be exponential. A string like "aaa...a" with wordDict = ["a", "aa"] has 2^(n-1) valid segmentations. That means you cannot do better than exponential in the worst case, and your goal is to avoid redundant work rather than achieve polynomial time.
The approach
The algorithm uses backtracking with memoization:
- At position
start, try every prefixs[start:end]for increasing values ofend. - If the prefix is in the dictionary, recurse on the suffix starting at
end. - The recursive call returns a list of all valid sentences that can be formed from the suffix.
- For each sentence returned, prepend the current prefix and add a space.
- When
startreaches the end of the string, return a list containing an empty string (the base case indicating a successful split). - Cache the result for each
startindex so that the same suffix is never solved twice.
The key insight is that memoization on the start index stores the list of all valid sentences for a given suffix. If two different prefix choices lead to the same suffix, you reuse the cached result instead of recursing again.
The solution
def wordBreak(s, wordDict):
word_set = set(wordDict)
memo = {}
def backtrack(start):
if start in memo:
return memo[start]
if start == len(s):
return [""]
sentences = []
for end in range(start + 1, len(s) + 1):
prefix = s[start:end]
if prefix in word_set:
suffixes = backtrack(end)
for suffix in suffixes:
if suffix:
sentences.append(prefix + " " + suffix)
else:
sentences.append(prefix)
memo[start] = sentences
return sentences
return backtrack(0)
Walkthrough
Let's trace through s = "catsanddog" with wordDict = ["cat", "cats", "and", "sand", "dog"].
Step 1: Start at index 0 with an empty path. The full string is "catsanddog".
Try every prefix: "c", "ca", "cat", "cats", ... Only prefixes found in the dictionary lead to recursive calls.
Step 2: Prefix "cat" is in the dictionary. Recurse on "sanddog".
"c" and "ca" are not in the dictionary, so those prefixes are skipped. "cat" matches. Add it to the path and recurse.
Step 3: Prefix "sand" is in the dictionary. Recurse on "dog".
"s", "sa", "san" are not in the dictionary. "sand" matches. Continue recursing.
Step 4: Prefix "dog" is in the dictionary. Remaining string is empty. Record result.
The entire string has been consumed. Join the path with spaces to form "cat sand dog". Backtrack.
Step 5: Backtrack to index 0. Try prefix "cats" (also in the dictionary). Recurse on "anddog".
"cats" is also in the dictionary. This is a different valid first word. Recurse on the remainder.
Step 6: Prefix "and" is in the dictionary. Recurse on "dog".
"a" and "an" are not in the dictionary. "and" matches. Continue.
Step 7: Prefix "dog" matches again. Record second result.
The full string is consumed again via a different split. Record "cats and dog". Backtrack.
Step 8: All branches explored. Two valid sentences found.
No more dictionary prefixes remain to try at any level. Final output: ["cat sand dog", "cats and dog"]. Memoization ensures that if a suffix like "dog" appears multiple times, its result is cached and reused.
The recursion starts at index 0 and tries every prefix. When it finds "cat" in the dictionary, it recurses on "sanddog". Inside that call, "sand" leads to "dog", which leads to an empty remainder. The sentence "cat sand dog" is built bottom-up as the recursion unwinds.
Back at index 0, the next dictionary prefix is "cats". Recursing on "anddog" finds "and" then "dog", producing "cats and dog". No other dictionary prefixes start "catsanddog", so the algorithm finishes with two sentences.
Notice how memoization helps: if the suffix "dog" were reached from multiple paths, its result (just ["dog"]) would be computed once and reused. In this small example the savings are minimal, but for strings with many overlapping dictionary words the cache prevents exponential blowup.
Complexity
| Aspect | Complexity |
|---|---|
| Time | O(n * 2^n) worst case |
| Space | O(n * 2^n) worst case |
Time: In the worst case, every possible segmentation is valid (e.g., s = "aaa" with wordDict = ["a", "aa"]), and there are up to 2^(n-1) valid sentences, each up to length n. Memoization ensures each suffix is solved only once, but the total number of sentences stored across all cache entries can still be exponential.
Space: The memo dictionary can store up to 2^(n-1) total sentence strings across all entries. The recursion stack uses O(n). In practice, most inputs have far fewer valid segmentations.
The exponential worst case is unavoidable because the output itself can be exponential. The goal of memoization is not to make the algorithm polynomial, but to avoid recomputing the same suffix multiple times. Without memoization, the algorithm can re-solve the same suffix an exponential number of times on top of the already exponential output.
Building blocks
Word Break II is built from two reusable pieces.
Prefix matching backtracking
The recursive structure of trying every prefix at each position, then recursing on the remainder, is the same skeleton used in Palindrome Partitioning and Restore IP Addresses. The template is:
def backtrack(start):
if start == n:
return base_case
results = []
for end in range(start + 1, n + 1):
prefix = s[start:end]
if valid(prefix):
for suffix_result in backtrack(end):
results.append(combine(prefix, suffix_result))
return results
In Word Break II, valid checks dictionary membership. In Palindrome Partitioning, it checks whether the prefix is a palindrome. The skeleton is identical. Once you can write this template from memory, you can solve any prefix-partitioning problem by plugging in the right valid function.
Memoized recursion
The memoization pattern caches results by start index. Each unique suffix is solved exactly once, and the cached list of sentences is reused whenever the same start index appears again. This is the same pattern from Word Break (memoized top-down version), except the cache stores lists of strings instead of booleans.
Edge cases
Before submitting, check these:
- No valid segmentation:
s = "catsandog"withwordDict = ["cats", "dog", "sand", "and", "cat"]. The suffix"og"cannot be segmented, so the function returns an empty list. The memo will store[]for the index where"og"starts, preventing repeated dead-end exploration. - Entire string is one word:
s = "apple"withwordDict = ["apple"]. The prefix"apple"matches at index 0, the recursive call hits the base case, and the result is["apple"]. Only one sentence, no spaces needed. - Overlapping dictionary words:
s = "pineapplepenapple"withwordDict = ["apple", "pen", "applepen", "pine", "pineapple"]. Multiple valid first words ("pine"and"pineapple") lead to different suffixes, and some suffixes like"penapple"can themselves be split in multiple ways ("pen apple"via"pen"+"apple"). The algorithm explores all branches and memoization prevents re-solving shared suffixes.
A common mistake is running a Word Break I check first as an optimization, then running the backtracking only if the string is breakable. This works, but be careful: in the worst case (all segmentations valid), the DP check adds O(n^2) time without reducing the backtracking work. It helps most when the string is not breakable at all, short-circuiting an otherwise expensive recursion.
From understanding to recall
You see how Word Break II extends Word Break from "yes/no" to "give me everything." The backtracking skeleton is the same prefix-partitioning template from Palindrome Partitioning, and the memoization layer is the same cache-by-index idea from Word Break I. Two building blocks you already know, combined in one problem.
But can you write the memoized backtracking function, the prefix loop, the base case, and the sentence construction from scratch under time pressure? The prefix matching backtracking template and the memoized recursion pattern are two of roughly 60 reusable building blocks in the CodeBricks system. Drilling them individually with spaced repetition is far more effective than re-solving random problems and hoping the details stick.
Related posts
- Word Break - The boolean DP version of this problem. Same dictionary prefix matching, but only asks whether segmentation is possible, not for all segmentations.
- Palindrome Partitioning - Same prefix-partitioning backtracking skeleton, but the validity check is "is this prefix a palindrome?" instead of "is this prefix in the dictionary?"