Word Break: Dynamic Programming on Strings
You are given a string s and a list of dictionary words. Can you split the string into a sequence of one or more dictionary words, with no characters left over?
This is LeetCode 139: Word Break, and it is one of the most popular string segmentation DP problems out there. If you have solved Coin Change or Climbing Stairs, you already have the right mental model. Word Break applies the same "build up from smaller subproblems" idea, just on a string instead of a number line.
Why this problem matters
Word Break is the bridge between numeric DP and string DP. It teaches you how to:
- Think about substrings as subproblems: instead of "can I make amount 6?", you ask "can I segment the first 4 characters?"
- Use a boolean DP table: not every DP problem tracks a minimum or maximum. Here you track reachability.
- Try multiple split points: at each position, you consider every dictionary word that could end there.
Once you get comfortable with this pattern, problems like Palindrome Partitioning and Word Break II become much more approachable.
The brute force idea
At every position in the string, try every word in the dictionary. If the word matches the current substring, recurse on the rest. If any recursive path successfully reaches the end of the string, return True.
def word_break(s, word_dict):
def helper(start):
if start == len(s):
return True
for word in word_dict:
end = start + len(word)
if end <= len(s) and s[start:end] == word:
if helper(end):
return True
return False
return helper(0)
This works for small inputs, but the recursion tree can branch wildly. Consider s = "aaaaaab" with word_dict = ["a", "aa", "aaa"]. Every prefix of a's matches, so the recursion tries an exponential number of paths before discovering that no path can handle the trailing "b". The time complexity can reach O(2^n) in the worst case.
The worst case for brute force happens when many overlapping prefixes match but no valid segmentation exists. The recursion explores every possible combination of splits before giving up. This is the classic signal that memoization or bottom-up DP will help.
Adding memoization (top-down DP)
The key observation: helper(start) only depends on the index start. If you have already figured out whether the substring from start to the end can be segmented, you do not need to recompute it.
def word_break(s, word_dict):
memo = {}
def helper(start):
if start == len(s):
return True
if start in memo:
return memo[start]
for word in word_dict:
end = start + len(word)
if end <= len(s) and s[start:end] == word:
if helper(end):
memo[start] = True
return True
memo[start] = False
return False
return helper(0)
Now each starting index is solved at most once. Time: O(n * m * k) where n is the length of the string, m is the number of words in the dictionary, and k is the maximum word length (for substring comparison). Space: O(n) for the memo table plus call stack. Much better than exponential.
Building the DP table (bottom-up)
Instead of recursing from the beginning of the string forward, build up from position 0. Define dp[i] as: "can the substring s[0:i] be segmented into dictionary words?"
The recurrence:
dp[0] = True(the empty prefix can always be segmented)- For each position
ifrom 1 tolen(s), check every word in the dictionary. Ifwordhas lengthw, ands[i-w:i]equalsword, anddp[i-w]is True, then setdp[i] = True - The answer is
dp[len(s)]
Let's trace through s = "leetcode" and word_dict = ["leet", "code"]:
Base case: dp[0] = True
An empty string can always be segmented. dp[0] = True.
dp[1] through dp[3]: No dictionary word ends here
"l", "le", "lee" are not in the dictionary, and no shorter prefix works either. All stay False.
dp[4]: s[0:4] = "leet" is in the dictionary, and dp[0] is True!
We found a word that ends at index 4. Since dp[0] is True, everything before it was valid. dp[4] = True.
dp[5] through dp[7]: No valid segmentation ends here
"leetc", "leetco", "leetcod" cannot be segmented. For each position, no dictionary word ending there has a True dp value before it.
dp[8]: s[4:8] = "code" is in the dictionary, and dp[4] is True!
The substring from index 4 to 8 is "code", which is in the dictionary. dp[4] is True, so dp[8] = True. The string can be segmented!
The logic at each step is simple: for the current position i, look at every dictionary word. If a word fits (it ends exactly at position i and the characters match), check whether dp[i - len(word)] is True. If it is, that means everything before this word was already a valid segmentation, so now everything up through position i is valid too.
def word_break(s, word_dict):
n = len(s)
dp = [False] * (n + 1)
dp[0] = True
for i in range(1, n + 1):
for word in word_dict:
w = len(word)
if w <= i and dp[i - w] and s[i - w:i] == word:
dp[i] = True
break
return dp[n]
Time: O(n * m * k) where n is string length, m is dictionary size, and k is max word length.
Space: O(n) for the DP array.
Notice the break after setting dp[i] = True. Once you know position i is reachable, you do not need to check more words for that position.
Compare this to Coin Change. In Coin Change, dp[i] = min(dp[i - coin] + 1) over all coins, and you always check every coin because you want the minimum. In Word Break, dp[i] is a boolean. Once any word makes it True, you can stop early. The DP table structure is the same, but the transition logic is simpler.
Optimization: use a set for faster lookups
If the dictionary is large, iterating through every word for every position gets expensive. Convert the dictionary to a set so you can check membership in O(k) time. Also, cap your inner loop at the length of the longest word.
def word_break(s, word_dict):
word_set = set(word_dict)
max_len = max(len(w) for w in word_dict)
n = len(s)
dp = [False] * (n + 1)
dp[0] = True
for i in range(1, n + 1):
for j in range(max(0, i - max_len), i):
if dp[j] and s[j:i] in word_set:
dp[i] = True
break
return dp[n]
Instead of iterating over dictionary words, iterate over possible starting positions j for the last word. The inner loop now runs at most max_len times per position, which is often much smaller than the dictionary size.
Time: O(n * max_len * max_len) where the inner max_len accounts for substring hashing. In practice this is fast.
Space: O(n + total characters in dictionary) for the DP array and the set.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Brute force recursion | O(2^n) worst case | O(n) call stack |
| Memoized recursion | O(n * m * k) | O(n) memo + stack |
| Bottom-up DP | O(n * m * k) | O(n) array |
| Set-optimized DP | O(n * L * L) | O(n + dict) |
Where n = string length, m = number of words, k = max word length, L = max word length.
Edge cases to watch for
- Empty string:
dp[0] = True, return True immediately. - No valid segmentation: e.g.,
s = "catsandog"withword_dict = ["cats", "dog", "sand", "and", "cat"]. The substrings "cats" and "and" leave "og" which is not a word. Return False. - Overlapping words:
s = "cars"withword_dict = ["car", "ca", "rs"]. "car" leaves "s" (no match), but "ca" + "rs" works. The DP approach handles this because it checks every possible split. - Single character words: the inner loop just checks each character individually.
- Repeated characters:
s = "aaaaaaa"withword_dict = ["a", "aa"]. Many valid segmentations exist. DP only needs to find one.
The building blocks
Word Break is built on linear DP with constant state, the same fundamental pattern as Climbing Stairs and Coin Change. The structure is identical:
- State:
dp[i]= cans[0:i]be segmented? - Transition:
dp[i] = Trueif any word in the dictionary ends at positionianddp[i - len(word)]is True - Base case:
dp[0] = True
The parallel to Coin Change is almost exact. In Coin Change, you try every coin denomination at each amount. In Word Break, you try every dictionary word at each position. The "coins" are the words, and the "amount" is the string length.
This building block appears in related problems:
- Word Break II (LeetCode 140): same DP structure, but you also need to reconstruct all valid segmentations. Add backtracking on top of the boolean DP.
- Palindrome Partitioning (LeetCode 131): instead of dictionary words, the "valid pieces" are palindromic substrings. Same idea of checking whether a prefix is valid and whether what follows can also be split.
- Concatenated Words (LeetCode 472): run Word Break on each word using the other words as the dictionary. Same DP inside a loop.
From understanding to recall
You have seen how Word Break is really just Coin Change wearing a string costume. The DP table, the transition logic, the base case: all structurally the same. But recognizing that in an interview, under pressure, with a blank editor, is a different skill than recognizing it while reading a blog post.
Spaced repetition is how you turn recognition into recall. You practice writing the DP recurrence from scratch at increasing intervals, and after a few reps you stop thinking about it. The linear DP with constant state pattern is one of roughly 60 reusable building blocks in the CodeBricks system. Drilling them individually is far more effective than grinding random problems.
Related posts
- Coin Change - The classic unbounded knapsack DP, same building block with numeric amounts instead of string positions
- Climbing Stairs - The gentlest intro to linear DP, same pattern with a fixed lookback of 1 or 2