Concatenated Words: Word Break for Every Word
You have a list of words. Some of those words can be built entirely from other, shorter words in the same list. Your job is to find all such "concatenated words" and return them.
This is LeetCode 472: Concatenated Words, and it is really just the Word Break problem run inside a loop. If you can solve Word Break for a single string against a dictionary, you can solve this problem by applying that same check to every word in the input.
The problem
Given an array of strings words, return all the concatenated words in the list. A concatenated word is a string that can be formed by concatenating at least two shorter words from the same array.
For example, given words = ["cat", "cats", "catsdogcats", "dog", "dogcatsdog", "hippopotamuses", "rat", "ratcatdogcat"], the concatenated words are ["catsdogcats", "dogcatsdog", "ratcatdogcat"].
Why this problem matters
Concatenated Words tests whether you truly understand the Word Break DP pattern or just memorized it for one specific problem. Here you need to:
- Generalize Word Break: instead of running DP once against a fixed dictionary, you run it for every word in the input, using the other words as the dictionary.
- Decide what goes in the dictionary: you cannot put a word in its own dictionary. Sorting by length and building the set incrementally handles this naturally.
- Manage the "at least 2 pieces" constraint: a word that appears in the dictionary as-is does not count. It must be composed of 2 or more shorter words.
This pattern shows up in dictionary and NLP applications where you need to identify compound words, and it is a common follow-up in interviews after Word Break.
The key insight
Sort all words by length. Then process them shortest-first. For each word, run the Word Break DP check against a set of all shorter words you have already processed. If the word can be segmented into 2 or more pieces, it is a concatenated word, so add it to your results. Whether or not it is concatenated, add it to the word set so that longer words can use it as a building block.
Why sort by length? Because a concatenated word is always longer than each of its component words. By the time you process a word, every word that could be one of its pieces is already in the set. You never need to look ahead.
Why not add a word to the set before checking it? Because you do not want a word to count as a concatenation of itself. By checking first and adding second, you guarantee the DP only finds segmentations using strictly shorter words.
If two words have the same length, neither can be a component of the other, so processing order among same-length words does not matter. The sort only needs to separate shorter from longer.
The code
def find_all_concatenated_words(words):
words.sort(key=len)
word_set = set()
result = []
for word in words:
if can_form(word, word_set):
result.append(word)
word_set.add(word)
return result
def can_form(word, word_set):
if not word or not word_set:
return False
n = len(word)
dp = [False] * (n + 1)
dp[0] = True
for i in range(1, n + 1):
for j in range(i):
if dp[j] and word[j:i] in word_set:
dp[i] = True
break
return dp[n]
The can_form function is the standard Word Break DP. dp[i] is True when word[0:i] can be segmented into words from word_set. Since word itself is never in word_set when we check it, any True result at dp[n] means the word is formed from at least 2 shorter pieces.
Why this works
The outer loop processes words from shortest to longest. When you reach a word of length L, the set contains every word shorter than L (plus any same-length words processed earlier, which cannot participate in a segmentation of a same-length word anyway, since they are not shorter).
The inner can_form function is identical to the bottom-up Word Break DP:
dp[0] = Truebecause the empty prefix is always valid.- For each position
i, try every split pointj. Ifdp[j]is True andword[j:i]is in the set, thendp[i] = True. - If
dp[n]is True, the entire word can be composed from dictionary words.
Because we only add a word to the set after checking it, a word like "cat" cannot use itself. It can only be marked as concatenated if shorter words in the set build it up. The constraint of "at least 2 shorter words" is satisfied automatically: if dp[n] is True, it means we found at least one split point where both halves resolved to set words.
Visual walkthrough
Word: "catsdogcats" | Set: {"cat", "cats", "dog", "dogs", "s"}
Each dp[i] tracks whether s[0:i] can be segmented into words from the set.
Step 1: Base case dp[0] = True
The empty prefix is always a valid segmentation. This is our starting point.
Step 2: dp[3] = True via "cat"
s[0:3] = "cat" is in the word set, and dp[0] is True. Mark dp[3] = True.
Step 3: dp[4] = True via "cats" or "s"
s[0:4] = "cats" is in the set (dp[0]=T). Also s[3:4] = "s" is in the set (dp[3]=T). Either path works.
Step 4: dp[7] = True via "dog"
s[4:7] = "dog" is in the set and dp[4] is True. The prefix "cats" + "dog" is valid.
Step 5: dp[10] = True via "cat"
s[7:10] = "cat" is in the set and dp[7] is True. Three segments so far: "cats" + "dog" + "cat".
Step 6: dp[11] = True via "cats" or "s"
s[7:11] = "cats" (dp[7]=T), or s[10:11] = "s" (dp[10]=T). dp[11] = True. The entire word can be formed from 2+ shorter words!
Optimizing with max word length
In the inner loop, you do not need to check every possible split point. You only need to check split points where word[j:i] could be a word in the set. If the longest word in the set has length max_len, you can cap the inner loop.
def can_form(word, word_set, max_len):
if not word or not word_set:
return False
n = len(word)
dp = [False] * (n + 1)
dp[0] = True
for i in range(1, n + 1):
for j in range(max(0, i - max_len), i):
if dp[j] and word[j:i] in word_set:
dp[i] = True
break
return dp[n]
This cuts down unnecessary iterations when words are short but the target word is very long.
Complexity analysis
| Metric | Time | Space |
|---|---|---|
| Overall | O(n * L^2) | O(n * L) |
Where n is the number of words and L is the maximum word length. For each of the n words, the DP runs in O(L^2) time (L positions, each checking up to L split points, with O(L) substring hashing). Space is O(n * L) for the word set plus the DP array.
Sorting takes O(n log n) which is dominated by the DP work.
Building blocks
This problem combines a few core techniques:
- Word Break DP: the
can_formfunction is the standard boolean DP for string segmentation. If you know Word Break, you already know the hard part. - Sorting by length: ensures every potential component word is in the set before you need it. This is the same "process smaller items first" idea that appears in problems like Largest Divisible Subset.
- Set lookup: converting the word list to a set gives O(L) average-case membership checks, which is critical for keeping the DP transitions fast.
Edge cases
- Empty string in input: an empty string has no characters, so it cannot be formed by concatenating 2 or more words. Skip it or let the DP handle it (the
if not wordguard returns False). - Single-character words:
"a"cannot be concatenated from shorter words because there are no shorter non-empty words. It gets added to the set for others to use. - Duplicate words: if
"cat"appears twice, the second occurrence would be checked against a set that already contains"cat". Since"cat"is a single word (not 2+ pieces), the DP returns False for it. Duplicates are harmless. - Very long words: the O(L^2) DP per word means extremely long words (thousands of characters) could be slow. The max-length optimization helps, but the worst case is still quadratic in word length.
- All words are concatenated: works fine. Each word gets checked and added in order, and the result list can be as large as the input.
From understanding to recall
You have seen that Concatenated Words is just Word Break in a loop, with a sort to ensure the dictionary is ready before each check. The pattern is clean: sort by length, build the set incrementally, run the same DP you already know. But seeing it on a blog post and writing it from scratch under time pressure are different skills.
Spaced repetition bridges that gap. You practice the Word Break DP, then the "sort and iterate" wrapper, at increasing intervals until the full solution flows without hesitation. Concatenated Words is one of roughly 60 reusable building blocks in the CodeBricks system. Drilling these blocks individually is far more effective than grinding random problems.
Related posts
- Word Break - The core DP technique this problem builds on
- Word Break II - Word Break with all decompositions
- Implement Trie - Trie structure for efficient word lookups