Longest Word in Dictionary: Building Words Character by Character
You have an array of strings representing a dictionary. Your job is to find the longest word that can be built one character at a time, where every prefix of that word also exists in the dictionary. If two words tie on length, return the lexicographically smaller one.
This is LeetCode 720: Longest Word in Dictionary, and it is a clean example of how sorting plus a hash set can replace a trie for prefix validation problems. The key insight: a word is "buildable" if every prefix of the word also exists in the dictionary. You do not need to build a full trie to check this. You just need to verify that removing the last character gives you a word you have already validated.
The problem
Given an array of strings words representing a dictionary, return the longest word in the dictionary that can be built one character at a time by other words in the dictionary. If multiple answers exist, return the one that is lexicographically smallest. If no valid word exists, return an empty string.
A word can be built one character at a time if every prefix of the word (except the empty string) is also in the dictionary.
words = ["w","wo","wor","worl","world"] -> "world"
words = ["a","banana","app","appl","ap","apply","apple"] -> "apple"
The approach
Sort the words array. Sorting does two things for you at once. First, shorter words come before longer words, so by the time you reach a word of length 5, you have already processed all words of length 1 through 4. Second, among words of the same length, lexicographically smaller words come first. This means the first valid word you find at any given length is automatically the best tie-breaker.
After sorting, iterate through the words. Maintain a set of "valid" words that can be built one character at a time. Seed it with the empty string. For each word, check whether word[:-1] (the word with its last character removed) is in the valid set. If it is, this word can be built by extending a valid shorter word by one character. Add it to the valid set and check whether it beats the current best.
That is the entire algorithm. No trie construction, no recursive traversal, no complex data structures. Just sort, scan, and check membership.
Sorting handles the lexicographic tie-breaking automatically. Since you process words in sorted order, the first valid word you encounter at any given length is guaranteed to be the lexicographically smallest. You never need an explicit comparison for ties.
The solution
def longestWord(words):
words.sort()
valid = {""}
best = ""
for word in words:
if word[:-1] in valid:
valid.add(word)
if len(word) > len(best):
best = word
return best
Start by sorting the input. Initialize a set containing just the empty string. The empty string acts as the base case: any single-character word has word[:-1] == "", which is in the set, so all single-character words are automatically valid.
Then loop through each word. The expression word[:-1] gives you everything except the last character. If that prefix is already in the valid set, it means the prefix itself was buildable, and extending it by one character keeps the chain going. Add the current word to the valid set and update best if this word is longer.
Because the array is sorted, you never encounter a longer word before its prefix. And because lexicographically smaller words come first at the same length, the first valid word at each new length is the best candidate.
Step-by-step walkthrough
Step 1: Sort the words
After sorting: ["a", "ap", "app", "appl", "apple", "apply", "banana"]. Shorter words come first, and ties are broken lexicographically. Initialize the valid set with the empty string.
Step 2: Process "a"
The prefix of "a" is the empty string, which is in the valid set. Add "a" to valid. Update best to "a".
Step 3: Process "ap"
The prefix "a" is in the valid set. Add "ap" to valid. Update best to "ap" (longer than "a").
Step 4: Process "app"
The prefix "ap" is in the valid set. Add "app" to valid. Update best to "app".
Step 5: Process "apple" and "apply"
After processing "appl" (valid), we reach "apple". Its prefix "appl" is in the set. Both "apple" and "apply" are length 5, but "apple" comes first lexicographically since we sorted. "banana" fails because "banan" is not in the set.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Sorting + hash set | O(n * k * log n) | O(n * k) |
| Trie-based | O(n * k) | O(n * k) |
Where n is the number of words and k is the maximum word length.
The sorting step takes O(n * k * log n) because comparing two strings of length k takes O(k) time, and sorting n strings requires O(n * log n) comparisons. The scan afterward is O(n * k) for hashing each word. The trie approach avoids the sorting cost by building a prefix tree and doing a DFS for the deepest complete-word path, but it requires more code and is harder to get right under time pressure.
The building blocks
Sort then validate pattern
This problem follows a pattern that shows up in several LeetCode problems: sort the input to establish an ordering guarantee, then do a single pass to validate each element against what you have seen so far. The sort ensures that when you process element i, all elements that element i depends on have already been processed.
You see this same idea in Longest Increasing Subsequence (sort to establish order, then build up valid subsequences) and in Largest Number (custom sort to determine element ordering). The core technique is always the same: use sorting to turn a hard ordering problem into a simple left-to-right scan.
When should you use a trie instead of a hash set for prefix problems? If the problem requires prefix queries like "find all words starting with X" or "find the shortest prefix match," a trie gives you O(m) lookups without storing every prefix explicitly. But if you only need to check whether specific prefixes exist in a known dictionary, a hash set is simpler and just as fast. This problem only checks whether word[:-1] is valid, so a set is the right tool.
Edge cases
- All single-character words: every single character is buildable (its prefix is the empty string). Return the lexicographically smallest one.
- No buildable words beyond length 1: if the dictionary has
["a", "bc"], then"bc"is not buildable because"b"is missing. The answer is"a". - Multiple words of the same maximum length:
["a", "ab", "b", "bc"]has two buildable length-2 words. Sorting ensures"ab"is encountered before"bc", so"ab"wins. - Empty input: return
"". The loop never executes andbeststays empty. - Duplicate words: duplicates do not affect correctness. The first occurrence adds the word to the valid set, and subsequent duplicates just find it already there.
From understanding to recall
You have seen how this problem boils down to sorting plus a single-pass set check. The code is short and the logic is clean. But can you reproduce it from scratch in five minutes? The details matter: seeding the set with "", using word[:-1] instead of manually slicing, updating best only when strictly longer (not equal, because sorting already handles ties).
Spaced repetition is what turns this from "I read about it" into "I can write it cold." You practice the sort-then-validate pattern at increasing intervals until it becomes automatic. This building block appears in enough problems that the investment pays off quickly.
Related posts
- Implement Trie (Prefix Tree) - The trie data structure this problem can use as an alternative approach
- Word Break - Another problem about building words from a dictionary using DP
- Longest Common Prefix - Another prefix-based string problem using character-by-character comparison
CodeBricks breaks the longest word in dictionary problem into its sort-then-validate and prefix membership building blocks, then drills them independently with spaced repetition. You type each piece from scratch until the pattern is automatic. When a prefix validation problem shows up in your interview, you do not think about it. You just write it.