Vowel Spellchecker: Priority-Based Fuzzy Matching
LeetCode Vowel Spellchecker (problem 966) asks you to implement a spellchecker that can tolerate capitalization mistakes and vowel confusion. Given a wordlist and a set of queries, you need to return the best match for each query according to a strict priority system. It is a great exercise in using multiple hash maps to handle tiered lookup logic.
The problem
You are given a list of correct words (wordlist) and a list of queries. For each query, apply these matching rules in priority order:
- Exact match: If the query appears in the wordlist exactly as spelled, return it.
- Case-insensitive match: If the query matches a word in the wordlist ignoring capitalization, return the first such word from the wordlist.
- Vowel-insensitive match: If the query matches a word ignoring both capitalization and treating all vowels (a, e, i, o, u) as interchangeable, return the first such word from the wordlist.
- No match: If none of the above apply, return an empty string
"".
For example, given wordlist = ["KiTe", "kite", "hare", "Hare"]:
- Query
"kite"returns"kite"(exact match, priority 1) - Query
"Kite"returns"KiTe"(case-insensitive, priority 2, first match wins) - Query
"hre"returns"hare"(vowel-insensitive, priority 3) - Query
"xyz"returns""(no match at all)
Why this problem matters
Vowel Spellchecker teaches you a pattern that appears constantly in real software: priority-based lookup with progressively looser matching. Think about how search engines handle typos, or how DNS resolution tries multiple sources in order. The core idea is always the same. You check the most specific match first, then fall back to broader matches.
The problem also teaches you about canonical form hashing, which is the technique of transforming inputs into a standard representation before using them as hash map keys. Here the canonical forms are "lowercased" (for case-insensitive matching) and "devoweled + lowercased" (for vowel-insensitive matching). This same idea powers problems like Group Anagrams, where you sort characters to create a canonical key.
Finally, the "first match wins" rule forces you to think carefully about insertion order when building your maps. You cannot just overwrite keys blindly. You must only store the first word you encounter for each canonical form.
The brute force approach
The naive solution is to iterate through queries one by one, and for each query, scan the entire wordlist checking all three rules. First you scan for an exact match. If that fails, you scan again for a case-insensitive match. If that also fails, you scan a third time for a vowel-insensitive match.
This works but costs O(n) per query, giving O(n * q) total time where n is the wordlist size and q is the number of queries. For large inputs, that is too slow.
The key insight
Instead of scanning the wordlist for every query, you can pre-build three lookup structures from the wordlist in a single pass. An exact-match set, a case-insensitive map, and a vowel-insensitive map. Then each query becomes three O(1) lookups instead of three O(n) scans.
The trick for vowel-insensitive matching is the devowel function: convert the word to lowercase, then replace every vowel with a placeholder character like "*". Two words that differ only in vowels and case will produce the same devoweled key.
The devowel function is the heart of this problem. "KiTe" becomes "k*t*", "kite" becomes "k*t*", and "kote" also becomes "k*t*". Any word that matches the same consonant skeleton will collide in the vowel map, which is exactly the behavior you want.
Walking through it step by step
Step 1: Build the exact match set from the wordlist.
Store every word exactly as it appears. This set is used for O(1) exact lookups.
Step 2: Build the case-insensitive map (lowered key -> first word).
Only the first word for each lowered key is stored. "KiTe" is stored for "kite" because it appears first.
Step 3: Build the vowel-insensitive map (devoweled key -> first word).
Replace all vowels with "*" after lowering. "KiTe" becomes "k*t*". Again, only the first word per key is kept.
Step 4: Query "kite". Check exact set first.
"kite" is in words_exact, so return "kite" immediately. No need to check other maps.
Step 5: Query "Kite". No exact match, try case-insensitive.
"Kite" is not in words_exact. Lower it to "kite" and find it in words_lower. Return "KiTe" (the first match).
Step 6: Query "hre". No exact, no case match, try vowel-insensitive.
devowel("hre") = "hr*". This matches "h*r*" in words_vowel, returning "hare".
The solution
class Solution:
def spellchecker(self, wordlist, queries):
words_exact = set(wordlist)
words_lower = {}
words_vowel = {}
def devowel(word):
return "".join("*" if c in "aeiou" else c for c in word.lower())
for word in wordlist:
low = word.lower()
if low not in words_lower:
words_lower[low] = word
dv = devowel(word)
if dv not in words_vowel:
words_vowel[dv] = word
result = []
for query in queries:
if query in words_exact:
result.append(query)
elif query.lower() in words_lower:
result.append(words_lower[query.lower()])
elif devowel(query) in words_vowel:
result.append(words_vowel[devowel(query)])
else:
result.append("")
return result
The solution has two phases: build and query.
In the build phase, you iterate through the wordlist once. For each word, you add it to the exact set, and then compute its lowered and devoweled forms. The if key not in map check ensures that only the first word for each canonical form gets stored. This is critical because the problem says to return the first matching word from the original wordlist.
In the query phase, you check each query against the three structures in priority order. The moment you find a match, you take it and move on. If all three checks fail, you append an empty string.
Notice that devowel is called on both the wordlist words (during build) and on the queries (during lookup). The function must be consistent: lowercase first, then replace vowels. If you lowercase during build but forget to lowercase during lookup, the keys will not match.
Complexity analysis
| Metric | Value |
|---|---|
| Time | O(n + q) |
| Space | O(n) |
Time: You iterate through the wordlist once to build the three structures, which is O(n) where n is the total number of characters across all words. Then you process each query in O(1) amortized time (hash lookups). With q queries, the total is O(n + q). If you account for string length, it is more precisely O(n * k + q * k) where k is the average word length, since hashing a string takes O(k).
Space: The exact set stores n words. The case-insensitive map stores at most n entries. The vowel-insensitive map also stores at most n entries. So total space is O(n * k) accounting for string storage.
Building blocks
1. Canonical form hashing
The core technique here is transforming different representations of the "same" thing into a single canonical key. For case-insensitive matching, the canonical form is the lowercased string. For vowel-insensitive matching, the canonical form is the lowercased-and-devoweled string. This pattern appears any time you need to group or match items that are "equivalent" under some relaxed definition.
2. Priority-based lookup chain
The algorithm checks multiple maps in a fixed order and returns the first hit. This is a general pattern for tiered matching: try the most specific rule first, then fall back to progressively looser rules. You see this in DNS resolution, CSS specificity, method resolution order in OOP, and many other systems.
Both building blocks combine into a single design: build one lookup structure per level of specificity, then query them in priority order. This scales cleanly because adding a new matching tier just means adding one more map and one more elif check.
Edge cases
- Query is an exact match but also matches case-insensitive: Exact match wins because it has higher priority. The case-insensitive map is never consulted.
- Multiple case-insensitive matches: The first word in the original wordlist order wins. This is why you check
if low not in words_lowerbefore inserting. - Query has no vowels at all:
devowel("xyz")returns"xyz"(no replacements). It still works correctly, it just means the vowel map key happens to contain no*characters. - Empty strings: If a word or query is empty,
"".lower()is""anddevowel("")is"". The algorithm handles this without special cases. - All vowels:
devowel("aeiou")returns"*****". Any other all-vowel word of length 5 will also map to"*****", so they will match.
Common mistakes
- Overwriting map entries instead of keeping the first. If you write
words_lower[low] = wordwithout theif low not in words_lowerguard, you store the last word instead of the first. The problem specifically requires returning the first match. - Forgetting to lowercase in the devowel function. If your devowel function only replaces vowels without lowering, then
"KiTe"and"kite"will produce different devoweled keys ("K*T*"vs"k*t*"), and vowel-insensitive matching will break. - Checking priorities in the wrong order. If you check vowel-insensitive before case-insensitive, you might return a less specific match when a better one exists. The order must be: exact, then case, then vowel.
- Using the exact set for case-insensitive results. When you find a case-insensitive match, you must return the original word stored in
words_lower, not the query itself. The query might have different capitalization.
From understanding to recall
Vowel Spellchecker is a clean application of two patterns: canonical form hashing and tiered lookup. The algorithm itself is simple once you see it. Build three maps, query them in order, done. But the subtlety lies in the details, specifically the insertion order guarantee and the consistency of the devowel function.
The building block to internalize here is the idea of "one map per matching tier." Any time a problem asks you to find the best match according to multiple rules with different strictness levels, this is the template. Pre-build a lookup structure for each tier, then cascade through them. You will see this pattern in many real-world systems beyond LeetCode.
Related posts
- Group Anagrams - Another problem using canonical forms for hash map keys
- Isomorphic Strings - Pattern matching with character mapping
- Valid Anagram - Hash map based string comparison
Once you have canonical form hashing and priority-based lookup internalized as separate building blocks, you can combine them to solve not just this problem but any future problem that requires fuzzy matching with tiebreaking rules. That is the power of learning patterns over solutions.