Find the Longest Substring Containing Vowels in Even Counts: Bitmask + HashMap Pattern
LeetCode Find the Longest Substring Containing Vowels in Even Counts (Problem 1371) asks: given a string s, return the length of the longest substring where every vowel (a, e, i, o, u) appears an even number of times. Consonants can appear any number of times. Vowels that do not appear at all count as appearing zero times, which is even.
For example, given s = "leetcode", the answer is 5. The substring "leetc" contains two e's (even) and no other vowels (zero is even). That is the longest such substring.
Why this problem matters
This problem combines two powerful techniques: encoding parity state as a bitmask and using a hash map to find the earliest occurrence of that state. You will see this same combination in problems involving prefix XOR, subarray parity constraints, and any scenario where you need to find the longest or shortest subarray satisfying some modular condition. Mastering this pattern gives you a reusable template for an entire family of problems.
The key insight: parity bitmask
You do not care about the exact counts of each vowel. You only care whether each count is even or odd. That is a single bit of information per vowel. With five vowels, you can encode the entire parity state in a 5-bit integer (values 0 to 31).
As you scan the string left to right, toggle the corresponding bit each time you encounter a vowel. If the bitmask at position i equals the bitmask at an earlier position j, then every vowel appeared an even number of times in the substring s[j+1..i]. Why? Because toggling a bit an even number of times returns it to its original value. Equal bitmasks mean equal parities, which means even counts in between.
To find the longest such substring, you want the earliest index where each bitmask value first appeared. Store the first occurrence of each mask in a hash map. Initialize with {0: -1} because before the string starts, all vowel counts are zero (mask 0), effectively at index -1.
For each position i, if the current mask was seen before at index j, the substring from j+1 to i has length i - j. Track the maximum.
Python solution
def findTheLongestSubstring(s: str) -> int:
vowel_map = {'a': 0, 'e': 1, 'i': 2, 'o': 3, 'u': 4}
mask = 0
seen = {0: -1}
best = 0
for i, ch in enumerate(s):
if ch in vowel_map:
mask ^= (1 << vowel_map[ch])
if mask in seen:
best = max(best, i - seen[mask])
else:
seen[mask] = i
return best
How the solution works
The vowel_map assigns each vowel a bit position (0 through 4). The mask variable holds the running parity state. Each time you see a vowel, you XOR the mask with 1 << bit_position, flipping that vowel's parity bit.
The seen dictionary maps each mask value to the first index where it appeared. You seed it with {0: -1} so that if the mask returns to 0 at index i, the length is i - (-1) = i + 1, covering the entire prefix.
At each index, if the current mask already exists in seen, the difference i - seen[mask] gives the length of a valid substring. You never update seen[mask] once it is set, because you want the earliest occurrence to maximize the substring length.
Step-by-step walkthrough
Let's trace through the algorithm on s = "leetcode". The answer should be 5.
Initialize: mask = 0, seen = {0: -1}
Before scanning, all vowel counts are zero (even). Record mask 0 at position -1.
Step 1: i = 0, char = 'l' (consonant)
'l' is not a vowel, so the mask stays 00000. We saw mask 0 at index -1, so length = 0 - (-1) = 1. Best = 1.
Step 2: i = 1, char = 'e' (vowel, bit 1)
'e' toggles bit 1. Mask becomes 00010 (decimal 2). First time seeing mask 2, so store seen[2] = 1.
Step 3: i = 2, char = 'e' (vowel, toggles bit 1 back)
Second 'e' toggles bit 1 back to 0. Mask returns to 00000. We saw 0 at index -1, so length = 2 - (-1) = 3. Best = 3.
Step 4: i = 4, char = 'c' (consonant, mask unchanged)
After 't' and 'c', mask is still 00000. Seen at -1, so length = 4 - (-1) = 5. Best = 5. Substring s[0..4] = "leetc" has two e's (even).
Step 5: i = 5, char = 'o' (vowel, bit 3)
'o' toggles bit 3. Mask becomes 01000 (decimal 8). First time seeing 8, store seen[8] = 5. Best stays 5.
Final: i = 7, char = 'e' (end of string)
'e' toggles bit 1 again. Mask = 01010 (10). New mask, stored. At i=6 (d), mask 8 was seen at 5, only length 1. Answer = 5.
The longest valid substring is "leetc" (indices 0 through 4), where the two e's cancel out to even and no other vowels appear. The mask stayed at 00000 from index -1 through index 4, giving length 5.
The XOR operation is the key to this pattern. XOR with 1 toggles a bit: even becomes odd, odd becomes even. Two XORs cancel out, returning the bit to its original state. This is exactly the parity tracking you need.
Complexity analysis
| Metric | Value |
|---|---|
| Time | O(n), single pass through the string |
| Space | O(1), the seen map has at most 32 entries (2^5 possible masks) |
Each character is processed once with O(1) work: a hash map lookup and possibly a bit flip. The hash map stores at most 32 entries since the mask can only take values 0 through 31. This makes the space effectively constant regardless of input size.
Building blocks
This problem decomposes into reusable building blocks that appear across many other problems:
Bitmask parity encoding
Encoding even/odd state as bits in an integer. Each bit tracks a binary property (even or odd count), and XOR toggles it. This same technique works any time you need to track parity of multiple independent counters simultaneously. You will see it in problems about subarray XOR, character frequency parity, and game state tracking.
First-occurrence hash map
Storing the first index where each state was seen, then computing the distance when the state repeats. This is the same pattern used in "Subarray Sum Equals K" with prefix sums. The general principle: if you can encode your constraint as a running state, and equal states at two positions mean the subarray between them satisfies the constraint, then a first-occurrence map gives you the longest (or shortest) valid subarray in O(n).
XOR for parity toggling
mask ^= (1 << bit_position)
XOR with a single set bit flips exactly that bit. Two flips cancel. This is the fundamental operation for tracking even/odd state without maintaining full counts.
Edge cases
No vowels at all. If s contains only consonants, the mask never changes from 0. Every position matches the initial state at -1, so the answer is the full string length. The algorithm handles this correctly.
All vowels, all the same. For s = "eeee", the mask alternates between 0 and 2. At index 3, the mask returns to 0 (four e's, even count), giving length 4. The full string is valid.
Single character. If s has length 1 and it is a consonant, the answer is 1 (mask stays 0). If it is a vowel, the answer is 0 (mask becomes nonzero and was never seen before at a different index, so no valid substring exists).
All five vowels each appearing once. For s = "aeiou", each vowel toggles a different bit. The mask at the end is 11111 (31). No intermediate mask repeats the initial 0, so the answer is 0. No substring has all vowels appearing an even number of times (except the empty substring, which does not count).
From understanding to recall
The bitmask plus hash map pattern is elegant, but it requires connecting several ideas: parity as bits, XOR for toggling, equal masks meaning even counts in between, and first-occurrence for maximizing length. In an interview, you need all of these pieces to click together quickly.
Spaced repetition helps you internalize the full chain of reasoning. You write the solution from scratch today. In a few days, you do it again. Each repetition strengthens the connection between "even counts constraint" and "bitmask + first-occurrence map." Eventually, seeing "even counts" in a problem statement immediately triggers the bitmask approach without conscious effort.
Related posts
- Subarray Sum Equals K - Uses the same first-occurrence hash map pattern, but with prefix sums instead of bitmasks. Both problems find subarrays where a running state matches an earlier state.
- Longest Substring Without Repeating Characters - Another "longest substring" problem, solved with sliding window instead of bitmask. Comparing the two approaches highlights when to use each technique.
- Single Number - Uses XOR cancellation to find an unpaired element. The same XOR property (two toggles cancel out) is the foundation of the parity tracking here.
The bitmask parity pattern is one of those techniques that feels like magic the first time you see it, then becomes second nature with practice. Once you own it, an entire class of parity-based substring problems opens up.