Word Subsets: Merging Character Requirements

May 24, 20266 min read

leetcodeproblemmediumarraysstringshash-map

Given two arrays of strings words1 and words2, a string b is a subset of string a if every letter in b (including multiplicity) appears in a. A word a from words1 is called universal if every string in words2 is a subset of a. Return a list of all universal words in words1.

This is LeetCode 916: Word Subsets, a medium problem that tests your ability to combine multiple frequency constraints into one efficient check. The core trick is merging all of the requirements from words2 into a single map so you only need one pass through words1.

The problem

You are given words1 = ["amazon","apple","facebook","google","leetcode"] and words2 = ["e","o"]. A word in words1 is universal if it contains at least the characters required by every word in words2. Since words2 demands at least one "e" and at least one "o", you need words that have both. "amazon" has no "e", "apple" has no "o", but "facebook", "google", and "leetcode" each have both.

Output: ["facebook", "google", "leetcode"]

Merge words2 requirements into one map: {e:1, o:1}. Check each word in words1. "facebook", "google", and "leetcode" satisfy both requirements. "amazon" is missing "e" and "apple" is missing "o".

The question becomes: how do you check these constraints efficiently without recounting characters for every pair?

The brute force approach

The obvious approach is nested loops. For each word a in words1, loop through every word b in words2 and check if b is a subset of a. To check the subset relationship, you count character frequencies for both words and verify that a has at least as many of each character as b.

This works, but it is slow. If words1 has n words and words2 has m words, you do n * m subset checks. Each check involves counting characters across both strings. For large inputs, this becomes expensive and redundant. You are recounting the characters of each word in words2 over and over again, once per word in words1.

There has to be a way to compress the work from words2 into a single step.

The key insight

You do not need to check each word in words2 separately. Instead, merge all of words2 into a single "max-frequency" requirement map. For each character, track the maximum count needed across all words in words2. Then check each word in words1 against this single map. One merge, one pass, done.

Why does this work? If words2 = ["e", "oo"], then the requirement is: at least 1 "e" and at least 2 "o"s. The word "e" demands {e: 1}. The word "oo" demands {o: 2}. Merging by taking the max of each character gives {e: 1, o: 2}. Any word that satisfies this merged map automatically satisfies both original words.

This reduces the problem from n * m checks down to m frequency counts (to build the merged map) plus n frequency counts (to check each word in words1). That is O(n + m) passes, each proportional to the word length.

Step-by-step walkthrough

Let's trace through words1 = ["amazon","apple","facebook","google","leetcode"] with words2 = ["e","o"].

Step 1: Build a frequency map for each word in words2.

Each word in words2 produces its own character count. "e" needs one e. "o" needs one o.

Step 2: Merge into one max-frequency requirement map.

For each character, take the maximum count across all words2 entries. max(e) = 1, max(o) = 1. One map captures every constraint.

Step 3: Check each word in words1 against the merged map.

For each word, count its characters and compare. A word is universal only if every required character count is met.

Step 4: Collect and return the universal words.

Only the words that passed every character requirement are universal. Return ["facebook", "google", "leetcode"].

That is the whole algorithm. Build the merged requirement map once, then filter words1 in a single pass.

Practice this pattern with spaced repetition

The code

from collections import Counter

def wordSubsets(words1, words2):
    max_freq = Counter()
    for b in words2:
        freq_b = Counter(b)
        for ch in freq_b:
            max_freq[ch] = max(max_freq[ch], freq_b[ch])

    result = []
    for a in words1:
        freq_a = Counter(a)
        if all(freq_a[ch] >= max_freq[ch] for ch in max_freq):
            result.append(a)
    return result

Here is what each section does:

max_freq = Counter() initializes the merged requirement map. This will hold the maximum frequency of each character needed across all words in words2.
The first loop iterates through words2. For each word b, it counts character frequencies and updates max_freq by taking the maximum for each character. After this loop, max_freq contains the tightest constraint for every character.
The second loop iterates through words1. For each word a, it counts character frequencies and checks whether a meets every requirement in max_freq. The all() call returns True only if the word has enough of every required character.
Words that pass are appended to result.

The merge step is the key optimization. Without it, you would recheck every word in words2 for every word in words1. With it, you do a single O(m) preprocessing pass and then O(n) checks. The total character work is proportional to the sum of all word lengths, which is optimal.

Complexity analysis

Metric	Value
Time	O(n * k + m * j) where n = len(words1), k = avg length of words1, m = len(words2), j = avg length of words2
Space	O(26) = O(1) for the frequency maps (fixed alphabet size)

The time complexity is linear in the total number of characters across both arrays. You visit each character a constant number of times. The space is O(1) because the frequency maps can hold at most 26 entries (lowercase English letters).

Build your pattern library with CodeBricks

The building blocks

1. Frequency map merging

The core pattern here is combining multiple frequency constraints into a single merged map using element-wise max:

merged = Counter()
for constraint in constraints:
    freq = Counter(constraint)
    for ch in freq:
        merged[ch] = max(merged[ch], freq[ch])

This pattern appears whenever you need to satisfy multiple requirements simultaneously. Instead of checking each requirement separately, you precompute the tightest constraint and check once. The template is always: iterate through the constraints, take the max (or min, or sum, depending on the problem) for each dimension, then use the merged result.

2. Frequency-based filtering

The second pattern is filtering a collection by checking each element against a frequency requirement:

result = []
for item in collection:
    freq = Counter(item)
    if all(freq[ch] >= required[ch] for ch in required):
        result.append(item)

This is a frequency-aware filter. You count characters (or properties) of each candidate and compare them against a threshold. The all() check ensures every dimension of the requirement is met. You see this in anagram checking, substring matching, and any problem where "contains at least these characters" is the condition.

Edge cases

Single word in words2. The merged map is just that word's frequency. No merging needed, but the algorithm handles it identically.
Duplicate characters in words2 words. If words2 = ["oo"], the requirement is {o: 2}, not {o: 1}. The Counter correctly captures this.
Overlapping requirements. If words2 = ["e", "ee"], the merged map has {e: 2} because max(1, 2) = 2. Only words with at least two "e"s qualify.
Empty result. If no word in words1 satisfies the merged requirements, the result is an empty list.
All words pass. If words2 only requires very common characters, every word in words1 might qualify. The algorithm handles this without special logic.

From understanding to recall

The merge step is the entire insight for this problem. Once you see that you can compress all of words2 into a single max-frequency map, the rest is just counting and comparing. But recognizing when to apply this technique is the harder part.

The building block is "merge multiple frequency constraints into one." That idea applies beyond this specific problem. Whenever you have a set of conditions that each involve character (or element) counts, ask yourself: can I combine them into a single requirement and check once? If the conditions use "at least" semantics (as they do here), element-wise max is the right merge operation.

Spaced repetition helps you internalize this pattern so it fires automatically. After a few rounds of writing the merge loop from scratch, you will not need to re-derive it. You will see "multiple subset conditions" and immediately reach for the max-frequency merge.

Build recall with CodeBricks

Group Anagrams - Another frequency-counting problem where the key insight is choosing the right canonical representation
Find All Anagrams in a String - Sliding window with frequency maps, a related pattern for substring matching
Valid Anagram - The simplest frequency comparison problem, a good warm-up before Word Subsets

Try CodeBricks free