Skip to content
← All posts

Number of Matching Subsequences: Bucket Processing

5 min read
leetcodeproblemmediumstringshash-map

You are given a string s and an array of strings words. Return the number of words that are subsequences of s. A subsequence is a string that can be derived from another by deleting some (or no) characters without changing the order of the remaining characters.

This is LeetCode 792: Number of Matching Subsequences. For example, given s = "abcde" and words = ["a", "bb", "acd", "ace"], the answer is 3. The words "a", "acd", and "ace" are all subsequences of s, but "bb" is not because s contains only one "b".

The naive approach checks each word independently by scanning through s with a two-pointer technique. That works, but if s is long and there are many words, you end up scanning s over and over again. The bucket approach flips the script: scan s exactly once and process all words simultaneously.

sa0b1c2d3e4"a"match"bb"no matchb?"acd"match"ace"matchSubsequence foundNot a subsequence
Given s="abcde" and words=["a", "bb", "acd", "ace"], three words are subsequences of s. The word "bb" fails because s has only one "b".

Why this problem matters

This problem sits at the intersection of string matching and efficient batch processing. In real systems, you often need to check many patterns against a single source. Rather than running each check independently, you group the queries by what they need and process them together. That principle shows up in search engines, log filtering, and event-driven systems.

It also teaches you how to think about iterators as first-class data. Instead of storing entire strings, you track each word's progress with an index. Words become lightweight state machines that advance through their characters one at a time. When a word finishes, you count it. When it needs a character you have not seen yet, you park it in the right bucket and move on.

The bucket approach is a great example of amortized work. Each word character gets processed exactly once across the entire scan of s. No character in any word is revisited, and no character in s triggers unnecessary comparisons.

The approach

Create 26 buckets, one for each lowercase letter. For each word, look at its first character and place the word (along with a pointer set to 0) into the corresponding bucket.

Now scan through s one character at a time. When you encounter character c, take all the words currently waiting in bucket c. For each of those words, advance the pointer by one. If the pointer reaches the end of the word, that word is a complete subsequence, so increment your count. Otherwise, look at the word's next needed character and place it into that bucket.

After you finish scanning s, the count holds the answer. Any words still sitting in buckets never found all the characters they needed.

Clean solution

def num_matching_subseq(s, words):
    buckets = [[] for _ in range(26)]
    for word in words:
        buckets[ord(word[0]) - ord('a')].append((word, 0))

    count = 0
    for c in s:
        bucket = buckets[ord(c) - ord('a')]
        buckets[ord(c) - ord('a')] = []
        for word, idx in bucket:
            idx += 1
            if idx == len(word):
                count += 1
            else:
                buckets[ord(word[idx]) - ord('a')].append((word, idx))

    return count

Step-by-step walkthrough

Step 1: Initialize buckets by first character

Group each word by its first needed character. Each entry tracks the word and how far along we are (index 0 to start).

'a':
a(0)acd(0)ace(0)
'b':
bb(0)
'c':
(empty)
'd':
(empty)
'e':
(empty)

Words 'a', 'acd', and 'ace' all start with 'a', so they go into the 'a' bucket. Word 'bb' starts with 'b'.

Step 2: Process s[0] = 'a'

Take everything from the 'a' bucket. Advance each word's pointer. If a word is fully consumed, count it.

+1 match: "a" is fully consumed (length 1, pointer reached end).

'a':
(empty)
'b':
bb(0)
'c':
acd(1)ace(1)
'd':
(empty)
'e':
(empty)

Count so far: 1

'acd' now needs 'c' next, so it moves to the 'c' bucket. Same for 'ace'. The word 'a' finished, so count increases.

Step 3: Process s[1] = 'b'

Take everything from the 'b' bucket. "bb" advances its pointer and now needs another 'b'.

'a':
(empty)
'b':
bb(1)
'c':
acd(1)ace(1)
'd':
(empty)
'e':
(empty)

Count so far: 1

'bb' matched its first 'b' and now waits for a second 'b'. It goes right back into the 'b' bucket.

Step 4: Process s[2] = 'c'

Take everything from the 'c' bucket. Both "acd" and "ace" advance past their 'c'.

'a':
(empty)
'b':
bb(1)
'c':
(empty)
'd':
acd(2)
'e':
ace(2)

Count so far: 1

'acd' now needs 'd', so it moves to the 'd' bucket. 'ace' now needs 'e', so it moves to the 'e' bucket.

Step 5: Process s[3] = 'd'

Take everything from the 'd' bucket. "acd" has consumed all its characters.

+1 match: "acd" is fully consumed.

'a':
(empty)
'b':
bb(1)
'c':
(empty)
'd':
(empty)
'e':
ace(2)

Count so far: 2

'acd' finishes. Count goes to 2. 'bb' is still stuck waiting for a second 'b' that will never come.

Step 6: Process s[4] = 'e'

Take everything from the 'e' bucket. "ace" has consumed all its characters.

+1 match: "ace" is fully consumed.

'a':
(empty)
'b':
bb(1)
'c':
(empty)
'd':
(empty)
'e':
(empty)

Final count: 3

'ace' finishes. 'bb' never found a second 'b' in s, so it remains unmatched. Final answer: 3 words are subsequences.

Notice how each word moves through the buckets like a token on a game board. Every time s reveals a character, the words waiting for that character advance one step. Words that finish their journey get counted. Words that never get their turn stay parked in their bucket forever.

The key insight is that you never go back. Each word's pointer only moves forward, and each character of s is visited exactly once. The total work across all words is proportional to the sum of their lengths, not the product of s's length with the number of words.

Think of buckets as waiting rooms. Each word sits in the room for the character it needs next. When that character arrives in s, the word advances to the next waiting room or exits if it is done. This turns a multi-pass problem into a single scan.

Complexity analysis

ApproachTimeSpace
Bucket processingO(n + sum of word lengths)O(k)

Where n is the length of s and k is the total number of words. The time comes from scanning s once (O(n)) plus advancing each word character exactly once (O(sum of word lengths)). The space holds all words in the bucket structure at any given time, which is proportional to the number of words and their total length.

The building blocks

1. Bucket grouping by next needed character

The core idea is grouping items by a key that changes over time. Initially, each word is bucketed by its first character. As processing continues, words migrate between buckets based on their next needed character. This is similar to how radix sort distributes elements into bins, except here the bins represent future events rather than digit positions.

buckets = [[] for _ in range(26)]
for word in words:
    buckets[ord(word[0]) - ord('a')].append((word, 0))

2. Single-pass event processing

Instead of checking each word against s separately, you process all words during a single left-to-right scan of s. Each character in s acts as an "event" that triggers advancement for all words waiting on that character. This pattern appears in event-driven simulation, where a timeline of events drives state changes across multiple entities simultaneously.

Edge cases to watch for

  • Empty s: no characters to match against, so the answer is 0 unless some word is also empty.
  • Empty words in the array: an empty string is technically a subsequence of any string. Handle this by counting empty words before starting the bucket scan.
  • Duplicate words: the same word can appear multiple times in words. Each copy is tracked independently in the buckets, so duplicates are handled naturally.
  • Single-character words: these finish as soon as their character appears in s. They exit the bucket immediately.
  • Words longer than s: these can never be subsequences. They will remain parked in buckets after the scan completes, so they are automatically excluded from the count.
  • All words starting with the same character: the first bucket may contain many words, but each still advances independently. No special handling needed.

From understanding to recall

You now understand how bucket processing turns a multi-word subsequence check into a single scan of s. Each word starts in the bucket for its first character, advances when that character appears, and either finishes or moves to the next bucket. The total work is proportional to the sum of all word lengths plus the length of s.

But reading this once is not enough to reproduce it under pressure. The details matter: initializing buckets by first character, swapping out the current bucket before iterating, advancing the pointer and checking for completion. These are the small steps that trip people up when the clock is ticking.

This bucket processing pattern is one of roughly 60 reusable building blocks in the CodeBricks system. You practice each block with spaced repetition until writing the solution from memory feels automatic. Locking in the bucket-advance-or-count loop means you can apply the same technique to any problem that needs to match multiple patterns against a single source in one pass.

Related posts