Occurrences After Bigram: Linear Scan Pattern
Given a string text along with two words first and second, find all occurrences of the bigram first second in text and collect the word that immediately follows each occurrence. Return those collected words as a list.
This is LeetCode 1078: Occurrences After Bigram, and it is a clean introduction to scanning text at the word level. Instead of matching individual characters, you split the text into words and look for a pattern across consecutive entries.
Why this problem matters
Bigrams (pairs of consecutive words) are a fundamental concept in natural language processing. Counting bigrams, finding words that follow them, and building frequency tables over word pairs are common tasks in text analysis, autocomplete systems, and language modeling. This problem isolates the core mechanic: scan a word sequence and look for a specific pair.
It also reinforces a key interviewing skill, which is recognizing when you can reduce a string problem to an array problem. Once you split the text into words, the entire problem becomes a linear scan over an array with a window of three consecutive elements. No complex string matching is needed.
The key insight
Split the text into an array of words. Then scan through the array, and at each position i, check whether words[i] equals first and words[i+1] equals second. If both match, append words[i+2] to the result. You need i+2 to be in bounds, so the loop runs from index 0 to len(words) - 3.
This is essentially a sliding window of size 3 over the word array. You check the first two elements of the window against the bigram, and if they match, you collect the third.
The solution
def find_ocurrences(text: str, first: str, second: str) -> list[str]:
words = text.split()
result = []
for i in range(len(words) - 2):
if words[i] == first and words[i + 1] == second:
result.append(words[i + 2])
return result
The text.split() call handles any amount of whitespace and produces a clean list of words. Then we iterate through all valid starting positions for a triple-word window. At each position, a simple equality check on the first two words tells us whether we have a bigram match. If we do, we grab the third word.
Notice that the loop bound is len(words) - 2. This ensures words[i + 2] never goes out of bounds. When i is at its maximum value of len(words) - 3, the indices i, i+1, and i+2 are the last three elements.
When a string problem talks about "words" in a sentence, your first move should almost always be text.split(). This converts the problem from character-level string matching to array-level scanning, which is usually simpler to reason about and implement.
Visual walkthrough
Step 1: Check window at indices 0, 1, 2
words[0]="alice", words[1]="is". Neither matches first="a", so skip.
result = []
Step 2: Check window at indices 1, 2, 3
words[1]="is", words[2]="a". words[1] does not equal first="a", so skip.
result = []
Step 3: Check window at indices 2, 3, 4
words[2]="a" matches first, words[3]="good" matches second. Bigram found! Collect words[4]="girl".
result = ["girl"]
Steps 4-6: Scan indices 3 through 6
No window starting at indices 3, 4, 5, or 6 has words[i]="a" followed by words[i+1]="good". Keep scanning.
result = ["girl"]
Step 7: Check window at indices 7, 8, 9
words[7]="a" matches first, words[8]="good" matches second. Bigram found again! Collect words[9]="student".
result = ["girl", "student"]
Done: All windows checked
We scanned every triple-word window. Two bigram matches produced two result words.
result = ["girl", "student"]
The scan visits each triple-word window exactly once. When the first two words match the bigram, we collect the third. No backtracking, no nested loops, just a single left-to-right pass through the word array.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Linear scan | O(n) | O(n) |
Here n is the length of the text string.
Time: Splitting the text into words takes O(n) where n is the number of characters. The scan through the words array visits each word once, which is also O(n) total character comparisons in the worst case. Overall: O(n).
Space: The words array stores all the words from the text, which takes O(n) space. The result list takes at most O(n) as well. Total: O(n).
The building blocks
1. Word splitting and array scanning
The foundation of this solution is converting a sentence into a word array and then scanning it. This is the same pattern used in problems like Reverse Words in a String and Most Common Word. Any time you see a problem that operates on "words in a sentence," the first step is split(). Once you have an array, you can use indexing, iteration, and standard array techniques.
words = text.split()
for i in range(len(words)):
process(words[i])
2. Triple-word window pattern
The second building block is checking a fixed-size window of consecutive elements. Here the window is three words wide: you check the first two against a pattern and collect the third. This is a simplified version of the sliding window technique. The key difference is that the window size is fixed and tiny, so there is no need for complex pointer management. You just index i, i+1, and i+2.
for i in range(len(arr) - 2):
if arr[i] == target1 and arr[i + 1] == target2:
collect(arr[i + 2])
Edge cases
- No bigram match: if the bigram
first secondnever appears in the text, the result is an empty list. The loop runs but never triggers theifcondition. - Bigram at the very end: if the text ends with
first secondand there is no word after, the loop boundlen(words) - 2prevents that position from being checked. No index error occurs. - Overlapping bigrams: consider text =
"a a a", first ="a", second ="a". At index 0, words[0]="a" and words[1]="a" match, so words[2]="a" is collected. At index 1, words[1]="a" and words[2]="a" also match, but there is no words[3]. The loop bound stops at index 0 (len=3, range(1)), so only one result is collected. This is correct. - Single or two words in text: if the text has fewer than three words,
range(len(words) - 2)produces an empty range. The loop body never executes, and the result is an empty list. - Repeated bigram with different following words: each match is independent. If the bigram appears five times with five different following words, all five are collected in order.
From understanding to recall
The logic here is minimal: split, scan triples, collect matches. But in an interview, the small details matter. Remembering to use len(words) - 2 as the bound, not len(words) - 1 or len(words), is the kind of off-by-one detail that causes bugs under pressure.
Spaced repetition drills these details into muscle memory. You practice writing the split-and-scan pattern from scratch, getting the loop bound right every time. After a few rounds, you do not need to think about whether it is -1 or -2. Your fingers just write it correctly because they have done it before.
Related posts
- Find the Index of the First Occurrence in a String - Classic string matching
- Most Common Word - Another word-level string processing problem
- Reverse Words in a String - Word-based string manipulation
CodeBricks breaks Occurrences After Bigram into its word-splitting and triple-window scanning building blocks, then drills them independently with spaced repetition. You type each piece from scratch until the pattern is automatic. When a word-level string problem appears in your interview, you do not hesitate. You just write it.