Positions of Large Groups: Linear Scan Grouping
LeetCode 830, Positions of Large Groups, asks you to find every contiguous group of identical characters in a string that has a length of three or more. You return the start and end indices of each such group.
The problem
You are given a string s of lowercase letters. A group is a maximal run of consecutive identical characters. A group is called "large" if its length is >= 3. Return the intervals [start, end] of every large group, sorted by start index.
For example, given s = "abbxxxxzzy", the groups are "a", "bb", "xxxx", "zz", and "y". Only "xxxx" has length >= 3, so the answer is [[3, 6]].
The groups form naturally as you scan left to right. Each time the character changes, you know the previous group has ended. You just need to check whether that group was long enough to count.
The key insight
You do not need to find groups in a separate pass and then filter them. You can do both in a single scan. Walk through the string with one pointer. Track where the current group started. When the character changes (or you reach the end), compare the current index to the start. If the difference is >= 3, that group is large and you record it.
This "track the start, check the length when the group ends" pattern is the same one used in run-length encoding and string compression problems.
The solution
def largeGroupPositions(s):
result = []
start = 0
for i in range(1, len(s) + 1):
if i == len(s) or s[i] != s[start]:
if i - start >= 3:
result.append([start, i - 1])
start = i
return result
Here is what happens at each stage:
- Set
start = 0to mark the beginning of the first group. - Loop from index 1 through
len(s)(inclusive). Going one past the end lets you handle the final group without special-casing it after the loop. - When
s[i]differs froms[start], or whenireaches the end of the string, the current group is complete. Its length isi - start. - If that length is
>=3, append[start, i - 1]to the result. - Move
starttoito begin tracking the next group. - Return the result list.
Visual walkthrough
Here is a step-by-step trace of the algorithm on s = "abbxxxxzzy". Each step shows what happens when a group boundary is detected.
Step 1: Initialize
Begin scanning from index 0. The variable start tracks where the current group began.
Step 2: Group "a" (index 0)
The character changes at index 1, so the group 'a' ends. Length 1 is less than 3, so we do not record it.
Step 3: Group "bb" (indices 1-2)
Two consecutive 'b' characters. Still below the threshold of 3.
Step 4: Group "xxxx" (indices 3-6)
Four consecutive 'x' characters. Length 4 is at least 3, so we append [3, 6] to the result.
Step 5: Group "zz" (indices 7-8)
Two consecutive 'z' characters. Below the threshold.
Step 6: Group "y" (index 9) and finish
The final character forms a group of length 1. The scan is complete, and the only large group is [3, 6].
Complexity analysis
| Aspect | Value | Why |
|---|---|---|
| Time | O(n) | Single pass through the string. Each character is visited once. |
| Space | O(1) | Only a start pointer and the result list. The result list uses O(k) where k is the number of large groups, but this is output space. |
The algorithm touches each character exactly once and uses constant auxiliary space beyond the output.
The building blocks
This problem is built from two reusable ideas that appear across many string problems.
Linear grouping scan
The core technique is scanning a string to identify maximal runs of identical characters. You maintain a start pointer and advance through the string until the character changes. This same scan drives run-length encoding, string compression, and any problem that asks you to process consecutive duplicates.
Other problems that use this pattern:
- String Compression (LeetCode 443) asks you to compress consecutive characters in place using counts.
- Count and Say (LeetCode 38) uses the same grouping scan to read runs of digits aloud.
Boundary detection with a sentinel
By looping to len(s) instead of len(s) - 1, you create a virtual boundary that forces the final group to be processed inside the loop. This eliminates the need for a duplicate check after the loop. The same sentinel trick works whenever you process items in groups or runs, such as merging intervals or collapsing sorted duplicates.
Edge cases
Entire string is one character. If s = "aaa", there is one group spanning the entire string. The loop detects it when i reaches the end. The result is [[0, 2]].
No large groups. If every group has fewer than 3 characters (for example, s = "abcdef"), the result is an empty list. Every group check fails the length threshold.
Multiple large groups. When several large groups exist, they are naturally collected in left-to-right order because the scan proceeds from index 0 to the end.
String of length 1 or 2. A single character or two characters can never form a large group. The function returns an empty list without any special handling because no group can reach length 3.
From understanding to recall
You can read this solution and follow each step. But the details that fade first are the practical ones: looping to len(s) (not len(s) - 1) so you do not miss the last group, recording [start, i - 1] (not [start, i]) because the end index is inclusive, and updating start = i after every group boundary.
Spaced repetition locks these details in. You drill the linear grouping scan as one brick and the boundary detection trick as another. Each rep takes a minute. After a few rounds at increasing intervals, the loop range and index arithmetic are automatic. When you see a grouping problem in an interview, you are assembling bricks you already own.
Related posts
- String Compression - Uses the same grouping scan to compress runs of characters in place
- Count and Say - Applies run-length encoding iteratively to build a sequence from grouped digit runs
- Count Binary Substrings - Groups consecutive identical characters and then compares adjacent group sizes
The grouping scan is one of the most reusable patterns in string problems. Once you can identify and measure runs of identical characters in a single pass, a large family of problems becomes approachable.