Positions of Large Groups: Linear Scan Grouping

May 23, 20264 min read

leetcodeproblemeasystrings

LeetCode 830, Positions of Large Groups, asks you to find every contiguous group of identical characters in a string that has a length of three or more. You return the start and end indices of each such group.

The problem

You are given a string s of lowercase letters. A group is a maximal run of consecutive identical characters. A group is called "large" if its length is >= 3. Return the intervals [start, end] of every large group, sorted by start index.

For example, given s = "abbxxxxzzy", the groups are "a", "bb", "xxxx", "zz", and "y". Only "xxxx" has length >= 3, so the answer is [[3, 6]].

The string "abbxxxxzzy" with large groups highlighted. The "xxxx" group spans indices [3, 6] and has length 4.

The groups form naturally as you scan left to right. Each time the character changes, you know the previous group has ended. You just need to check whether that group was long enough to count.

The key insight

You do not need to find groups in a separate pass and then filter them. You can do both in a single scan. Walk through the string with one pointer. Track where the current group started. When the character changes (or you reach the end), compare the current index to the start. If the difference is >= 3, that group is large and you record it.

This "track the start, check the length when the group ends" pattern is the same one used in run-length encoding and string compression problems.

The solution

def largeGroupPositions(s):
    result = []
    start = 0
    for i in range(1, len(s) + 1):
        if i == len(s) or s[i] != s[start]:
            if i - start >= 3:
                result.append([start, i - 1])
            start = i
    return result

Here is what happens at each stage:

Set start = 0 to mark the beginning of the first group.
Loop from index 1 through len(s) (inclusive). Going one past the end lets you handle the final group without special-casing it after the loop.
When s[i] differs from s[start], or when i reaches the end of the string, the current group is complete. Its length is i - start.
If that length is >= 3, append [start, i - 1] to the result.
Move start to i to begin tracking the next group.
Return the result list.

Practice linear scan problems

Visual walkthrough

Here is a step-by-step trace of the algorithm on s = "abbxxxxzzy". Each step shows what happens when a group boundary is detected.

Step 1: Initialize

s = "abbxxxxzzy", result = [], start = 0

Begin scanning from index 0. The variable start tracks where the current group began.

Step 2: Group "a" (index 0)

i = 0 to 1. s[1] = 'b' != 'a'. Group length = 1 - 0 = 1. Too short, skip.

The character changes at index 1, so the group 'a' ends. Length 1 is less than 3, so we do not record it.

Step 3: Group "bb" (indices 1-2)

start = 1, i = 2 to 3. s[3] = 'x' != 'b'. Group length = 3 - 1 = 2. Too short, skip.

Two consecutive 'b' characters. Still below the threshold of 3.

Step 4: Group "xxxx" (indices 3-6)

start = 3, i = 3 to 7. s[7] = 'z' != 'x'. Group length = 7 - 3 = 4. Record [3, 6].

Four consecutive 'x' characters. Length 4 is at least 3, so we append [3, 6] to the result.

Step 5: Group "zz" (indices 7-8)

start = 7, i = 7 to 9. s[9] = 'y' != 'z'. Group length = 9 - 7 = 2. Too short, skip.

Two consecutive 'z' characters. Below the threshold.

Step 6: Group "y" (index 9) and finish

start = 9, i = 9 to 10. End of string. Group length = 10 - 9 = 1. Too short. Return [[3, 6]].

The final character forms a group of length 1. The scan is complete, and the only large group is [3, 6].

Drill string grouping patterns

Complexity analysis

Aspect	Value	Why
Time	O(n)	Single pass through the string. Each character is visited once.
Space	O(1)	Only a start pointer and the result list. The result list uses O(k) where k is the number of large groups, but this is output space.

The algorithm touches each character exactly once and uses constant auxiliary space beyond the output.

The building blocks

This problem is built from two reusable ideas that appear across many string problems.

Linear grouping scan

The core technique is scanning a string to identify maximal runs of identical characters. You maintain a start pointer and advance through the string until the character changes. This same scan drives run-length encoding, string compression, and any problem that asks you to process consecutive duplicates.

Boundary detection with a sentinel

By looping to len(s) instead of len(s) - 1, you create a virtual boundary that forces the final group to be processed inside the loop. This eliminates the need for a duplicate check after the loop. The same sentinel trick works whenever you process items in groups or runs, such as merging intervals or collapsing sorted duplicates.

Edge cases

Entire string is one character. If s = "aaa", there is one group spanning the entire string. The loop detects it when i reaches the end. The result is [[0, 2]].

No large groups. If every group has fewer than 3 characters (for example, s = "abcdef"), the result is an empty list. Every group check fails the length threshold.

Multiple large groups. When several large groups exist, they are naturally collected in left-to-right order because the scan proceeds from index 0 to the end.

String of length 1 or 2. A single character or two characters can never form a large group. The function returns an empty list without any special handling because no group can reach length 3.

From understanding to recall

You can read this solution and follow each step. But the details that fade first are the practical ones: looping to len(s) (not len(s) - 1) so you do not miss the last group, recording [start, i - 1] (not [start, i]) because the end index is inclusive, and updating start = i after every group boundary.

Spaced repetition locks these details in. You drill the linear grouping scan as one brick and the boundary detection trick as another. Each rep takes a minute. After a few rounds at increasing intervals, the loop range and index arithmetic are automatic. When you see a grouping problem in an interview, you are assembling bricks you already own.

String Compression - Uses the same grouping scan to compress runs of characters in place
Count and Say - Applies run-length encoding iteratively to build a sequence from grouped digit runs
Count Binary Substrings - Groups consecutive identical characters and then compares adjacent group sizes

The grouping scan is one of the most reusable patterns in string problems. Once you can identify and measure runs of identical characters in a single pass, a large family of problems becomes approachable.

Try CodeBricks free