Skip to content
← All posts

Count Number of Homogenous Substrings: Group Counting Pattern

5 min read
leetcodeproblemmediumstringsmath

LeetCode 1759, Count Number of Homogenous Substrings, asks you to count the total number of homogenous substrings in a given string. A homogenous string is one where every character is the same, like "aaa" or "bb". You need to return the count modulo 10^9 + 7.

For example, given s = "abbcccaa", the answer is 13. The homogenous substrings are: "a", "b", "b", "bb", "c", "c", "c", "cc", "cc", "ccc", "a", "a", "aa".

abbcccaalen=1len=2len=3len=2
The string "abbcccaa" split into four groups of consecutive identical characters. Each group contributes k*(k+1)/2 homogenous substrings.

Why this problem matters

This problem teaches you how to count substrings without enumerating them. The brute-force approach of checking every substring is O(n^2), but you can do much better by recognizing that consecutive runs of the same character form independent groups. Each group's contribution to the total count follows a simple mathematical formula.

This "group consecutive characters and count" pattern shows up in many string problems: string compression, run-length encoding, and counting subarrays with specific properties. Learning to spot it saves you from unnecessary nested loops.

The key insight

When you see a run of k identical characters, every substring within that run is homogenous. How many such substrings are there? You can pick any starting position and any ending position within the run, as long as the ending position comes after the start.

A run of length k produces exactly k * (k + 1) / 2 substrings. This is the k-th triangular number. For example, a run of 3 produces 3 * 4 / 2 = 6 substrings: three of length 1, two of length 2, and one of length 3.

The total answer is simply the sum of these triangular numbers across all groups, taken modulo 10^9 + 7.

The solution

def countHomogenous(s):
    MOD = 10**9 + 7
    total = 0
    count = 1

    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            count += 1
        else:
            total = (total + count * (count + 1) // 2) % MOD
            count = 1

    total = (total + count * (count + 1) // 2) % MOD
    return total

Here is what happens step by step:

  1. Start with count = 1 because the first character is always a group of at least one.
  2. Walk through the string from index 1 onward. If the current character matches the previous one, extend the current group by incrementing count.
  3. When the character changes, the current group is complete. Add its contribution (count * (count + 1) / 2) to total and reset count to 1 for the new group.
  4. After the loop ends, add the contribution of the final group, since it never triggers the "character changed" branch.
  5. Return total modulo 10^9 + 7.

You can also accumulate the count incrementally: instead of computing k*(k+1)/2 at the end of each group, add count to total at every step. When a run extends by one character, the new character adds exactly count new homogenous substrings (one ending at each position in the run). Both approaches give the same result.

Visual walkthrough

Here is a step-by-step trace of the algorithm processing the string "abbcccaa". Each group of consecutive identical characters contributes a triangular number of substrings.

Group 1: "a" (length 1)

a

k*(k+1)/2 = 1*2/2 = 1

"a"

+1 substrings, running total: 1

A single character contributes 1*(1+1)/2 = 1 homogenous substring.

Group 2: "bb" (length 2)

bb

k*(k+1)/2 = 2*3/2 = 3

"b""b""bb"

+3 substrings, running total: 4

Two consecutive b's give 2*(2+1)/2 = 3 substrings: "b", "b", and "bb".

Group 3: "ccc" (length 3)

ccc

k*(k+1)/2 = 3*4/2 = 6

"c""c""c""cc""cc""ccc"

+6 substrings, running total: 10

Three consecutive c's give 3*(3+1)/2 = 6 substrings: three "c", two "cc", and one "ccc".

Group 4: "aa" (length 2)

aa

k*(k+1)/2 = 2*3/2 = 3

"a""a""aa"

+3 substrings, running total: 13

Two consecutive a's give 2*(2+1)/2 = 3 substrings. Total: 1 + 3 + 6 + 3 = 13.

Complexity analysis

ApproachTimeSpace
Group countingO(n)O(1)

You scan the string once from left to right, tracking only the current group length and the running total. No extra data structures are needed. The modulo operation is constant time per group.

The building blocks

1. Consecutive group traversal

The core technique is scanning a string and grouping consecutive identical characters. You maintain a counter that grows while characters repeat and resets when the character changes.

count = 1
for i in range(1, len(s)):
    if s[i] == s[i - 1]:
        count += 1
    else:
        process(count)
        count = 1
process(count)

This pattern appears whenever you need to compress, describe, or measure runs within a string. The critical detail is remembering to process the last group after the loop ends, since there is no "character change" to trigger it.

2. Triangular number counting

A run of k identical characters contains k * (k + 1) / 2 homogenous substrings. This formula counts the number of ways to pick a contiguous segment from a sequence of k items.

def substrings_in_run(k):
    return k * (k + 1) // 2

The triangular number formula is a building block that shows up in combinatorics problems whenever you need to count pairs, intervals, or contiguous segments. Recognizing it lets you replace a nested loop with a constant-time calculation.

Edge cases

Single character string. s = "a" has exactly 1 homogenous substring. The loop body never executes, and the final group processing handles it correctly.

All identical characters. s = "aaaa" is one big group of length 4. The answer is 4 * 5 / 2 = 10.

All distinct characters. s = "abcd" has four groups of length 1. Each contributes 1 substring, so the answer is 4.

Large input. With n up to 10^5, the O(n) solution runs instantly. The modulo operation prevents integer overflow in languages with fixed-width integers. In Python, integers grow arbitrarily, but the modulo is still required by the problem specification.

Modulo arithmetic. Make sure to apply the modulo after each addition, not just at the end. This prevents overflow in languages like C++ or Java.

From understanding to recall

You can follow this solution from top to bottom and it makes sense. But in an interview, you need to produce the key pieces from memory: recognizing that consecutive groups are independent, recalling the triangular number formula, and remembering to process the final group after the loop. Those are the details that slip away after a week.

Spaced repetition locks them in. You drill the consecutive group traversal as one brick and the triangular counting formula as another. Each rep takes a minute. After a few rounds at increasing intervals, you stop thinking about whether to use k * (k + 1) / 2 or k * (k - 1) / 2. The formula is automatic. When a counting problem shows up in an interview, you are assembling bricks you already own.

Related posts

CodeBricks teaches you the patterns behind problems like this. Instead of grinding hundreds of problems and hoping the ideas stick, you learn roughly 60 reusable building blocks and drill them with spaced repetition. Each brick clicks into place, and new problems become assemblies of familiar pieces.