Skip to content
← All posts

Longest Substring with At Least K Repeating Characters

6 min read
leetcodeproblemmediumstringssliding-window

LeetCode 395, Longest Substring with At Least K Repeating Characters, asks you to find the length of the longest substring where every character appears at least k times.

For example, given s = "aaabb" and k = 3, the answer is 3. The substring "aaa" has every character appearing at least 3 times. The full string "aaabb" does not work because 'b' only appears twice.

This problem looks like it should be a sliding window problem, and it can be solved that way with some effort. But the cleanest approach is divide and conquer. The key insight is beautifully simple: any character that appears fewer than k times in the string can never be part of a valid substring, so you can split on it.

s = "aaabb", k = 3Full string:aaabb3a2bk=3Split on 'b' (count 2 is below k=3):splitaaa"aaa" = 3bb"bb" = 0Every character in "aaa" appears at least 3 times.Answer: 3Meets k thresholdBelow k threshold
In "aaabb" with k=3, 'b' only appears 2 times (below k). We split the string at every 'b' and recurse on each piece. The left piece "aaa" is all valid, giving us 3.

Why brute force is too slow

The brute force approach checks every possible substring and verifies that every character in it appears at least k times.

def longest_substring_brute(s, k):
    best = 0
    for i in range(len(s)):
        for j in range(i, len(s)):
            freq = {}
            for c in s[i:j + 1]:
                freq[c] = freq.get(c, 0) + 1
            if all(v >= k for v in freq.values()):
                best = max(best, j - i + 1)
    return best

This is O(n^3) in the worst case: two nested loops to pick the substring and another pass to count frequencies. For a string of 10,000 characters, that is way too slow.

The key insight: split on infrequent characters

Here is the idea that makes this problem click. Count the frequency of every character in the string. If a character appears fewer than k times total, it can never be part of any valid substring. That character acts as a natural split point.

Think of it this way: if 'c' appears only once in the entire string, then no valid substring can contain 'c'. So any valid substring must live entirely to the left of 'c' or entirely to the right of 'c'. You split the string at every occurrence of 'c' and solve each piece recursively.

If every character in the string already meets the k threshold, the whole string is valid and you return its length. That is the base case.

This is a divide-and-conquer strategy, not the typical sliding window approach. You are not moving two pointers across the string. Instead, you are recursively breaking the string into smaller pieces at natural boundary points. Each split eliminates at least one character from the alphabet, so the recursion depth is at most 26.

Walking through it step by step

Let's trace through s = "ababbc" with k = 2. We count frequencies, identify the split character, break the string apart, and recurse on each piece.

Step 1: Count character frequencies in "ababbc" with k=2

a0b1a2b3b4c5freq:a:2b:3c:1splitc appears only 1 time, which is below k=2. Split here.

a appears 2 times (meets k=2), b appears 3 times (meets k=2), c appears 1 time (below k=2). Split on c.

Step 2: Recurse on left substring "ababb" (indices 0-4)

ababbfreq:a:2b:3All characters meet k=2. Return length 5.

In "ababb", a appears 2 times (meets k=2) and b appears 3 times (meets k=2). Every character meets the threshold, so this whole substring is valid. Length = 5.

Step 3: Right side of split (after "c") is empty

(empty substring after 'c') = 0

There are no characters after the split character "c" at index 5. This branch returns 0.

Step 4: Compare results and return the maximum

"ababbc""ababb" = 5"" = 0max(5, 0) = 5

The left branch returned 5, the right branch returned 0. The answer is max(5, 0) = 5.

The answer is 5, from the substring "ababb". Every character in that substring ('a' appears 2 times, 'b' appears 3 times) meets the k = 2 threshold.

The code

Here is the complete solution in Python:

def longest_substring(s, k):
    if len(s) < k:
        return 0

    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1

    for c in freq:
        if freq[c] < k:
            return max(longest_substring(sub, k) for sub in s.split(c))

    return len(s)

Let's break down each piece:

  1. Base case: if the string is shorter than k, no character can appear k times. Return 0.
  2. Count frequencies: build a frequency map of every character in the current string.
  3. Find a split character: scan the frequency map for any character that appears fewer than k times. If you find one, split the string on every occurrence of that character and recurse on each piece. Return the maximum result across all pieces.
  4. All characters valid: if no character has a frequency below k, the entire string is a valid answer. Return its length.

The split call handles multiple occurrences of the split character at once. If 'c' appears at indices 3, 7, and 12, s.split('c') gives you the substrings between all of those positions. You only need to find one split character per recursive call because split breaks on all its occurrences.

Why the recursion terminates quickly

You might worry about the recursion depth. But notice: each time you split on a character, that character is absent from all the resulting substrings (because split removes it). So each level of recursion eliminates at least one character from the alphabet. With only 26 lowercase letters, the recursion depth is at most 26. Each level does O(n) work to count frequencies and split, so the total time is O(26 * n) = O(n).

Complexity analysis

ApproachTimeSpace
Brute force (all substrings)O(n^3)O(n)
Divide and conquerO(n)O(n)

Time: O(n). Each recursion level scans through the string to count frequencies and split. At most 26 levels of recursion (one per unique character). Total: O(26n) which simplifies to O(n).

Space: O(n). The recursion stack is at most 26 levels deep. The split call creates substrings whose total length across one level is at most n. In practice, Python's string slicing creates copies, so the space per level is the sum of substring lengths.

The building blocks

This problem decomposes into two reusable pieces that CodeBricks drills independently:

1. Frequency counting as a decision tool

Count character frequencies and use them to make decisions about the string's structure. Here, frequencies tell you which characters are "blockers" that can never be part of a valid substring. This same frequency analysis appears in problems like Group Anagrams (frequency signatures), Top K Frequent Elements (bucket sort by frequency), and Longest Repeating Character Replacement (max frequency in a window).

freq = {}
for c in s:
    freq[c] = freq.get(c, 0) + 1

for c in freq:
    if freq[c] < k:
        # this character is a blocker

2. Divide and conquer on strings

Split a problem into smaller subproblems at natural boundary points, solve each recursively, and combine results. Here, the boundary points are characters that violate the constraint. The same recursive decomposition pattern shows up in problems like Merge Sort (split at the midpoint), Quick Sort (split at the pivot), and expression parsing (split at operators).

for c in freq:
    if freq[c] < k:
        return max(longest_substring(sub, k) for sub in s.split(c))

return len(s)

These two pieces snap together cleanly: frequency counting identifies where to split, and divide and conquer handles the recursion. Learning them as isolated building blocks means you can recognize and apply them in other problems that use the same structure.

Edge cases

Before submitting, verify your solution handles:

  • Empty string "": return 0.
  • k equals 1: every character appears at least once, so return len(s).
  • k larger than the string length: no valid substring possible, return 0.
  • All identical "aaaa" with k = 2: return 4. The entire string is valid.
  • No valid substring "abcde" with k = 2: every character appears exactly once, so return 0.
  • Multiple split characters: "aabcbdd" with k = 2 splits on 'c' (count 1), giving "aab" and "bdd". Then "aab" splits on 'b' (count 1 in that piece), giving "aa" which is valid with length 2. "bdd" splits on 'b' (count 1), giving "dd" which is valid with length 2. Answer: 2.

The divide-and-conquer solution handles all of these without special cases beyond the base case.

From understanding to recall

Reading through this solution is one thing. Reproducing it from scratch in a week is another. The divide-and-conquer insight here is elegant but easy to forget. Spaced repetition helps you internalize two things: the frequency-counting pattern that identifies split points, and the recursive decomposition that turns a string problem into smaller subproblems. You revisit these building blocks at increasing intervals until splitting on infrequent characters is something you derive naturally rather than recall from memory.

Related posts