Skip to content
← All posts

Number of Good Ways to Split a String: Prefix-Suffix Distinct Counting

5 min read
leetcodeproblemmediumstringshash-map

LeetCode 1525, Number of Good Ways to Split a String, asks you to count how many ways you can split a string s into two non-empty parts where both parts have the same number of distinct characters.

For example, given s = "aacaba", there are 2 good splits: "aac" | "aba" (2 distinct on each side) and "aaca" | "ba" (2 distinct on each side).

This problem is a clean example of the prefix-suffix precomputation pattern. Instead of recounting distinct characters from scratch at every split point, you precompute the answer for all prefixes and all suffixes in two linear passes. Then a single scan across split positions gives you the answer.

s = "aacaba"a0a1c2a3b4a5splits_left"aac"s_right"aba"distinct = 2 (a, c)distinct = 2 (a, b)2 == 2, good split!
Splitting "aacaba" after index 2 gives left = "aac" (2 distinct) and right = "aba" (2 distinct). Counts match, so this is a good split.

Why this problem matters

The prefix-suffix pattern shows up in many string and array problems. You have already seen it in Product of Array Except Self where you build prefix products and suffix products. Here, the same idea applies to distinct character counts. Learning to spot when a problem decomposes into "what happened on the left" and "what happened on the right" is one of the most useful instincts you can develop.

The key question to ask yourself: "Can I precompute some property for every prefix and every suffix, then combine them at each split point?" If yes, you almost always get an O(n) solution.

The approach

A brute force solution would try every split position and count distinct characters on each side using a set. That gives you O(n^2) because for each of the n-1 split points, you scan up to n characters.

The insight is that you do not need to rebuild the set from scratch at each split. Instead, build two arrays:

  1. left_distinct[i] = number of distinct characters in s[0..i] (inclusive)
  2. right_distinct[i] = number of distinct characters in s[i..n-1] (inclusive)

You build left_distinct with a single left-to-right pass, tracking a running set. You build right_distinct with a single right-to-left pass. Then, for each split position i (splitting after index i, so left = s[0..i] and right = s[i+1..n-1]), check if left_distinct[i] == right_distinct[i+1].

This is the same prefix-suffix decomposition pattern used in Product of Array Except Self. There, you precompute prefix and suffix products. Here, you precompute prefix and suffix distinct counts. The structure is identical: two passes to precompute, one pass to combine.

The solution

def numSplits(s: str) -> int:
    n = len(s)
    left_distinct = [0] * n
    right_distinct = [0] * n

    seen = set()
    for i in range(n):
        seen.add(s[i])
        left_distinct[i] = len(seen)

    seen = set()
    for i in range(n - 1, -1, -1):
        seen.add(s[i])
        right_distinct[i] = len(seen)

    count = 0
    for i in range(n - 1):
        if left_distinct[i] == right_distinct[i + 1]:
            count += 1

    return count

Let's walk through the logic:

  1. Build left_distinct by scanning left to right. At each index, add the character to a set and record the set's size. This gives the cumulative distinct count for the prefix ending at that index.
  2. Build right_distinct by scanning right to left. Same idea, but now you are building the distinct count for the suffix starting at each index.
  3. For every valid split position (after index i, where i goes from 0 to n-2), compare left_distinct[i] with right_distinct[i+1]. If they are equal, it is a good split.

The final loop runs from 0 to n-2 because both sides must be non-empty: the left side needs at least one character (index 0) and the right side needs at least one character (index n-1).

Visual walkthrough

Let's trace through s = "aacaba" at every split position and see which ones produce matching distinct counts.

Step 1: Split after index 0. Left = "a", Right = "acaba".

a0a1c2a3b4a51 distinct (a)3 distinct (a,b,c)1 != 3

Left has 1 distinct character, right has 3. Not a match.

Step 2: Split after index 1. Left = "aa", Right = "caba".

a0a1c2a3b4a51 distinct (a)3 distinct (a,b,c)1 != 3

Left still only has 1 distinct character (just a), right has 3. Not a match.

Step 3: Split after index 2. Left = "aac", Right = "aba".

a0a1c2a3b4a52 distinct (a,c)2 distinct (a,b)Good split!

Both sides have 2 distinct characters. This is a good split. Count = 1.

Step 4: Split after index 3. Left = "aaca", Right = "ba".

a0a1c2a3b4a52 distinct (a,c)2 distinct (a,b)Good split!

Both sides have 2 distinct characters again. Count = 2.

Step 5: Split after index 4. Left = "aacab", Right = "a".

a0a1c2a3b4a53 distinct (a,b,c)1 distinct (a)3 != 1

Left has 3 distinct, right has 1. Not a match. Final answer: 2 good splits.

After checking all 5 possible split positions, we found 2 good splits (after index 2 and after index 3). Both give 2 distinct characters on each side.

Complexity analysis

ApproachTimeSpace
Brute force (rebuild set each split)O(n^2)O(n)
Prefix-suffix distinct arraysO(n)O(n)

Time: O(n) where n is the length of the string. Three passes through the string: one left-to-right, one right-to-left, and one final scan of split positions. Each pass is O(n).

Space: O(n). Two arrays of length n store the prefix and suffix distinct counts. The sets used during construction hold at most 26 entries (for lowercase English letters), which is O(1), but the arrays themselves are O(n).

You can optimize space to O(1) by noting that left_distinct is monotonically non-decreasing and right_distinct is monotonically non-decreasing (when read right to left). This means you could use two pointers or binary search. But the O(n) space version is cleaner and easier to get right in an interview.

The building blocks

This problem decomposes into two reusable pieces that CodeBricks drills independently:

1. Prefix distinct count

The technique of scanning left to right while maintaining a running set, recording the size at each position. This gives you the number of distinct elements in every prefix of the array or string.

seen = set()
for i in range(n):
    seen.add(s[i])
    left_distinct[i] = len(seen)

This same building block appears in problems like First Unique Character in a String and Longest Substring Without Repeating Characters, where you need to track what you have seen so far from the left.

2. Suffix distinct count

The mirror image: scanning right to left with a running set. This gives you the distinct count for every suffix.

seen = set()
for i in range(n - 1, -1, -1):
    seen.add(s[i])
    right_distinct[i] = len(seen)

Pairing a prefix pass with a suffix pass is the core of the prefix-suffix decomposition pattern. Once you have both arrays, you can answer any "split the array/string at position i" question in O(1) per position.

Edge cases

Before submitting, verify your solution handles:

  • Two characters, same: s = "aa". Only one split: "a" | "a". Both have 1 distinct. Return 1.
  • Two characters, different: s = "ab". Only one split: "a" | "b". Both have 1 distinct. Return 1.
  • All identical characters: s = "aaaa". Every split gives 1 distinct on each side. Return 3 (there are n-1 = 3 split positions).
  • All unique characters: s = "abcde". The prefix distinct counts grow from 1 to 5. The suffix distinct counts shrink from 5 to 1. They cross in the middle. Check each position.
  • Length 2 (minimum): The smallest valid input. One split position, one comparison.

From understanding to recall

Reading through this solution, the prefix-suffix pattern feels obvious. But under time pressure, the details matter: remembering to use range(n - 1) (not range(n)) for the split check, remembering to compare left_distinct[i] with right_distinct[i + 1] (not right_distinct[i]), and remembering to reset the set between the two passes. These are the kinds of details that slip away without repetition. Spaced repetition locks them in by having you write the code from scratch at increasing intervals until the prefix-suffix decomposition is second nature.

Related posts