Skip to content
← All posts

Largest Substring Between Two Equal Characters: Hash Map First Occurrence

5 min read
leetcodeproblemeasystringshash-map

Given a string s, return the length of the longest substring between two equal characters, excluding the two characters themselves. If there is no such substring, return -1.

This is LeetCode 1624: Largest Substring Between Two Equal Characters, and it is a clean example of how a hash map can replace nested loops. The brute force approach checks every pair of matching characters. The optimal approach does it in a single pass by remembering where each character first appeared.

0afirst 'a'1b2c3asecond 'a'distance: 3 - 0 - 1 = 2
In "abca", character 'a' appears at index 0 and index 3. The substring between them is "bc", which has length 2.

Why this problem matters

This problem teaches a pattern that shows up constantly in string and array questions: track the first (or last) occurrence of each element in a hash map. The same idea powers problems like Longest Substring Without Repeating Characters, Contains Duplicate, and Two Sum. If you internalize the "first occurrence map" technique here, you will recognize it instantly in harder problems.

It also reinforces a core algorithmic instinct: when you see a brute force solution with two nested loops, ask yourself whether a hash map can collapse the inner loop into an O(1) lookup.

The approach

The key insight is that you only need to know where each character first appeared. When you encounter a character for the second (or third, or fourth) time, you compute the distance between the current index and the first occurrence. You keep the maximum distance seen so far.

Why the first occurrence and not the last? Because you want the largest substring. For any repeated character, the widest gap is always between the first and last occurrence. Since you scan left to right and only store the first index, every subsequent occurrence of that character naturally measures from the earliest possible position.

def max_length_between_equal_characters(s: str) -> int:
    first_occurrence = {}
    best = -1

    for i, char in enumerate(s):
        if char in first_occurrence:
            best = max(best, i - first_occurrence[char] - 1)
        else:
            first_occurrence[char] = i

    return best

Step-by-step walkthrough

Let's trace through the string "abca" to see the algorithm in action.

Step 1: i=0, char='a'. Not in map. Record first occurrence.

0a1b2c3abest = -1first_occurrence:'a' : 0

'a' is new. Store its index 0 in the map. No match yet, best stays -1.

Step 2: i=1, char='b'. Not in map. Record first occurrence.

0a1b2c3abest = -1first_occurrence:'a' : 0'b' : 1

'b' is new. Store its index 1. Still no repeated character.

Step 3: i=2, char='c'. Not in map. Record first occurrence.

0a1b2c3abest = -1first_occurrence:'a' : 0'b' : 1'c' : 2

'c' is new. Store its index 2. No match found yet.

Step 4: i=3, char='a'. Already in map at index 0. Distance = 3 - 0 - 1 = 2.

0a1b2c3abest = 2first_occurrence:'a' : 0'b' : 1'c' : 2

'a' was first seen at index 0. The substring between indices 0 and 3 has length 2. Update best to 2.

The process is simple. For each character, check if it has been seen before. If not, record its index. If it has been seen, compute the substring length between the first occurrence and the current index using the formula i - first_occurrence[char] - 1. The minus one excludes the two matching characters themselves from the count. Update the best if this distance is larger.

Notice that we never update first_occurrence[char] once it is set. That is intentional. Keeping the earliest index guarantees the widest possible gap when we encounter the character again later.

Complexity analysis

ApproachTimeSpace
Brute force (check every pair)O(n^2)O(1)
Hash map first occurrenceO(n)O(1)*

*The space is O(1) because the hash map stores at most 26 entries (one per lowercase English letter). The input is constrained to lowercase letters, so the map size is bounded by the alphabet, not by the input length.

The hash map approach is a full order of magnitude faster on large inputs. For a string of 10,000 characters, brute force could examine up to 50 million pairs. The hash map approach scans the string exactly once.

Edge cases to watch for

  • All unique characters: "abcde" has no repeated character, so return -1. The hash map never finds a match.
  • All identical characters: "aaaa" should return 2. The first occurrence of 'a' is index 0, and the last is index 3. Distance = 3 - 0 - 1 = 2.
  • Length 1 string: "a" has no pair, return -1.
  • Two characters, same: "aa" returns 0. The substring between them is empty, which has length 0. This is different from -1 (no match at all).
  • Two characters, different: "ab" returns -1. No character repeats.
  • Multiple repeated characters: "cbzxy c" requires you to track several characters and take the maximum across all of them.

The building blocks

This problem is built on two reusable pieces:

1. First-occurrence map

Store the index where each element first appears. This is the same pattern used in Two Sum (where you store values and their indices in a dict) and in many sliding window problems (where you track the last seen position of each character). The structure is always the same: iterate once, and for each element either record it or look it up.

first_occurrence = {}
for i, item in enumerate(collection):
    if item not in first_occurrence:
        first_occurrence[item] = i

2. Distance computation from stored index

Once you know where an element first appeared, computing the gap is a single subtraction. This micro-pattern is the same one that powers problems like Subarray Sum Equals K (where you compute prefix sum differences) and Longest Substring Without Repeating Characters (where you compute window sizes from stored positions).

distance = current_index - first_occurrence[item] - 1

These two building blocks, storing a position and later computing a distance from it, combine here in their simplest form. In harder problems they combine with sliding windows, prefix sums, or frequency maps. But the core mechanic is identical.

From understanding to recall

You can probably follow every step of this walkthrough without difficulty. The algorithm is short, the logic is clear, and the code fits in under ten lines. But will you remember the approach a week from now when you see a similar problem?

The gap between understanding and recall is real. When you encounter Longest Substring Without Repeating Characters or Minimum Window Substring in an interview, you want the "first-occurrence map" pattern to fire automatically. That only happens if you have practiced writing the code from memory multiple times at spaced intervals.

Spaced repetition takes a problem like this, one where you already understand the solution, and turns that understanding into reliable recall. You write the hash map lookup from scratch after one day, then after three days, then after a week. By the fourth repetition, your fingers know the pattern before your conscious mind finishes analyzing the problem.

Related posts