Remove All Occurrences of a Substring: Stack-Based String Processing
Remove All Occurrences of a Substring is LeetCode 1910, rated Medium. You are given two strings s and part. Repeatedly find the leftmost occurrence of part in s and remove it, until part no longer appears. Return the final string. The trick is that removing one occurrence can create a new one, so you need a strategy that handles cascading matches cleanly.
Why this problem matters
This problem teaches two important concepts. First, it shows you how removals in a string can create new patterns that did not exist before. When you delete "abc" from "daabcbaabcbc", the characters on either side of the gap collapse together, and a brand new "abc" might form. That cascading behavior is what makes the naive loop approach potentially slow.
Second, it introduces a stack-like technique for string building. Instead of scanning the whole string repeatedly, you can build the result one character at a time, checking at each step whether the end of your result matches the pattern. If it does, you chop it off immediately. This is the same "build and backtrack" idea that powers problems like Simplify Path and Decode String.
The naive approach and why it works (but slowly)
The simplest solution is a while loop: keep searching for part and removing the first occurrence until it is gone.
def remove_occurrences(s, part):
while part in s:
s = s.replace(part, "", 1)
return s
This is correct. Python's str.replace(part, "", 1) removes only the first (leftmost) occurrence, which is exactly what the problem asks for. The loop keeps going until no occurrences remain.
The downside is performance. Each call to replace scans the string to find part (O(n) work) and then creates a new string with the match removed (another O(n) work). In the worst case, you might need to do this roughly n/m times, where m is the length of part. That gives you O(n^2 / m) total time. For small inputs (the constraint is n up to 1000), this is perfectly fine and will pass on LeetCode. But it is worth knowing the better approach.
The naive solution is totally acceptable in an interview as a starting point. State it, explain the time cost, and then move to the stack approach as an optimization.
The stack approach: build and check
The key insight is that you do not need to rescan the entire string after each removal. Instead, build the result character by character. After appending each character, check whether the last m characters of the result match part. If they do, remove them immediately.
This works because any match that forms must end at the character you just appended. You never need to look further back than m characters. And after you remove a match, the remaining characters in the result might form a new match at their tail, but you will catch that on the very next append.
def remove_occurrences(s, part):
result = []
m = len(part)
for ch in s:
result.append(ch)
if len(result) >= m and ''.join(result[-m:]) == part:
del result[-m:]
return ''.join(result)
Walk through each character of s. Append it to the result list. Then check: do the last m characters spell out part? If yes, delete them. That is it. The result list acts like a stack where you push characters and occasionally pop a whole substring off the top.
Using a list instead of a string is important here. Appending to and slicing from a Python list is O(1) amortized for append and O(m) for the slice check and delete. Building strings with += in a loop would create a new string each time, adding unnecessary overhead.
Step-by-step walkthrough
Let's trace through s = "daabcbaabcbc" and part = "abc". Watch how each removal causes the surrounding characters to collapse and form new matches.
Initial string
s = "daabcbaabcbc". Scan left to right. The first "abc" is at index 2.
Step 1: Remove "abc" at index 2
After removing indices 2..4, we get "dabaabcbc". A new "abc" appears at index 4.
Step 2: Remove "abc" at index 4
After removing, we get "dababc". Another "abc" has formed at index 3.
Step 3: Remove "abc" at index 3
After removing, we get "dab". No more occurrences of "abc" exist. Done.
Notice the cascade: removing one "abc" brings characters together that form another "abc". The stack approach handles this naturally because it checks after every single character append. The naive approach handles it too, but it rescans the entire string each time.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Repeated find/replace | O(n^2 / m) worst case | O(n) |
| Stack-based | O(n * m) | O(n) |
For the stack approach, you iterate through n characters. For each character, the comparison of the last m characters costs O(m) in the worst case. That gives O(n * m) total. The space is O(n) for the result list.
You can further optimize to O(n) time using KMP or rolling hash to check the suffix match in O(1) amortized time. But O(n * m) is more than sufficient for interview purposes, and the code stays clean and readable.
Building blocks
This problem uses two reusable patterns:
Stack-like string building. You build a result by appending characters and occasionally removing from the end. This is the same structure behind bracket matching (push openers, pop on closers) and path simplification (push directories, pop on ".."). The result list behaves like a stack where you push single characters and pop substrings.
Suffix matching after each step. Instead of rescanning the whole string, you only check the tail of what you have built so far. This local-check-after-each-step pattern keeps the algorithm efficient and avoids redundant work.
Edge cases
Before submitting, verify your solution handles these:
partnot ins. For example,s = "hello"andpart = "xyz". The result is justsunchanged. The stack approach handles this naturally since no suffix ever matches.- Entire string is
part. For example,s = "abc"andpart = "abc". After one removal, the result is empty. Return"". - Cascading removals that empty the string. For example,
s = "aababcbcbc"andpart = "abc". Each removal triggers another until the string is gone. - Single character
part. For example,part = "a". Every occurrence of that character gets removed. The suffix check is trivial (just compare the last character). partlonger thans. No match is possible. Returnsunchanged.
From understanding to recall
You have seen both approaches now. The naive loop is easy to remember, and the stack-based builder is only a few lines longer. But the details matter under pressure. Do you use del result[-m:] or result = result[:-m]? Do you check len(result) >= m before comparing? Do you join with '' at the end?
These are recall questions, not conceptual ones. You understand the algorithm, but producing it from scratch in three weeks requires practice. Spaced repetition bridges that gap. You type the stack-based solution from memory at increasing intervals, and each repetition reinforces the append-check-delete flow. After a few cycles, the whole thing is automatic.
Related posts
- Evaluate Reverse Polish Notation - another stack-based processing problem
- Simplify Path - stack-based string manipulation with backtracking
- Daily Temperatures - monotonic stack pattern for sequential processing
CodeBricks breaks the stack-based string building pattern into its reusable pieces and drills them individually. You type the solution from scratch each time, building the muscle memory so that when you see Remove All Occurrences or any string-processing stack problem in an interview, the code flows without hesitation.