Skip to content
← All posts

Remove Invalid Parentheses: BFS Approach Explained

7 min read
leetcodeproblemhardstringsbacktracking

LeetCode 301, Remove Invalid Parentheses, is one of those hard problems that sounds deceptively simple. You are given a string with parentheses that might be invalid, and you need to find every possible valid string you can make by removing the minimum number of parentheses. The catch is that "minimum" part. You cannot just strip characters until it works. You need to find all results that require the fewest removals.

The problem

Given a string s that contains parentheses and letters, remove the minimum number of invalid parentheses to make the input string valid. Return all the possible results. You may return the answer in any order.

Input:  s = "()())()"
Output: ["(())()", "()()()"]

The string "()())()" has one extra closing parenthesis. By removing just one ), we can produce two distinct valid strings. Both "(())()" and "()()()" are valid, and both require exactly one removal. That is the minimum possible.

input()())()0123456remove 1 character (minimum)#1()())()x"(())()"#2()())()x"()()()"keptremovedoriginal
Input "()())()" has one extra closing parenthesis. Removing the ")" at index 3 gives "(())()". Removing the ")" at index 4 gives "()()()". Both are valid with just one removal.

Why this is tricky

The hard part is not checking whether a string is valid. That is a simple counter check. The hard part is the combination of two requirements: you need all valid results, and they must use the minimum number of removals.

If you only needed one valid result, you could greedily scan left to right and remove the first unmatched parenthesis. But the problem asks for every valid result at the minimum removal count. That means you need to explore multiple removal choices systematically, and you need to stop as soon as you find the shallowest level of removals that produces valid strings.

Interviewers love this problem because it tests whether you can choose the right search strategy. BFS naturally gives you minimum removals because it explores all strings with 0 removals, then all strings with 1 removal, then 2, and so on. The moment you find valid strings at any level, you are done.

The key insight: BFS for minimum removals

Think of each string as a node in a graph. From any string, you can reach new strings by removing one parenthesis. Each removal creates an edge to a string that is one character shorter. You want the shortest path from the original string to any valid string.

This is exactly what BFS does. Start with the original string. At level 0, check if it is already valid. If not, generate all strings with one character removed (level 1). Check each one. If any are valid, collect them all and stop. If none are valid, generate level 2 (two characters removed), and so on.

The first level where you find valid strings guarantees minimum removals. BFS explores level by level, so you never overshoot.

Before starting BFS, you can count exactly how many removals are needed. Scan the string: track unmatched ( and unmatched ) separately. The sum tells you the minimum removals. This lets you prune branches early: if you have already removed the right number, just check validity instead of generating more candidates.

Step-by-step walkthrough

Let's trace through the BFS approach on "()())()".

0

Level 0: Check the original string

candidates:
"()())()"

Start with the original string "()())()". Check if it is valid. It is not (extra closing paren at index 4). No valid strings found at this level, so proceed to the next level.

1

Level 1: Remove one character at a time

candidates:
")())()""(())()""()()()""()()(""()())()"
valid:
"(())()""()()()"

Try removing each parenthesis from the string one at a time. This generates candidates with one fewer character. Check each for validity. Two valid strings found: "(())()" and "()()()". Since we found valid results at this level, stop here. These are the answers with the minimum number of removals (1).

Notice how BFS naturally handles the "minimum removals" constraint. We check the original string first (zero removals), and only move to single-character removals when needed. The moment we find valid strings at level 1, we collect all of them and return. We never explore level 2.

The code

from collections import deque

def remove_invalid_parentheses(s: str) -> list[str]:
    def is_valid(string):
        count = 0
        for ch in string:
            if ch == '(':
                count += 1
            elif ch == ')':
                count -= 1
            if count < 0:
                return False
        return count == 0

    result = []
    visited = {s}
    queue = deque([s])
    found = False

    while queue:
        level_size = len(queue)
        for _ in range(level_size):
            curr = queue.popleft()
            if is_valid(curr):
                result.append(curr)
                found = True

            if found:
                continue

            for i in range(len(curr)):
                if curr[i] not in ('(', ')'):
                    continue
                candidate = curr[:i] + curr[i + 1:]
                if candidate not in visited:
                    visited.add(candidate)
                    queue.append(candidate)

        if found:
            break

    return result
  1. The is_valid helper uses a simple counter. Increment for (, decrement for ). If the counter ever goes negative, there is an unmatched ). If the counter is not zero at the end, there are unmatched ( characters.

  2. BFS starts with the original string in the queue and a visited set to avoid processing duplicates.

  3. We process the queue level by level. For each string at the current level, we check validity first. If any string is valid, we set the found flag and keep checking the rest of the current level (there may be multiple valid strings at the same level).

  4. If found is not set, we generate candidates by removing each parenthesis one at a time. We skip non-parenthesis characters since removing letters would not help fix the parentheses structure. Each new candidate is added to the queue if it has not been visited.

  5. After processing a full level, if found is true, we break out of the loop. All valid strings with the minimum number of removals have been collected.

Complexity analysis

Time: O(2^n) in the worst case, where n is the length of the string. At each position, we either remove or keep the character, leading to 2^n possible substrings. The visited set prevents reprocessing, but in the worst case we might still generate an exponential number of candidates.

Space: O(2^n) for the visited set and queue, which in the worst case hold all possible substrings of the input.

ApproachTimeSpace
Brute force (try all subsets)O(2^n * n)O(2^n)
BFS with visited setO(2^n)O(2^n)
DFS with min-removal countO(2^n)O(n) recursion stack

Common mistakes

  1. Forgetting the visited set. Without deduplication, BFS will process the same string many times. Removing index 2 then index 5 gives the same result as removing index 5 then index 2. The visited set prevents this exponential blowup.

  2. Not processing the entire level before stopping. When you find the first valid string, you cannot return immediately. There may be other valid strings at the same BFS level. You need to finish checking all candidates at that level to collect every valid result.

  3. Removing non-parenthesis characters. If the string contains letters, removing them never helps fix parentheses validity. Skipping non-parenthesis characters when generating candidates is both correct and a significant optimization.

  4. Using DFS without tracking the minimum removal count. A naive DFS might find valid strings with more removals than necessary. If you use DFS, you must first calculate the minimum removals needed and then only collect results that match that count.

A subtle gotcha: when two adjacent characters are the same (like )) in a row), removing the first or second produces identical strings. The visited set handles this, but if you forget it, you will get duplicate results in your output.

The building blocks

Parentheses validity check

def is_valid(s):
    count = 0
    for ch in s:
        if ch == '(':
            count += 1
        elif ch == ')':
            count -= 1
        if count < 0:
            return False
    return count == 0

This counter-based pattern appears in nearly every parentheses problem. The key rule: count must never go negative during the scan (no unmatched close), and it must be zero at the end (no unmatched open). You will use this exact function in Valid Parentheses, Generate Parentheses, and Longest Valid Parentheses.

If you can write the validity check from memory in under 30 seconds, you have freed up mental bandwidth for the harder parts of parentheses problems. Drill this small building block until it is automatic.

Edge cases

  • Already valid string: If s is already valid (like "()"), return it as-is with zero removals.
  • All parentheses invalid: If s is something like ")(", you need to remove both characters. The result is [""], the empty string.
  • Empty string: An empty string is valid. Return [""].
  • No parentheses at all: A string like "abc" is valid. Return ["abc"].
  • String with only open or only close parens: For "(((", all three must be removed. For ")))", same thing. The result is [""].
  • Very long strings: The exponential worst case can be slow for large inputs. In practice, most interview test cases keep the string short enough for BFS to work within time limits.

From understanding to recall

Reading through this BFS approach probably makes sense to you right now. You see how BFS naturally handles minimum removals, how the visited set prevents duplicates, and how the level-by-level processing collects all valid results at the optimal depth. The logic is clear.

But clarity while reading is different from fluency while writing. In an interview, you need to set up the BFS queue, implement the validity check, handle the found flag correctly, and remember to skip non-parenthesis characters when generating candidates. Spaced repetition bridges that gap. By drilling the validity check and the BFS template separately, then combining them, you build muscle memory for each piece. When this problem appears under pressure, you write the solution rather than reconstructing it.

Related posts

CodeBricks breaks problems like Remove Invalid Parentheses into their core building blocks and drills each one with spaced repetition. The parentheses validity check, the BFS level-by-level template, the visited set for deduplication. You practice each piece in isolation, then combine them. That is how patterns move from understanding to recall.