Regular Expression Matching: DP with Wildcards
You are given an input string s and a pattern p. Implement regular expression matching with support for '.' and '*':
'.'matches any single character.'*'matches zero or more of the preceding element.
The matching should cover the entire input string, not just a substring.
This is LeetCode 10: Regular Expression Matching, one of the classic hard problems. It combines two-string DP with wildcard logic that trips up even experienced candidates. If you have already solved Edit Distance or Longest Common Subsequence, you will recognize the 2D table structure. The twist here is that the table stores booleans instead of integers, and the '*' operator introduces a branching recurrence that takes careful handling.
Why this problem matters
Regular expression matching is not just an interview question. Real regex engines, text editors, and compilers all need to decide whether a string matches a pattern. The '*' wildcard (Kleene star) is one of the fundamental operators in formal language theory. Understanding how to handle it with dynamic programming gives you insight into how pattern matching actually works under the hood.
In interviews, this problem tests your ability to handle multi-case recurrences. The '.' case is simple, but the '*' case requires you to reason about two sub-cases (use the star zero times, or use it one more time). Getting the logic right under pressure requires genuine understanding, not just memorization.
Examples
s = "aa",p = "a"returnsfalse. The pattern is a single"a", which does not cover the entire string.s = "aa",p = "a*"returnstrue."a*"means zero or moreacharacters.s = "ab",p = ".*"returnstrue.".*"means zero or more of any character.
The brute force idea
Start from the beginning of both strings. At each position, you check whether the current characters match. But when you see a '*' at p[j+1], you have two choices: skip the x* group entirely (match zero occurrences), or consume one character from s and stay at the same pattern position (match one more occurrence). This branching leads to exponential time.
def is_match_recursive(s, p, i, j):
if j == len(p):
return i == len(s)
first_match = i < len(s) and (p[j] == s[i] or p[j] == ".")
if j + 1 < len(p) and p[j + 1] == "*":
return (is_match_recursive(s, p, i, j + 2) or
(first_match and is_match_recursive(s, p, i + 1, j)))
return first_match and is_match_recursive(s, p, i + 1, j + 1)
The same (i, j) pair gets recomputed many times. You need a table.
The recursive version captures the logic cleanly, but do not submit it without memoization. For large inputs, it will time out. The bottom-up DP approach fills the table in O(m * n) and avoids recursion limits entirely.
Building the 2D DP table
Define dp[i][j] as whether s[0..i-1] matches p[0..j-1]. The table has (m + 1) rows and (n + 1) columns, where m = len(s) and n = len(p).
The recurrence has three cases:
- Characters match (either
p[j-1] == s[i-1]orp[j-1] == '.'): take the diagonal value.dp[i][j] = dp[i-1][j-1]. - Star in the pattern (
p[j-1] == '*'): two sub-cases:- Use zero occurrences: skip the entire
x*group.dp[i][j] = dp[i][j-2]. - Use one more occurrence: if the preceding character
p[j-2]matchess[i-1](orp[j-2] == '.'), thendp[i][j] = dp[i-1][j]. This means "the star already matched some characters, and it can match one more." dp[i][j]isTrueif either sub-case isTrue.
- Use zero occurrences: skip the entire
- Characters do not match and no star:
dp[i][j] = False.
Base cases:
dp[0][0] = True. Empty string matches empty pattern.dp[0][j]: the empty string can match patterns like"a*","a*b*", etc. For everyjwherep[j-1] == '*', setdp[0][j] = dp[0][j-2].dp[i][0] = Falsefori > 0. A non-empty string never matches an empty pattern.
Let's trace through s = "aab" and p = "c*a*b":
Initialize: base cases for the empty string vs pattern
dp[0][0] = True (empty matches empty). dp[0][2] = True because "c*" can match zero c's. dp[0][4] = True because "c*a*" can match zero of both. Every star column checks dp[0][j-2].
Row 1 (s="a"): checking "a" against each prefix of "c*a*b"
dp[1][3]: pattern "c*a" vs "a". c* matches zero, then a==a, so dp[0][2] is True and we take the diagonal. dp[1][4]: p[3]="*", and p[2]="a" matches s[0]="a", so dp[0][4] is True.
Row 2 (s="aa"): the star at p[4] extends to absorb the second "a"
dp[2][4]: p[3]="*", p[2]="a" matches s[1]="a". Since dp[1][4] is True, the star absorbs one more "a". dp[2][3] is False because "c*a" only handles one "a".
Row 3 (s="aab"): "b" matches the final "b" in the pattern. Done!
dp[3][5]: p[4]="b" matches s[2]="b", so we check the diagonal dp[2][4] which is True. The full string "aab" matches "c*a*b". Answer: True.
The answer is dp[3][5] = True. The pattern "c*" matches zero c's, "a*" matches two a's, and "b" matches the final b.
def is_match(s, p):
m, n = len(s), len(p)
dp = [[False] * (n + 1) for _ in range(m + 1)]
dp[0][0] = True
for j in range(2, n + 1):
if p[j - 1] == "*":
dp[0][j] = dp[0][j - 2]
for i in range(1, m + 1):
for j in range(1, n + 1):
if p[j - 1] == s[i - 1] or p[j - 1] == ".":
dp[i][j] = dp[i - 1][j - 1]
elif p[j - 1] == "*":
dp[i][j] = dp[i][j - 2]
if p[j - 2] == s[i - 1] or p[j - 2] == ".":
dp[i][j] = dp[i][j] or dp[i - 1][j]
return dp[m][n]
Time: O(m * n). Every cell in the 2D table gets filled exactly once.
Space: O(m * n). The full table.
The key to the star case: dp[i][j-2] handles zero occurrences (skip the x* entirely), and dp[i-1][j] handles one more occurrence (the star already matched, and it can extend by consuming s[i-1]). You never need to explicitly loop over "how many times does the star match." The recurrence handles it one character at a time, propagating truth values row by row.
Understanding the star case visually
The '*' wildcard is what makes this problem hard. Think of it this way:
When you encounter p[j-1] == '*', you are looking at a two-character group p[j-2]p[j-1], like "a*" or ".*". This group can match:
- Zero characters: pretend the group does not exist. Check
dp[i][j-2]. This is like erasing"a*"from the pattern. - One or more characters: if the preceding character
p[j-2]matches the current string characters[i-1](orp[j-2]is'.'), then checkdp[i-1][j]. This is like saying "the star already matched everything up tos[i-2], and now it matchess[i-1]too."
The beauty of the DP approach is that you do not need to decide upfront how many characters the star matches. The table propagates the information row by row. If dp[1][4] is True (the star matched one "a"), then when you check dp[2][4], the star can match one more "a" by looking at dp[1][4].
The '.' wildcard is much simpler. It acts exactly like a regular character match, except it matches any character in s. Wherever you see a character comparison p[j-1] == s[i-1], the condition p[j-1] == '.' also triggers the same logic.
Walkthrough of the key transitions
Let's trace the critical cells for s = "aab", p = "c*a*b":
- dp[0][2] (empty string vs
"c*"):p[1] == '*', so checkdp[0][0](skip"c*").dp[0][0]is True, sodp[0][2] = True. The star eliminates the"c". - dp[0][4] (empty string vs
"c*a*"):p[3] == '*', so checkdp[0][2](skip"a*").dp[0][2]is True, sodp[0][4] = True. Both stars match zero. - dp[1][3] (
"a"vs"c*a"):p[2] == 'a'ands[0] == 'a', characters match. Check diagonaldp[0][2], which is True. Sodp[1][3] = True. - dp[1][4] (
"a"vs"c*a*"):p[3] == '*'. First check zero occurrences:dp[1][2] = False. Then check one more:p[2] == 'a'matchess[0] == 'a', so checkdp[0][4] = True. Sodp[1][4] = True. - dp[2][4] (
"aa"vs"c*a*"):p[3] == '*'. Zero occurrences:dp[2][2] = False. One more:p[2] == 'a'matchess[1] == 'a', so checkdp[1][4] = True. Sodp[2][4] = True. The star absorbed both a's. - dp[3][5] (
"aab"vs"c*a*b"):p[4] == 'b'ands[2] == 'b', characters match. Check diagonaldp[2][4] = True. Sodp[3][5] = True.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Brute force recursion | Exponential | O(m + n) call stack |
| 2D DP table | O(m * n) | O(m * n) |
Where m = length of s, n = length of p.
Edge cases to watch for
- Empty string, empty pattern:
dp[0][0] = True. Both are empty, so they match. - Empty string, pattern is all stars: e.g.,
s = "",p = "a*b*c*". Eachx*can match zero, so the answer is True. The base case row handles this by checkingdp[0][j-2]for each star. - No wildcards at all: e.g.,
s = "abc",p = "abc". Every cell uses the diagonal character-match case. The DP reduces to a simple character-by-character comparison. - Pattern is just
".*": matches any string, including the empty string.".*"means zero or more of any character. - Consecutive stars: e.g.,
p = "a*b*". Eachx*pair is independent. The first star handles a's, the second handles b's. - Dot without star: e.g.,
p = "a.b"matches"aab","axb", etc. The dot is a single-character wildcard, not a multi-character one.
The building blocks
This problem is built on two core patterns:
2D DP (two-string comparison): The same table structure you see in Edit Distance and Longest Common Subsequence. Define dp[i][j] for prefixes of both inputs, fill the table row by row, and read the answer from dp[m][n]. The difference here is that cells store booleans instead of integers, and the recurrence has a star branch.
Pattern matching with wildcards: The '*' operator introduces a "use it or skip it" branching structure. This is similar to the "take or skip" pattern in subset and knapsack problems, but applied to string matching. The zero-occurrence case (dp[i][j-2]) skips the wildcard group entirely, and the one-more-occurrence case (dp[i-1][j]) extends the match by one character.
Recognizing that the star case decomposes into exactly two sub-cases (zero or one more) is the key insight. Once you have that, the DP recurrence writes itself.
From understanding to recall
You now know how to build the boolean DP table, handle the three cases (character match, star, no match), and trace through the star propagation logic. But reading this once is not the same as writing it cold. In an interview, you need to set up the (m+1) x (n+1) table, get the base cases right for the star columns, and write the star recurrence without confusing j-1 and j-2 indices.
The star case is the part that trips people up. It is easy to forget that dp[i][j-2] is the zero-occurrence branch, or to mix up which index refers to the star character versus the preceding character. That confusion disappears with repeated practice, but not with a single read-through.
The 2D DP and pattern matching building blocks together cover a family of wildcard problems (including LeetCode 44: Wildcard Matching, which uses '?' and '*' with slightly different semantics). Locking in the regex matching recurrence gives you a template you can adapt to any variant. These building blocks are part of the roughly 60 reusable patterns in the CodeBricks system. You drill each block with spaced repetition until writing the solution from scratch is automatic.
Related posts
- Edit Distance - The classic 2D string DP problem. Same table structure, but with integer costs instead of booleans.
- Longest Common Subsequence - The foundational two-string DP pattern. A good warmup before tackling wildcard matching.