Regular Expression Matching: DP with Wildcards

May 12, 20268 min read

leetcodeproblemhardstringsdynamic-programming

You are given an input string s and a pattern p. Implement regular expression matching with support for '.' and '*':

'.' matches any single character.
'*' matches zero or more of the preceding element.

The matching should cover the entire input string, not just a substring.

This is LeetCode 10: Regular Expression Matching, one of the classic hard problems. It combines two-string DP with wildcard logic that trips up even experienced candidates. If you have already solved Edit Distance or Longest Common Subsequence, you will recognize the 2D table structure. The twist here is that the table stores booleans instead of integers, and the '*' operator introduces a branching recurrence that takes careful handling.

2D DP table for s="aab", p="c*a*b". True cells are highlighted. Stars in the pattern can absorb zero or more of the preceding character. Answer: dp[3][5] = True.

Why this problem matters

Regular expression matching is not just an interview question. Real regex engines, text editors, and compilers all need to decide whether a string matches a pattern. The '*' wildcard (Kleene star) is one of the fundamental operators in formal language theory. Understanding how to handle it with dynamic programming gives you insight into how pattern matching actually works under the hood.

In interviews, this problem tests your ability to handle multi-case recurrences. The '.' case is simple, but the '*' case requires you to reason about two sub-cases (use the star zero times, or use it one more time). Getting the logic right under pressure requires genuine understanding, not just memorization.

Examples

s = "aa", p = "a" returns false. The pattern is a single "a", which does not cover the entire string.
s = "aa", p = "a*" returns true. "a*" means zero or more a characters.
s = "ab", p = ".*" returns true. ".*" means zero or more of any character.

The brute force idea

Start from the beginning of both strings. At each position, you check whether the current characters match. But when you see a '*' at p[j+1], you have two choices: skip the x* group entirely (match zero occurrences), or consume one character from s and stay at the same pattern position (match one more occurrence). This branching leads to exponential time.

def is_match_recursive(s, p, i, j):
    if j == len(p):
        return i == len(s)
    first_match = i < len(s) and (p[j] == s[i] or p[j] == ".")
    if j + 1 < len(p) and p[j + 1] == "*":
        return (is_match_recursive(s, p, i, j + 2) or
                (first_match and is_match_recursive(s, p, i + 1, j)))
    return first_match and is_match_recursive(s, p, i + 1, j + 1)

The same (i, j) pair gets recomputed many times. You need a table.

The recursive version captures the logic cleanly, but do not submit it without memoization. For large inputs, it will time out. The bottom-up DP approach fills the table in O(m * n) and avoids recursion limits entirely.

Building the 2D DP table

Define dp[i][j] as whether s[0..i-1] matches p[0..j-1]. The table has (m + 1) rows and (n + 1) columns, where m = len(s) and n = len(p).

The recurrence has three cases:

Characters match (either p[j-1] == s[i-1] or p[j-1] == '.'): take the diagonal value. dp[i][j] = dp[i-1][j-1].
Star in the pattern (p[j-1] == '*'): two sub-cases:
- Use zero occurrences: skip the entire x* group. dp[i][j] = dp[i][j-2].
- Use one more occurrence: if the preceding character p[j-2] matches s[i-1] (or p[j-2] == '.'), then dp[i][j] = dp[i-1][j]. This means "the star already matched some characters, and it can match one more."
- dp[i][j] is True if either sub-case is True.
Characters do not match and no star: dp[i][j] = False.

Base cases:

dp[0][0] = True. Empty string matches empty pattern.
dp[0][j]: the empty string can match patterns like "a*", "a*b*", etc. For every j where p[j-1] == '*', set dp[0][j] = dp[0][j-2].
dp[i][0] = False for i > 0. A non-empty string never matches an empty pattern.

Let's trace through s = "aab" and p = "c*a*b":

Initialize: base cases for the empty string vs pattern

dp[0][0] = True (empty matches empty). dp[0][2] = True because "c*" can match zero c's. dp[0][4] = True because "c*a*" can match zero of both. Every star column checks dp[0][j-2].

Row 1 (s="a"): checking "a" against each prefix of "c*a*b"

dp[1][3]: pattern "c*a" vs "a". c* matches zero, then a==a, so dp[0][2] is True and we take the diagonal. dp[1][4]: p[3]="*", and p[2]="a" matches s[0]="a", so dp[0][4] is True.

Row 2 (s="aa"): the star at p[4] extends to absorb the second "a"

dp[2][4]: p[3]="*", p[2]="a" matches s[1]="a". Since dp[1][4] is True, the star absorbs one more "a". dp[2][3] is False because "c*a" only handles one "a".

Row 3 (s="aab"): "b" matches the final "b" in the pattern. Done!

dp[3][5]: p[4]="b" matches s[2]="b", so we check the diagonal dp[2][4] which is True. The full string "aab" matches "c*a*b". Answer: True.

The answer is dp[3][5] = True. The pattern "c*" matches zero c's, "a*" matches two a's, and "b" matches the final b.

def is_match(s, p):
    m, n = len(s), len(p)
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = True

    for j in range(2, n + 1):
        if p[j - 1] == "*":
            dp[0][j] = dp[0][j - 2]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[j - 1] == s[i - 1] or p[j - 1] == ".":
                dp[i][j] = dp[i - 1][j - 1]
            elif p[j - 1] == "*":
                dp[i][j] = dp[i][j - 2]
                if p[j - 2] == s[i - 1] or p[j - 2] == ".":
                    dp[i][j] = dp[i][j] or dp[i - 1][j]

    return dp[m][n]

Time: O(m * n). Every cell in the 2D table gets filled exactly once.

Space: O(m * n). The full table.

The key to the star case: dp[i][j-2] handles zero occurrences (skip the x* entirely), and dp[i-1][j] handles one more occurrence (the star already matched, and it can extend by consuming s[i-1]). You never need to explicitly loop over "how many times does the star match." The recurrence handles it one character at a time, propagating truth values row by row.

Practice this problem with spaced repetition

Understanding the star case visually

The '*' wildcard is what makes this problem hard. Think of it this way:

When you encounter p[j-1] == '*', you are looking at a two-character group p[j-2]p[j-1], like "a*" or ".*". This group can match:

Zero characters: pretend the group does not exist. Check dp[i][j-2]. This is like erasing "a*" from the pattern.
One or more characters: if the preceding character p[j-2] matches the current string character s[i-1] (or p[j-2] is '.'), then check dp[i-1][j]. This is like saying "the star already matched everything up to s[i-2], and now it matches s[i-1] too."

The beauty of the DP approach is that you do not need to decide upfront how many characters the star matches. The table propagates the information row by row. If dp[1][4] is True (the star matched one "a"), then when you check dp[2][4], the star can match one more "a" by looking at dp[1][4].

The '.' wildcard is much simpler. It acts exactly like a regular character match, except it matches any character in s. Wherever you see a character comparison p[j-1] == s[i-1], the condition p[j-1] == '.' also triggers the same logic.

Walkthrough of the key transitions

Let's trace the critical cells for s = "aab", p = "c*a*b":

dp[0][2] (empty string vs "c*"): p[1] == '*', so check dp[0][0] (skip "c*"). dp[0][0] is True, so dp[0][2] = True. The star eliminates the "c".
dp[0][4] (empty string vs "c*a*"): p[3] == '*', so check dp[0][2] (skip "a*"). dp[0][2] is True, so dp[0][4] = True. Both stars match zero.
dp[1][3] ("a" vs "c*a"): p[2] == 'a' and s[0] == 'a', characters match. Check diagonal dp[0][2], which is True. So dp[1][3] = True.
dp[1][4] ("a" vs "c*a*"): p[3] == '*'. First check zero occurrences: dp[1][2] = False. Then check one more: p[2] == 'a' matches s[0] == 'a', so check dp[0][4] = True. So dp[1][4] = True.
dp[2][4] ("aa" vs "c*a*"): p[3] == '*'. Zero occurrences: dp[2][2] = False. One more: p[2] == 'a' matches s[1] == 'a', so check dp[1][4] = True. So dp[2][4] = True. The star absorbed both a's.
dp[3][5] ("aab" vs "c*a*b"): p[4] == 'b' and s[2] == 'b', characters match. Check diagonal dp[2][4] = True. So dp[3][5] = True.

Build your DP pattern library with CodeBricks

Complexity analysis

Approach	Time	Space
Brute force recursion	Exponential	O(m + n) call stack
2D DP table	O(m * n)	O(m * n)

Where m = length of s, n = length of p.

Edge cases to watch for

Empty string, empty pattern: dp[0][0] = True. Both are empty, so they match.
Empty string, pattern is all stars: e.g., s = "", p = "a*b*c*". Each x* can match zero, so the answer is True. The base case row handles this by checking dp[0][j-2] for each star.
No wildcards at all: e.g., s = "abc", p = "abc". Every cell uses the diagonal character-match case. The DP reduces to a simple character-by-character comparison.
Pattern is just ".*": matches any string, including the empty string. ".*" means zero or more of any character.
Consecutive stars: e.g., p = "a*b*". Each x* pair is independent. The first star handles a's, the second handles b's.
Dot without star: e.g., p = "a.b" matches "aab", "axb", etc. The dot is a single-character wildcard, not a multi-character one.

The building blocks

This problem is built on two core patterns:

2D DP (two-string comparison): The same table structure you see in Edit Distance and Longest Common Subsequence. Define dp[i][j] for prefixes of both inputs, fill the table row by row, and read the answer from dp[m][n]. The difference here is that cells store booleans instead of integers, and the recurrence has a star branch.

Pattern matching with wildcards: The '*' operator introduces a "use it or skip it" branching structure. This is similar to the "take or skip" pattern in subset and knapsack problems, but applied to string matching. The zero-occurrence case (dp[i][j-2]) skips the wildcard group entirely, and the one-more-occurrence case (dp[i-1][j]) extends the match by one character.

Recognizing that the star case decomposes into exactly two sub-cases (zero or one more) is the key insight. Once you have that, the DP recurrence writes itself.

From understanding to recall

You now know how to build the boolean DP table, handle the three cases (character match, star, no match), and trace through the star propagation logic. But reading this once is not the same as writing it cold. In an interview, you need to set up the (m+1) x (n+1) table, get the base cases right for the star columns, and write the star recurrence without confusing j-1 and j-2 indices.

The star case is the part that trips people up. It is easy to forget that dp[i][j-2] is the zero-occurrence branch, or to mix up which index refers to the star character versus the preceding character. That confusion disappears with repeated practice, but not with a single read-through.

The 2D DP and pattern matching building blocks together cover a family of wildcard problems (including LeetCode 44: Wildcard Matching, which uses '?' and '*' with slightly different semantics). Locking in the regex matching recurrence gives you a template you can adapt to any variant. These building blocks are part of the roughly 60 reusable patterns in the CodeBricks system. You drill each block with spaced repetition until writing the solution from scratch is automatic.

Edit Distance - The classic 2D string DP problem. Same table structure, but with integer costs instead of booleans.
Longest Common Subsequence - The foundational two-string DP pattern. A good warmup before tackling wildcard matching.

Try CodeBricks free