Skip to content
← All posts

Distinct Subsequences: 2D DP for Subsequence Counting

7 min read
leetcodeproblemhardstringsdynamic-programming

You are given two strings s and t. Return the number of distinct subsequences of s that equal t. A subsequence keeps characters in their original order but can skip any number of characters.

This is LeetCode 115: Distinct Subsequences. For example, s = "rabbbit" and t = "rabbit" returns 3. There are three different ways to choose characters from s to spell t, depending on which of the three b characters you pick. With s = "babgbag" and t = "bag", the answer is 5.

If you have solved Edit Distance or Longest Common Subsequence, you will recognize the 2D table structure. The difference is that distinct subsequences does not minimize cost or maximize length. It counts the number of ways to form one string from another.

""bag""babgbag10001100111012101211131113411345matchskipMatch (diagonal)Skip (above)Chars matchAnswer
2D DP table for s="babgbag", t="bag". Green cells are where characters match. Arrows at dp[3][1] show the match/skip recurrence. Answer: dp[7][3] = 5.

Why this problem matters

Most 2D string DP problems ask for an optimal value: the shortest edit distance, the longest common subsequence. Distinct subsequences asks for a count. This changes the recurrence from min or max to addition. That shift trips people up because the table looks the same but the logic is different.

The match/skip recurrence also appears in combinatorics and probability. Every time you stand at a character in s that matches the current character in t, you face a choice: use this character (match) or ignore it and try later characters (skip). The total at each cell is the sum of both branches. This is the same branching logic behind binomial coefficients and path counting problems.

The recurrence

Define dp[i][j] as the number of distinct subsequences of s[0..i-1] that equal t[0..j-1]. The table has (m + 1) rows (for s) and (n + 1) columns (for t), where m = len(s) and n = len(t).

At each cell dp[i][j], you ask: how many ways can the first i characters of s form the first j characters of t?

  • If s[i-1] == t[j-1] (the characters match), you have two choices:
    • Match: use s[i-1] to match t[j-1], and count the ways to form t[0..j-2] from s[0..i-2]. That is dp[i-1][j-1].
    • Skip: ignore s[i-1] and count the ways to form t[0..j-1] from s[0..i-2]. That is dp[i-1][j].
    • Total: dp[i][j] = dp[i-1][j-1] + dp[i-1][j]
  • If s[i-1] != t[j-1] (no match), you cannot use s[i-1], so you skip it:
    • dp[i][j] = dp[i-1][j]

Base cases:

  • dp[i][0] = 1 for all i. Every prefix of s has exactly one way to form the empty string: delete everything.
  • dp[0][j] = 0 for j > 0. An empty s has zero ways to form any non-empty t.

The key insight: when characters match, you add the two branches together instead of picking the better one. Match contributes dp[i-1][j-1] ways, skip contributes dp[i-1][j] ways, and the total is their sum. This is counting, not optimizing.

Walkthrough: s = "babgbag", t = "bag"

Let's trace through the table cell by cell.

Initialize: dp[i][0] = 1 for all i, dp[0][j] = 0 for j > 0

""bag""babgbag10001000100010001000100010001000

Any string has exactly one way to form the empty subsequence (delete everything). An empty string has zero ways to form a non-empty target.

Row 1 (s="b"): s[0]="b" matches t[0]="b"

""bag""babgbag10001100100010001000100010001000matchskip

s[0]="b" == t[0]="b": dp[1][1] = dp[0][0] + dp[0][1] = 1 + 0 = 1. One way to form "b" from "b".

Row 2 (s="ba"): s[1]="a" matches t[1]="a"

""bag""babgbag10001100111010001000100010001000

s[1]="a" != t[0]="b", so dp[2][1] = dp[1][1] = 1. s[1]="a" == t[1]="a", so dp[2][2] = dp[1][1] + dp[1][2] = 1 + 0 = 1.

Row 3 (s="bab"): s[2]="b" matches t[0]="b" again, adding a new way

""bag""babgbag10001100111012101000100010001000matchskip

s[2]="b" == t[0]="b": dp[3][1] = dp[2][0] + dp[2][1] = 1 + 1 = 2. Two ways to pick a "b" from "bab".

Row 4 (s="babg"): s[3]="g" matches t[2]="g", first complete match

""bag""babgbag10001100111012101211100010001000

s[3]="g" == t[2]="g": dp[4][3] = dp[3][2] + dp[3][3] = 1 + 0 = 1. First complete "bag" found!

Rows 5-7: more "b", "a", "g" matches accumulate more ways

""bag""babgbag10001100111012101211131113411345

dp[5][1] = 3 (three ways to pick "b"). dp[6][2] = 4 (four ways to form "ba"). dp[7][3] = dp[6][2] + dp[6][3] = 4 + 1 = 5. Answer: 5 distinct subsequences.

The answer is dp[7][3] = 5. There are five distinct ways to choose characters from "babgbag" to spell "bag".

Python solution

def num_distinct(s, t):
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        dp[i][0] = 1

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + dp[i - 1][j]
            else:
                dp[i][j] = dp[i - 1][j]

    return dp[m][n]

Time: O(m * n). Every cell in the 2D table gets filled exactly once.

Space: O(m * n). The full table.

Walking through key cells

A few cells deserve closer attention because they show how the count accumulates.

dp[3][1] = 2. At this point, s is "bab" and you are trying to form "b". The first b (index 0) gives one way, and the second b (index 2) gives another. The recurrence adds the match path (dp[2][0] = 1) to the skip path (dp[2][1] = 1), giving 2.

dp[5][1] = 3. Now s is "babgb" and you are still forming "b". A third b has appeared at index 4. The match path adds dp[4][0] = 1 to the skip path dp[4][1] = 2, giving 3. Each new matching character in s adds more ways.

dp[6][2] = 4. Here s is "babgba" and you are forming "ba". The a at index 5 matches t[1], so dp[6][2] = dp[5][1] + dp[5][2] = 3 + 1 = 4. The 3 comes from the three ways to pick a b before this a, and the 1 comes from the existing ways to form "ba" without using this a.

dp[7][3] = 5. The final g at index 6 matches t[2]. dp[7][3] = dp[6][2] + dp[6][3] = 4 + 1 = 5. All four "ba" prefixes can extend through this g, plus the one existing "bag" that was already formed earlier.

Complexity analysis

ApproachTimeSpace
Brute force recursionO(2^m)O(m) call stack
2D DP tableO(m * n)O(m * n)
Space-optimized (one row)O(m * n)O(n)

Where m = length of s, n = length of t.

You can optimize space to O(n) by using a single row and iterating j from right to left, since each cell only depends on the current row and the row above. But the 2D version is what you should learn first. It makes the recurrence visible and is easier to debug.

Building blocks

This problem is built on the 2D subsequence DP pattern with a match/skip recurrence.

  • State: dp[i][j] = number of ways to form t[0..j-1] from s[0..i-1]
  • Transition: if characters match, dp[i][j] = dp[i-1][j-1] + dp[i-1][j]. Otherwise, dp[i][j] = dp[i-1][j].
  • Base case: first column is all 1s (one way to form empty string), first row is all 0s except dp[0][0] = 1

The match/skip recurrence is the counting version of the same structure you see in other two-string DP problems:

  • LCS (LeetCode 1143): match = diagonal + 1, no match = max(up, left). Maximizes shared length.
  • Edit Distance (LeetCode 72): match = diagonal (free), no match = min(diagonal, up, left) + 1. Minimizes cost.
  • Distinct Subsequences (LeetCode 115): match = diagonal + above, no match = above. Counts ways.

All three use the same table shape and the same three-neighbor dependency. What changes is the operation at each cell: max, min, or sum.

Edge cases to watch for

  • t longer than s: if len(t) > len(s), the answer is always 0. You cannot form a longer string from a shorter one. The DP handles this naturally since the table will never accumulate any positive values beyond the base column.
  • Empty t: the answer is always 1, regardless of s. Every string has exactly one way to form the empty subsequence.
  • Identical strings: if s == t, the answer is 1. There is exactly one way to select every character.
  • No matching characters: if s and t share no characters, the answer is 0. Every cell outside the base column stays at 0.
  • Large counts: the count can grow very large with repeated characters. In Python this is not an issue due to arbitrary precision integers, but in languages like C++ or Java you may need to handle overflow or use modular arithmetic.

From understanding to recall

You now know how to set up the 2D table, why the recurrence adds match and skip branches together, and how the count accumulates as more matching characters appear in s. But reading through this once is not the same as writing it under pressure. In an interview, you need to remember that the first column is 1s (not 0s), that match means addition (not min or max), and that no-match means you only look above (not at the diagonal).

The distinction between this problem and its siblings (LCS, edit distance) is subtle but critical. All three problems share the same table layout and the same character-comparison step. The difference is a single line of code in the recurrence. If you drill all three until the transitions are automatic, you will never mix them up.

The match/skip recurrence is one of roughly 60 reusable building blocks in the CodeBricks system. You practice each block individually with spaced repetition until writing the solution from scratch is second nature. Distinct subsequences, edit distance, and LCS together form the core of the two-string DP family. Locking them in as a group is the fastest path to owning this entire category.

Related posts

  • Edit Distance - Same 2D table, but minimizes transformation cost instead of counting subsequences.
  • Longest Common Subsequence - The foundational two-string DP problem. Maximizes shared subsequence length with the same table shape.