Longest Common Subsequence: 2D String DP
You are given two strings. Find the length of their longest common subsequence. A subsequence keeps the original order of characters but does not have to be contiguous. You can skip characters, but you cannot rearrange them.
This is LeetCode 1143: Longest Common Subsequence, and it is the textbook introduction to 2D string dynamic programming. If you have solved 1D DP problems like Longest Increasing Subsequence or Word Break, this is the natural next step. Instead of one array and one DP dimension, you now have two strings and a 2D DP table.
Why the longest common subsequence matters
The longest common subsequence (LCS) pattern is everywhere. Unix diff uses it to compare files line by line. Git uses it under the hood for merge conflict detection. DNA sequence alignment in bioinformatics is a direct generalization of LCS. And in interviews, LCS is the gateway to a whole family of two-string DP problems.
What makes this problem important is the 2D DP table structure. In 1D DP problems, you fill a single row left to right. Here you fill a grid, and each cell depends on three neighbors: the cell above, the cell to the left, and the cell diagonally above-left. Once you internalize that pattern, you can apply it to Edit Distance, Shortest Common Supersequence, and other two-string comparison problems with almost no extra thought.
The brute force idea
The naive approach is to generate all subsequences of one string and check which ones also appear in the other string. A string of length n has 2^n subsequences, so this is exponential. Not going to work.
A slightly smarter recursion: compare the last characters of both strings. If they match, that character is part of the LCS, and you recurse on both strings with the last character removed. If they do not match, you try two branches: skip the last character of string 1, or skip the last character of string 2. Take the better result.
def lcs_recursive(text1, text2, i, j):
if i == 0 or j == 0:
return 0
if text1[i - 1] == text2[j - 1]:
return 1 + lcs_recursive(text1, text2, i - 1, j - 1)
return max(
lcs_recursive(text1, text2, i - 1, j),
lcs_recursive(text1, text2, i, j - 1),
)
This has overlapping subproblems. The same (i, j) pair gets computed many times. Time complexity is O(2^(m+n)) in the worst case, which is way too slow. The fix: store the results in a 2D table.
The recursive solution is useful for building intuition, but do not submit it on LeetCode. Even with memoization, the bottom-up DP version is cleaner and avoids Python's recursion limit for large inputs.
Building the 2D DP table
Define dp[i][j] as the length of the longest common subsequence of text1[0..i-1] and text2[0..j-1]. The table has (m + 1) rows and (n + 1) columns, where m and n are the lengths of the two strings.
The recurrence has two cases:
- If
text1[i-1] == text2[j-1], the characters match. Take the diagonal value and add 1:dp[i][j] = dp[i-1][j-1] + 1 - If the characters do not match, take the better of skipping one character from either string:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
Base case: the first row and first column are all 0, because the LCS of any string with an empty string is empty.
Let's trace through text1 = "abcde" and text2 = "ace":
Initialize: first row and first column are all 0
Base case: comparing any string against an empty string gives LCS length 0.
Row 1 (text1="a"): text1[0]="a" matches text2[0]="a" at dp[1][1]
"a" == "a", so dp[1][1] = dp[0][0] + 1 = 1. Then dp[1][2] and dp[1][3] carry forward the max of 1.
Row 2 (text1="ab"): "b" does not match any of "a","c","e"
No match for "b". Each cell takes max(cell above, cell to the left). Row stays at 1.
Row 3 (text1="abc"): "c" matches text2[1]="c" at dp[3][2]
"c" == "c", so dp[3][2] = dp[2][1] + 1 = 2. We found a longer common subsequence: "ac".
Row 4 (text1="abcd"): "d" does not match. Values carry forward.
No match for "d". Each cell inherits from its top or left neighbor. Still at LCS length 2.
Row 5 (text1="abcde"): "e" matches text2[2]="e" at dp[5][3]. Done!
"e" == "e", so dp[5][3] = dp[4][2] + 1 = 3. The LCS of "abcde" and "ace" is "ace", length 3.
The final answer lives in dp[m][n], the bottom-right corner of the table. For our example, dp[5][3] = 3, which corresponds to the LCS "ace".
def longest_common_subsequence(text1, text2):
m, n = len(text1), len(text2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if text1[i - 1] == text2[j - 1]:
dp[i][j] = dp[i - 1][j - 1] + 1
else:
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
return dp[m][n]
Time: O(m * n). We fill every cell in the 2D table exactly once.
Space: O(m * n). The full table.
The key insight is the diagonal step. When characters match, you go diagonally (both strings shrink by one character). When they do not match, you go up or left (only one string shrinks). This is what distinguishes 2D string DP from simply running two independent 1D passes.
Optimizing space to O(n)
Notice that each row of the DP table only depends on the current row and the row above it. You do not need to store the entire grid. Instead, keep just two rows: the previous row and the current row.
def longest_common_subsequence(text1, text2):
m, n = len(text1), len(text2)
prev = [0] * (n + 1)
for i in range(1, m + 1):
curr = [0] * (n + 1)
for j in range(1, n + 1):
if text1[i - 1] == text2[j - 1]:
curr[j] = prev[j - 1] + 1
else:
curr[j] = max(prev[j], curr[j - 1])
prev = curr
return prev[n]
Time: O(m * n). Same as before.
Space: O(n). Only two rows of length n + 1.
This is the version you should use in interviews when space matters. You can even reduce to a single row with a temporary variable to save dp[i-1][j-1] before it gets overwritten, but the two-row version is clear enough.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Brute force recursion | O(2^(m+n)) | O(m + n) call stack |
| 2D DP table | O(m * n) | O(m * n) |
| Space-optimized (two rows) | O(m * n) | O(n) |
Where m = length of text1, n = length of text2.
Edge cases to watch for
- One or both strings empty: return 0. The base case handles this.
- Identical strings: the entire string is the LCS. Every cell on the diagonal will match, and
dp[m][n] = m = n. - No common characters: e.g.,
text1 = "abc",text2 = "def". No diagonal steps ever happen.dp[m][n] = 0. - One string is a subsequence of the other: e.g.,
text1 = "abcde",text2 = "ace". The LCS length equals the length of the shorter string. - Repeated characters: e.g.,
text1 = "aaa",text2 = "aa". The LCS is"aa", length 2. The DP table handles this correctly because each match consumes one character from both strings.
The building blocks
This problem is built on two-string DP, the fundamental pattern for comparing two sequences in a 2D table. The structure is:
- State:
dp[i][j]= answer fortext1[0..i-1]andtext2[0..j-1] - Transition: depends on whether
text1[i-1]andtext2[j-1]match - Base case: first row and first column are the "empty string" cases
This exact framework applies to a family of related problems:
- Edit Distance (LeetCode 72): instead of tracking LCS length, track the minimum number of insertions, deletions, and substitutions. The diagonal step means "characters match, no edit needed." The up/left steps mean "delete from one string or insert into the other."
- Shortest Common Supersequence (LeetCode 1092): find the shortest string that has both text1 and text2 as subsequences. Solve LCS first, then reconstruct using the DP table.
- Minimum ASCII Delete Sum (LeetCode 712): instead of maximizing the LCS length, minimize the total ASCII cost of deleted characters. Same 2D table, different transition values.
Once you can fill a 2D table for two strings and reason about diagonal-vs-up-vs-left transitions, you have the building block for all of these. The two-string DP pattern is one of the most reusable bricks in the entire DP category.
From understanding to recall
You have just walked through the complete 2D DP table for the longest common subsequence. The recurrence is simple: match means diagonal plus one, no match means max of up and left. The space optimization is straightforward. But can you write it from scratch in an interview next week?
Understanding the pattern while reading is not the same as producing it under pressure. In an interview, you need to set up the 2D table dimensions, write the correct recurrence without mixing up indices, and explain why the diagonal step works. That fluency comes from spaced repetition, not one-time reading.
The two-string DP pattern is one of roughly 60 reusable building blocks in the CodeBricks system. You drill each block individually until writing the solution from memory is automatic. LCS is the entry point to the whole two-string comparison family, so it is one of the most valuable blocks to lock in early.
Related posts
- Longest Increasing Subsequence - 1D DP on a single array, a good warmup before tackling 2D string DP
- Word Break - Another string DP problem, but with a 1D table and dictionary lookups instead of two-string comparison