Skip to content
← All posts

Delete Operation for Two Strings: Minimum Deletions via LCS

7 min read
leetcodeproblemmediumstringsdynamic-programming

You are given two strings word1 and word2. Return the minimum number of steps required to make the two strings the same. In each step, you can delete one character from either string.

This is LeetCode 583: Delete Operation for Two Strings. For example, word1 = "sea" and word2 = "eat" returns 2. You delete 's' from "sea" and 't' from "eat", leaving "ea" in both. If you have already solved Longest Common Subsequence or Edit Distance, you will recognize the 2D table structure immediately. The difference is that the only operation here is deletion, no insertions or replacements.

""eat""sea0123123421233212del w1del w2Delete from word1 (above)Delete from word2 (left)Match (free)Answer
2D DP table for word1="sea", word2="eat". Green cells are free diagonal matches (characters equal). Arrows at dp[1][1] show the two delete options. Answer: dp[3][3] = 2.

The problem

You want to find the minimum total deletions from both strings so that what remains is identical. Think of it this way: whatever characters survive the deletions must appear in the same order in both strings. That surviving string is exactly the longest common subsequence (LCS) of word1 and word2.

If the LCS has length L, then you need to delete len(word1) - L characters from word1 and len(word2) - L characters from word2. The total deletions are len(word1) + len(word2) - 2 * L.

The approach

There are two clean ways to solve this problem.

Approach 1: Find the LCS, then compute deletions. Run the standard LCS algorithm on both strings. The answer is m + n - 2 * LCS(word1, word2).

Approach 2: Direct DP for minimum deletions. Define dp[i][j] as the minimum number of deletions to make word1[:i] and word2[:j] equal. This avoids the subtraction step and gives you the answer directly.

The key insight: the characters that survive deletion must form a common subsequence of both strings. To minimize deletions, you maximize what survives. That is exactly the LCS. So this problem reduces to: total length minus twice the LCS length.

Both approaches have the same time and space complexity. The direct DP is slightly more intuitive for this specific problem, so we will walk through that one.

The solution

The direct DP recurrence:

  • If word1[i-1] == word2[j-1]: characters match, no deletion needed. dp[i][j] = dp[i-1][j-1]
  • If word1[i-1] != word2[j-1]: delete from whichever side is cheaper. dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1])

Base cases:

  • dp[i][0] = i for all i. To match an empty string, delete all i characters from word1.
  • dp[0][j] = j for all j. To match an empty string, delete all j characters from word2.
def min_distance(word1, word2):
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    # Fill the table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],   # delete from word1
                    dp[i][j - 1],   # delete from word2
                )

    return dp[m][n]

Step-by-step walkthrough

Let's trace through word1 = "sea" and word2 = "eat":

Initialize: base cases are the distance from the empty string

""eat""sea0123100020003000

dp[i][0] = i (delete all i characters from word1). dp[0][j] = j (delete all j characters from word2). Each base cell equals its row or column index.

Row 1 (word1="s"): "s" != "e", "s" != "a", "s" != "t"

""eat""sea0123123420003000del w1del w2

"s" != "e": dp[1][1] = 1 + min(dp[0][1], dp[1][0]) = 1 + min(1, 1) = 2. No matches in this row, so costs keep climbing.

Row 2 (word1="se"): "e" == "e" at dp[2][1], free diagonal match!

""eat""sea0123123421233000match

"e" == "e": dp[2][1] = dp[1][0] = 1. Characters match, so we take the diagonal with no extra cost. Delete just the "s".

Row 3 (word1="sea"): "a" == "a" at dp[3][2], another free match!

""eat""sea0123123421233212match

"a" == "a": dp[3][2] = dp[2][1] = 1. dp[3][3]: "a" != "t", so dp[3][3] = 1 + min(dp[2][3], dp[3][2]) = 1 + min(3, 1) = 2. Answer: 2 deletions.

Final answer: dp[3][3] = 2. Delete "s" from "sea" and "t" from "eat".

""eat""sea0123123421233212

The LCS of "sea" and "eat" is "ea" (length 2). Total deletions = 3 + 3 - 2 * 2 = 2. Delete "s" from word1, delete "t" from word2, leaving "ea" in both.

The answer is dp[3][3] = 2. Delete "s" from "sea" and "t" from "eat". What remains is "ea" in both strings.

The LCS-based approach

You can also solve this by computing the LCS first, then subtracting. Here is the alternative:

def min_distance_lcs(word1, word2):
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    lcs_len = dp[m][n]
    return m + n - 2 * lcs_len

Both approaches fill the same size table and run in the same time. The direct deletion DP tracks the cost you want to minimize. The LCS approach tracks the value you want to maximize, then converts at the end. Pick whichever feels more natural to you.

Optimizing space to O(n)

Each row in the DP table only depends on the current row and the previous row. You can drop the full grid down to two arrays:

def min_distance(word1, word2):
    m, n = len(word1), len(word2)
    prev = list(range(n + 1))

    for i in range(1, m + 1):
        curr = [0] * (n + 1)
        curr[0] = i
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(prev[j], curr[j - 1])
        prev = curr

    return prev[n]

Time: O(m * n). Same as the full table.

Space: O(n). Two rows of length n + 1.

Complexity analysis

ApproachTimeSpace
Brute force recursionO(2^(m+n))O(m + n) call stack
2D DP table (direct or LCS)O(m * n)O(m * n)
Space-optimized (two rows)O(m * n)O(n)

Where m = length of word1, n = length of word2.

The building blocks

1. Two-string DP table

This problem uses the same (m+1) x (n+1) table structure as LCS, edit distance, and distinct subsequences. The state is dp[i][j] representing some relationship between word1[0..i-1] and word2[0..j-1]. The transition checks whether the current characters match.

The key difference from edit distance: there is no replace operation. You can only delete from one side or the other. That means the recurrence only considers the cell above (delete from word1) and the cell to the left (delete from word2), never the diagonal plus one for a replacement. When characters match, you take the diagonal as-is, same as edit distance.

2. LCS reduction

Many problems in the two-string DP family can be reduced to LCS. If you know the LCS length, you know what survives. Everything else gets deleted, inserted, or replaced depending on the problem. For this problem, the reduction is clean: answer = m + n - 2 * LCS. No insertions, no replacements, just count the characters that do not appear in the common subsequence.

This same reduction appears in Shortest Common Supersequence, where the answer is m + n - LCS (you keep everything, but shared characters only appear once). Recognizing these LCS reductions saves you from re-deriving recurrences from scratch.

Edge cases

  • Identical strings: if word1 == word2, the answer is 0. Every character matches, LCS equals the full length, and no deletions are needed.
  • Completely different strings: if word1 and word2 share no characters, the LCS is 0, and you delete everything from both. The answer is m + n.
  • One string is empty: dp[i][0] = i and dp[0][j] = j. You must delete all characters from the non-empty string.
  • One string is a subsequence of the other: e.g., word1 = "abc", word2 = "ac". The LCS is "ac" (length 2), so the answer is 3 + 2 - 2 * 2 = 1. Delete only the character not in the subsequence.
  • Single character strings: word1 = "a", word2 = "a" returns 0. word1 = "a", word2 = "b" returns 2 (delete both).

From understanding to recall

You now know how to fill the deletion DP table, why this problem reduces to LCS, and how the recurrence differs from edit distance (no replace, only two neighbors instead of three). But reading through this once is not the same as writing it under pressure. In an interview, you need to set up the (m+1) x (n+1) table from memory, nail the base cases where dp[i][0] = i and dp[0][j] = j, and explain why a match means zero cost (diagonal, no increment).

The delete operation problem sits right between LCS and edit distance in difficulty. It uses the same table as both, but with a simpler recurrence than edit distance (two neighbors instead of three) and a more direct question than LCS (minimize deletions instead of maximize subsequence length). If you can solve all three without hesitation, you own the entire two-string DP family.

The two-string DP building block is one of roughly 60 reusable patterns in the CodeBricks system. You practice each block individually with spaced repetition until writing the solution from scratch is second nature. Delete operation, LCS, and edit distance together form the core of the two-string comparison family, so locking them in as a group pays dividends across the entire DP category.

Related posts