Skip to content
← All posts

Shortest Common Supersequence: LCS-Based String Construction

6 min read
leetcodeproblemhardstringsdynamic-programming

You are given two strings str1 and str2. Return the shortest string that has both str1 and str2 as subsequences. If there are multiple valid answers, return any of them.

This is LeetCode 1092: Shortest Common Supersequence, a hard problem that builds directly on Longest Common Subsequence. The trick is that you are not just computing a length. You need to reconstruct the actual string. That reconstruction step is what makes this problem harder than a typical DP problem.

str1abacstr2cabSCS resultSCScstr2abothbbothastr1cstr1From str1 onlyFrom str2 onlyShared (LCS)
str1="abac", str2="cab". The LCS is "ab". The shortest common supersequence is "cabac" (length 5 = 4 + 3 - 2).

Why this problem matters

Shortest Common Supersequence (SCS) comes up in DNA sequence assembly, where you want the shortest strand that contains multiple fragments. It also appears in merge operations, file diffing, and version control, anywhere you need to combine two sequences while preserving their relative order.

In interviews, this problem tests two things at once. First, can you see the connection to LCS? Second, can you go beyond just computing a DP value and actually reconstruct the answer from the table? Many candidates can fill a DP table but freeze when asked to build the output string. SCS forces you to practice that skill.

The key insight

The shortest common supersequence and the longest common subsequence are two sides of the same coin. The LCS tells you which characters the two strings share. Everything else needs to be added to the result. The formula is:

SCS length = len(str1) + len(str2) - len(LCS)

Why? The LCS characters appear in both strings. In the supersequence, you only need to include them once instead of twice. So you start with the total length of both strings and subtract the LCS length to avoid double-counting.

The construction algorithm works like this: compute the LCS, then walk through it character by character. For each LCS character, drain any non-LCS characters from both strings first, then add the shared character once. After the LCS is exhausted, append whatever remains from either string.

The solution

def shortest_common_supersequence(str1: str, str2: str) -> str:
    m, n = len(str1), len(str2)
    dp = [[""] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + str1[i - 1]
            else:
                dp[i][j] = dp[i - 1][j] if len(dp[i - 1][j]) > len(dp[i][j - 1]) else dp[i][j - 1]

    lcs = dp[m][n]
    result = []
    i, j = 0, 0

    for c in lcs:
        while i < m and str1[i] != c:
            result.append(str1[i])
            i += 1
        while j < n and str2[j] != c:
            result.append(str2[j])
            j += 1
        result.append(c)
        i += 1
        j += 1

    result.append(str1[i:])
    result.append(str2[j:])
    return "".join(result)

The first half of this function builds a DP table where each cell stores the actual LCS string (not just its length). This is a variation of the standard LCS algorithm. Instead of storing integers, we store the subsequence itself.

The DP recurrence works like this: if the characters at position i in str1 and position j in str2 match, that character extends the LCS from dp[i-1][j-1]. If they do not match, we take whichever of dp[i-1][j] or dp[i][j-1] gives the longer LCS so far.

Once we have the full LCS string, the second half constructs the supersequence. We maintain two pointers i and j into str1 and str2. For each character c in the LCS, we first drain all characters from str1 that come before c (appending them to the result), then drain all characters from str2 that come before c. Then we append c itself and advance both pointers past it.

After processing every LCS character, there may be leftover characters in one or both strings. We append those at the end. The result is the shortest string that contains both str1 and str2 as subsequences.

The LCS is the foundation for the SCS. If you can compute the LCS of two strings, you can build the SCS by interleaving the non-LCS characters around the shared ones. This pattern of "compute a DP value, then reconstruct the answer" appears in many problems, including Edit Distance backtracking and sequence alignment.

Visual walkthrough

Let's trace through str1 = "abac" and str2 = "cab" step by step. The LCS of these two strings is "ab", and the resulting SCS is "cabac".

Step 1: Build the LCS DP table for str1="abac" and str2="cab"

""cab""abac""""""""""""aa""""aab""""aab""ccaab

Each cell dp[i][j] stores the actual LCS string of str1[0..i-1] and str2[0..j-1]. The final cell gives LCS = "ab".

Step 2: Use the LCS to guide supersequence construction

LCS:abWalk through LCS characters one by one. For each LCS character, first drain any non-LCS characters from both strings.

The LCS "ab" tells us which characters are shared. We iterate through it to build the SCS.

Step 3: Process LCS character "a"

str1:abacstr2:cabresult:ca

Before reaching "a" in str2, we drain "c" (not in LCS). Then we add the shared "a" once. Result so far: "ca".

Step 4: Process LCS character "b"

str1:abacstr2:cabresult:cab

No extra characters to drain before "b" in either string. Add shared "b" once. Result so far: "cab".

Step 5: LCS exhausted, append remaining characters from both strings

str1:abacstr2:cabdoneresult:cabac

str2 is fully consumed. Append remaining str1 characters "ac". Result so far: "cabac".

Step 6: Final shortest common supersequence

SCS:cstr2aLCSbLCSastr1cstr1Length = 4 + 3 - 2 = 5. Both "abac" and "cab" are subsequences of "cabac".

The SCS "cabac" contains str1 as subsequence (c-a-b-a-c: positions 1,2,3,4) and str2 as subsequence (c-a-b: positions 0,1,2).

Complexity analysis

ApproachTimeSpace
LCS + reconstructionO(m * n * min(m, n))O(m * n * min(m, n))

Where m = length of str1 and n = length of str2. The extra min(m, n) factor comes from storing actual LCS strings in the DP table rather than just integers. Each string can be up to min(m, n) characters long, and string concatenation at each cell takes that much time.

If you need better performance, you can build a standard integer LCS table first, then reconstruct the LCS by backtracking through the table. That gives O(m * n) time and space for the DP phase, plus O(m + n) for the reconstruction pass. For interview purposes, the string-storing approach shown above is simpler to implement and explain.

The building blocks

LCS dynamic programming

The core of this problem is the Longest Common Subsequence DP. You build a 2D table where dp[i][j] represents the LCS of the first i characters of str1 and the first j characters of str2. The recurrence compares characters: if they match, extend the diagonal. If they do not match, take the better of the cell above or the cell to the left. This is the same table structure used in Edit Distance and other two-string DP problems.

DP backtracking and reconstruction

Many DP problems ask you to return the optimal value. This one asks you to return the actual string. The technique is the same in all cases: after filling the table, walk backward (or forward through the LCS) to reconstruct the answer. You use the DP values to decide which direction to go at each step. This reconstruction skill transfers directly to problems like Edit Distance (finding the sequence of operations) and knapsack variants (finding which items to include).

Interleaving from LCS

The final piece is the interleaving step. Once you have the LCS, you walk through it and weave in the non-LCS characters from both strings. This is similar in spirit to the merge step in merge sort, or to the construction in Interleaving String. You maintain pointers into both input strings and advance them in coordination with the LCS characters.

Edge cases

  • Identical strings: If str1 == str2, the LCS is the entire string, and the SCS is just str1 itself. Length = n.
  • No common characters: If the two strings share no characters, the LCS is empty, and the SCS is simply str1 + str2. Length = m + n.
  • One string is a subsequence of the other: The longer string is already the SCS. For example, str1 = "abc" and str2 = "ac" gives SCS = "abc".
  • Empty string: If either string is empty, the SCS is the other string. The DP table handles this through the base cases where dp[0][j] and dp[i][0] are all empty strings.
  • Single characters: str1 = "a", str2 = "a" gives SCS = "a". If str1 = "a", str2 = "b", the SCS is "ab" (or "ba").

From understanding to recall

You now know how the SCS connects to LCS, how to build the DP table with actual strings, and how to interleave non-LCS characters to construct the result. But knowing the approach is different from writing it under pressure. In an interview, you need to set up the 2D table, get the LCS recurrence right, and then implement the pointer-based reconstruction without mixing up which pointer belongs to which string.

This problem combines two skills that trip candidates up individually: two-string DP and answer reconstruction. Practicing them together builds the kind of fluency that makes hard DP problems feel manageable. The SCS pattern is one of roughly 60 reusable building blocks in the CodeBricks system. You drill each block with spaced repetition until the solution flows naturally from memory.

Related posts