Find the Shortest Superstring: Bitmask DP on Overlaps

May 25, 20267 min read

leetcodeproblemhardstringsdynamic-programmingbit-manipulation

You are given an array of strings words. Return the shortest string that contains every string in words as a substring. If multiple valid answers exist, return any of them.

This is LeetCode 943: Find the Shortest Superstring. For example, given words = ["alex", "loves", "leetcode"], one valid answer is "alexlovesleetcode". But when words share overlapping prefixes and suffixes, you can do much better. Given words = ["catg", "ctaagt", "gcta", "ttca", "atgcatc"], the shortest superstring is "gctaagttcatgcatc" with length 16 instead of the 26 characters you would get from naive concatenation.

The key challenge is figuring out the optimal order in which to concatenate the words. Each ordering produces a different total length because different pairs of words overlap by different amounts. This is fundamentally a variant of the Travelling Salesman Problem (TSP).

words = ["abc", "bcd", "cde"]. Overlap matrix shows how much the end of one word matches the start of another. Chain: abc + bcd (overlap 2) + cde (overlap 2) = "abcde".

Why this problem matters

The shortest superstring problem is NP-hard in general. For arbitrary inputs, no polynomial-time algorithm is known. But the constraint on LeetCode is small: n (the number of words) is at most 12. That makes bitmask DP viable, since 2^12 = 4096 is perfectly manageable.

This problem sits at the intersection of three important patterns: bitmask DP for subset enumeration, greedy overlap computation between strings, and TSP-style state transitions. If you understand how to combine these three ideas, you unlock a whole family of "visit all items in optimal order" problems. The bitmask tracks which words you have used so far, and the DP transition tries appending each unused word, picking the one that maximizes overlap (and therefore minimizes total length).

The brute force approach

The most direct approach is to try every permutation of the words. For each permutation, greedily merge adjacent words using their overlaps, and track the shortest result.

from itertools import permutations

def shortestSuperstring(words):
    n = len(words)
    overlap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                for k in range(min(len(words[i]), len(words[j])), 0, -1):
                    if words[i].endswith(words[j][:k]):
                        overlap[i][j] = k
                        break

    best = "".join(words)
    for perm in permutations(range(n)):
        current = words[perm[0]]
        for idx in range(1, n):
            prev, nxt = perm[idx - 1], perm[idx]
            current += words[nxt][overlap[prev][nxt]:]
        if len(current) < len(best):
            best = current
    return best

This runs in O(n! * n) time, which is only feasible for tiny inputs. With 12 words, 12! is about 479 million, far too slow. We need a smarter approach.

The key insight: bitmask DP on overlaps

Instead of trying every permutation explicitly, we can use dynamic programming where the state is a bitmask representing which words have been included, plus which word we ended with.

Define dp[mask][i] as the shortest superstring that uses exactly the set of words indicated by mask and ends with word i. The bitmask mask has bit i set if word i has been included.

The transition: to build dp[mask | (1 << j)][j], take the existing superstring dp[mask][i] and append word j, but only add the characters of word j that are not already covered by the overlap with word i. The appended suffix is words[j][overlap[i][j]:].

This is exactly the TSP structure. Each word is a "city." The "distance" from word i to word j is len(words[j]) - overlap[i][j], which is the number of new characters you must add. Minimizing total distance gives the shortest superstring.

Think of this as the Travelling Salesman Problem where "distance" is the non-overlapping suffix length of each word. Minimizing total distance gives the shortest superstring.

Walking through it step by step

Step 1: Compute the overlap matrix for all pairs

overlap[i][j] = length of longest suffix of words[i] that matches a prefix of words[j]. For example, "ab" ends with "b" and "bc" starts with "b", so overlap[0][1] = 1.

Step 2: Initialize DP base cases (one word per mask)

dp[mask][i] stores the shortest superstring using the words indicated by mask, ending with word i. Start with each word alone: dp[001][0] = "ab", dp[010][1] = "bc", dp[100][2] = "cd".

Step 3: Expand masks by adding one word at a time

From dp[001][0] = "ab", add word 1 ("bc") with overlap 1. New string: "ab" + "c" = "abc". So dp[011][1] = "abc" (length 3). Also try dp[001][0] + word 2: overlap[0][2]=0, so dp[101][2] = "abcd" (length 4).

Step 4: Full mask = 111, all words included

dp[111][2] is the best. Starting from dp[011][1] = "abc", add word 2 ("cd") with overlap[1][2]=1. Result: "abc" + "d" = "abcd" (length 4). This is the shortest superstring containing "ab", "bc", and "cd".

Step 5: Pick the shortest, reconstruct the path

The shortest entry at full mask is dp[111][2] = "abcd". The path was word 0 ("ab") then word 1 ("bc") then word 2 ("cd"). Each step appended only the non-overlapping suffix.

Practice this problem with spaced repetition

The bitmask DP solution

def shortestSuperstring(words):
    n = len(words)
    overlap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                for k in range(min(len(words[i]), len(words[j])), 0, -1):
                    if words[i].endswith(words[j][:k]):
                        overlap[i][j] = k
                        break

    dp = [[""] * n for _ in range(1 << n)]
    for i in range(n):
        dp[1 << i][i] = words[i]

    for mask in range(1, 1 << n):
        for last in range(n):
            if not (mask & (1 << last)):
                continue
            if not dp[mask][last]:
                continue
            for nxt in range(n):
                if mask & (1 << nxt):
                    continue
                new_mask = mask | (1 << nxt)
                candidate = dp[mask][last] + words[nxt][overlap[last][nxt]:]
                if not dp[new_mask][nxt] or len(candidate) < len(dp[new_mask][nxt]):
                    dp[new_mask][nxt] = candidate

    full = (1 << n) - 1
    return min(dp[full], key=len)

The algorithm has three phases. First, precompute the overlap between every pair of words. Second, fill the DP table bottom-up, starting from single-word masks and expanding to larger subsets. Third, look at all entries where mask equals the full set and return the shortest one.

Complexity analysis

Metric	Value
Time	O(2^n * n^2), iterating over all masks and pairs
Space	O(2^n * n), storing DP strings for each state

The time complexity comes from iterating over all 2^n masks, and for each mask checking all n possible last words and all n possible next words. The space stores a string for every (mask, last) pair. For n = 12, this is 4096 * 12 = 49,152 entries, which fits comfortably in memory.

You can optimize space by storing only the length in the DP table and reconstructing the actual string at the end using a parent pointer array. This avoids storing full strings at every state but adds complexity to the reconstruction step.

Building blocks

1. Bitmask DP

The core pattern here is using an integer bitmask to represent a subset of items. Bit i is set if item i has been included. This lets you enumerate all 2^n subsets and transition between them by flipping bits. The DP state is (mask, last_item), and transitions try adding each unused item.

This same pattern appears in problems like the Travelling Salesman Problem, Partition to K Equal Sum Subsets, and any problem where you need to find the optimal ordering or grouping of a small number of items (typically n up to 20).

2. Overlap precomputation

Before running the DP, you precompute overlap[i][j] for every pair (i, j). This is the length of the longest suffix of words[i] that matches a prefix of words[j]. You check all possible overlap lengths from the maximum down to 1, and stop at the first match.

This precomputation runs in O(n^2 * L) time where L is the maximum word length. It converts the problem from string manipulation into simple integer lookups during the DP phase.

Edge cases

One word: if n = 1, the answer is just words[0]. The DP handles this naturally since the only mask is 1 and the only entry is the word itself.
No overlaps: if no word overlaps with any other word, the shortest superstring is just the concatenation of all words in any order. The DP will produce this result since every transition appends the full word.
One word is a substring of another: the problem guarantees that no word in the input is a substring of another. This constraint simplifies the problem because you never need to "skip" a word that is already fully contained.
All words identical: this cannot happen given the substring constraint, but if words share long overlaps, the DP correctly identifies and exploits them.
Maximum input size: with n = 12, the DP table has 4096 * 12 entries. Each entry stores a string, so memory usage depends on the total string lengths. In practice, this is well within limits.

Build your pattern library with CodeBricks

Common mistakes

Forgetting asymmetric overlaps. The overlap of word i into word j is not the same as word j into word i. overlap[i][j] measures how the end of words[i] matches the start of words[j]. Make sure to compute both directions separately.

Initializing the DP incorrectly. Each single-word mask needs to store the full word as its base case. If you initialize with empty strings or forget to set the base cases, the transitions will produce wrong results.

Not checking all final states. The answer is not necessarily dp[full][n-1]. You need to check dp[full][i] for all i and return the shortest one, since the optimal superstring could end with any word.

Using greedy instead of DP. A greedy approach that always picks the largest overlap first does not guarantee the optimal result. It can get stuck in locally good choices that lead to a globally longer superstring.

From understanding to recall

You now know the three-phase structure: precompute overlaps, fill the bitmask DP table, and extract the shortest result from the full mask. The TSP analogy makes the problem intuitive, as you are visiting every word exactly once in an order that minimizes the total "travel distance" (non-overlapping suffix length).

The tricky part in interviews is remembering the DP state definition and the transition. The state is (mask, last_word). The transition appends a new word and uses the precomputed overlap to avoid redundant characters. If you can recall these two pieces, the rest of the code follows directly.

Bitmask DP on subsets is one of roughly 60 reusable building blocks in the CodeBricks system. You practice each block individually with spaced repetition until writing the solution from scratch is second nature. Problems like this one, Travelling Salesman, and Partition to K Equal Sum Subsets all share the same bitmask DP skeleton. Locking in the pattern once means you can apply it across all of them.

Word Break - String DP with substring matching
Edit Distance - String DP measuring transformation cost
Longest Common Subsequence - String DP on two sequences

Try CodeBricks free