Skip to content
← All posts

String Matching in an Array: Substring Detection

4 min read
leetcodeproblemeasyarraysstrings

You are given an array of strings words. Return all strings in words that are a substring of another word in the same array. A string words[i] is a substring of words[j] if words[i] can be found inside words[j] and i != j. You can return the answer in any order.

This is LeetCode 1408: String Matching in an Array.

words = ["mass", "as", "hero", "superhero"]

# "as" is a substring of "mass"
# "hero" is a substring of "superhero"
# Result: ["as", "hero"]
words"mass""as""hero""superhero"substring of "mass"substring of "superhero"
words = ["mass", "as", "hero", "superhero"]. The green cells ("as" and "hero") are substrings of other words in the array.

The problem boils down to a pairwise check: for every string in the array, see if it appears inside any other string. If it does, include it in the result.

The brute force is good enough

Since the constraints are small (at most 100 words, each up to 30 characters), a nested loop that checks every pair works fine. For each word, you iterate through all other words and use Python's in operator to test if it is a substring. The moment you find a match, you add it and move on.

This O(n^2 * L) approach, where L is the maximum string length, is well within the time limits. There is no need for more advanced string matching algorithms like KMP or Aho-Corasick here.

A sorting optimization

One small improvement is to sort the words by length. A word can only be a substring of a longer (or equal-length) word. So after sorting, you only need to compare each word against the words that come after it in the sorted order. This does not change the worst-case complexity, but it can reduce the number of comparisons in practice since you skip words that are too short to contain the current word.

That said, sorting is optional. The brute force is clean enough for this problem.

The solution

def stringMatching(words):
    result = []
    for i, w in enumerate(words):
        for j, other in enumerate(words):
            if i != j and w in other:
                result.append(w)
                break
    return result

Step-by-step walkthrough

Let us walk through the algorithm with words = ["mass", "as", "hero", "superhero"].

Step 1: Check "mass"

checking:"mass"in "as"?noin "hero"?noin "superhero"?no✗ "mass" is not a substring of any other word

"mass" does not appear inside "as", "hero", or "superhero". Skip it.

Step 2: Check "as"

checking:"as"in "mass"?yesin "hero"?noin "superhero"?no✓ "as" added to result

"as" appears inside "mass". Add "as" to the result.

Step 3: Check "hero"

checking:"hero"in "mass"?noin "as"?noin "superhero"?yes✓ "hero" added to result

"hero" appears inside "superhero". Add "hero" to the result.

Step 4: Check "superhero"

checking:"superhero"in "mass"?noin "as"?noin "hero"?no✗ "superhero" is not a substring of any other word

"superhero" does not appear inside any other word. Skip it. Final result: ["as", "hero"].

For each word, we check whether it appears as a substring inside any other word. As soon as we find a match, we add it to the result and break out of the inner loop to avoid duplicates.

Complexity analysis

MetricValue
TimeO(n^2 * L)
SpaceO(n)

Time: O(n^2 * L). We compare every pair of words, and each substring check takes up to O(L) time where L is the maximum string length. With n words, that gives us O(n^2) pairs, each costing O(L) for the in check.

Space: O(n). The result list can hold at most n words. We use no additional data structures beyond that.

Building blocks

This problem uses two patterns you will see again and again.

Pairwise comparison

When a problem asks you to find elements that have a certain relationship with other elements in the same collection, the go-to approach is nested loops. You check every pair (i, j) where i != j. This same structure appears in problems like Two Sum, finding duplicate pairs, and checking if any two intervals overlap.

Early termination with break

Once you confirm that a word is a substring of some other word, you do not need to keep checking. Breaking out of the inner loop as soon as you find a match prevents duplicate entries in the result and saves unnecessary work. This pattern of "find one witness and stop" shows up in many search and validation problems.

Edge cases to watch for

  • All identical strings. If words = ["a", "a", "a"], every word is a substring of every other word. Each should appear in the result exactly once, not multiple times. The break after appending handles this.
  • No substrings at all. If every word is unique and no word appears inside another (e.g., ["abc", "def", "ghi"]), the result is an empty list.
  • Single-character words. A word like "a" is a substring of any word containing the letter "a". Make sure your solution does not skip short words.
  • A word equal to another. If words = ["abc", "abc"], then "abc" is a substring of the other "abc". Equal strings count as substrings of each other.
  • One word in the array. With only one word, there is nothing else to compare against. The result must be empty.

From understanding to recall

You might read through this solution and think it is obvious. But will you remember the clean pattern in two weeks when a similar problem appears? Spaced repetition helps you move from "I understand this" to "I can reproduce this from memory." Revisiting the pairwise comparison pattern and the early-break optimization at increasing intervals locks them in for good.

Related posts