String Matching in an Array: Substring Detection
You are given an array of strings words. Return all strings in words that are a substring of another word in the same array. A string words[i] is a substring of words[j] if words[i] can be found inside words[j] and i != j. You can return the answer in any order.
This is LeetCode 1408: String Matching in an Array.
words = ["mass", "as", "hero", "superhero"]
# "as" is a substring of "mass"
# "hero" is a substring of "superhero"
# Result: ["as", "hero"]
The problem boils down to a pairwise check: for every string in the array, see if it appears inside any other string. If it does, include it in the result.
The brute force is good enough
Since the constraints are small (at most 100 words, each up to 30 characters), a nested loop that checks every pair works fine. For each word, you iterate through all other words and use Python's in operator to test if it is a substring. The moment you find a match, you add it and move on.
This O(n^2 * L) approach, where L is the maximum string length, is well within the time limits. There is no need for more advanced string matching algorithms like KMP or Aho-Corasick here.
A sorting optimization
One small improvement is to sort the words by length. A word can only be a substring of a longer (or equal-length) word. So after sorting, you only need to compare each word against the words that come after it in the sorted order. This does not change the worst-case complexity, but it can reduce the number of comparisons in practice since you skip words that are too short to contain the current word.
That said, sorting is optional. The brute force is clean enough for this problem.
The solution
def stringMatching(words):
result = []
for i, w in enumerate(words):
for j, other in enumerate(words):
if i != j and w in other:
result.append(w)
break
return result
Step-by-step walkthrough
Let us walk through the algorithm with words = ["mass", "as", "hero", "superhero"].
Step 1: Check "mass"
"mass" does not appear inside "as", "hero", or "superhero". Skip it.
Step 2: Check "as"
"as" appears inside "mass". Add "as" to the result.
Step 3: Check "hero"
"hero" appears inside "superhero". Add "hero" to the result.
Step 4: Check "superhero"
"superhero" does not appear inside any other word. Skip it. Final result: ["as", "hero"].
For each word, we check whether it appears as a substring inside any other word. As soon as we find a match, we add it to the result and break out of the inner loop to avoid duplicates.
Complexity analysis
| Metric | Value |
|---|---|
| Time | O(n^2 * L) |
| Space | O(n) |
Time: O(n^2 * L). We compare every pair of words, and each substring check takes up to O(L) time where L is the maximum string length. With n words, that gives us O(n^2) pairs, each costing O(L) for the in check.
Space: O(n). The result list can hold at most n words. We use no additional data structures beyond that.
Building blocks
This problem uses two patterns you will see again and again.
Pairwise comparison
When a problem asks you to find elements that have a certain relationship with other elements in the same collection, the go-to approach is nested loops. You check every pair (i, j) where i != j. This same structure appears in problems like Two Sum, finding duplicate pairs, and checking if any two intervals overlap.
Early termination with break
Once you confirm that a word is a substring of some other word, you do not need to keep checking. Breaking out of the inner loop as soon as you find a match prevents duplicate entries in the result and saves unnecessary work. This pattern of "find one witness and stop" shows up in many search and validation problems.
Edge cases to watch for
- All identical strings. If
words = ["a", "a", "a"], every word is a substring of every other word. Each should appear in the result exactly once, not multiple times. Thebreakafter appending handles this. - No substrings at all. If every word is unique and no word appears inside another (e.g.,
["abc", "def", "ghi"]), the result is an empty list. - Single-character words. A word like
"a"is a substring of any word containing the letter"a". Make sure your solution does not skip short words. - A word equal to another. If
words = ["abc", "abc"], then"abc"is a substring of the other"abc". Equal strings count as substrings of each other. - One word in the array. With only one word, there is nothing else to compare against. The result must be empty.
From understanding to recall
You might read through this solution and think it is obvious. But will you remember the clean pattern in two weeks when a similar problem appears? Spaced repetition helps you move from "I understand this" to "I can reproduce this from memory." Revisiting the pairwise comparison pattern and the early-break optimization at increasing intervals locks them in for good.
Related posts
- Find and Replace in String - String manipulation with index tracking
- Repeated Substring Pattern - Substring detection in a single string
- Word Subsets - Comparing strings against multiple criteria