Longest Common Prefix: Column-by-Column Comparison
LeetCode 14, Longest Common Prefix, asks you to find the longest string that is a prefix of every string in an array. If the strings share nothing in common at the start, return an empty string.
For example, given ["flower", "flow", "flight"], the answer is "fl". All three strings start with "fl", but at position 2 the characters diverge ('o', 'o', 'i').
The approach: vertical scanning
The cleanest way to think about this problem is column by column. Line the strings up vertically and scan from left to right. At each column (character position), check whether every string has the same character. The moment any string has a different character, or a string runs out of characters entirely, you stop. Everything to the left of that column is the common prefix.
This is called vertical scanning because you are reading down each column rather than across each row.
Here is why this works: the common prefix can only be as long as the shortest string. And the first point of disagreement across any column kills the prefix right there. By scanning left to right and stopping at the first mismatch, you never do unnecessary work.
You can use any string as the reference for column iteration. The first string, strs[0], is the natural choice. If the prefix turns out to be shorter than strs[0], you will hit a mismatch or an out-of-bounds condition before finishing.
The code
def longest_common_prefix(strs):
if not strs:
return ""
for i in range(len(strs[0])):
char = strs[0][i]
for s in strs[1:]:
if i >= len(s) or s[i] != char:
return strs[0][:i]
return strs[0]
Walk through what happens:
- If the input array is empty, return
"". - Loop through each index
iof the first string. - For each index, grab the character
strs[0][i]and compare it againsts[i]for every other string. - If any string is too short (
i >= len(s)) or has a different character (s[i] != char), return everything up to but not including positioni. - If you make it through the entire first string without a mismatch, the first string itself is the common prefix.
Visual walkthrough
Let's trace through ["flower", "flow", "flight"] one column at a time.
Step 1: Check column 0
All three characters are 'f'. They match. Add 'f' to the prefix.
Step 2: Check column 1
All three characters are 'l'. They match. Add 'l' to the prefix.
Step 3: Check column 2
'o', 'o', 'i'. Not all the same. Stop immediately.
Step 4: Result
We stopped at column 2 because the characters diverged. Everything before that column is the common prefix: "fl"
The algorithm checked exactly 3 columns before finding the mismatch. It never looked at positions 3, 4, or 5 of any string. That early termination is the key to the approach's efficiency.
Alternative approaches
Vertical scanning is not the only way to solve this problem. Here are three other approaches you might see.
Horizontal scanning
Compare the first two strings to find their common prefix. Then compare that result with the third string to narrow it down further. Repeat for every string in the array.
This works fine, but it processes strings one at a time rather than character positions one at a time. If the first two strings are very long but share a short prefix with the third, you do extra comparison work up front that gets thrown away.
Sort first, then compare endpoints
Sort the array of strings lexicographically. After sorting, the first and last strings are the most different. The common prefix of just those two strings is the common prefix of the entire array.
This is clever, but sorting costs O(S log n) where S is the total number of characters and n is the number of strings. For most inputs, that is more expensive than a single linear scan.
Binary search on prefix length
Binary search on the length of the prefix. For a given candidate length, check if all strings share a prefix of that length. If they do, try a longer length. If not, try shorter.
This runs in O(S log m) where m is the length of the shortest string. It is overkill for this problem but useful to know as a technique.
Why vertical scanning wins
Vertical scanning is the simplest and most intuitive approach. It maps directly to how you would solve the problem by hand: line up the strings, read down each column, and stop when you see a difference. The code is short, there is nothing to sort or binary search over, and the time complexity is optimal.
Complexity analysis
| Metric | Value |
|---|---|
| Time | O(S), where S is the sum of all characters across all strings |
| Space | O(1), no extra data structures |
In the worst case (all strings are identical), you visit every character exactly once. In the best case (first characters differ), you visit only n characters where n is the number of strings.
Building blocks
This problem uses two fundamental techniques that appear across many string and array problems.
Character-by-character comparison
The core of vertical scanning is comparing characters at the same index across multiple strings. This same idea shows up in problems like checking if two strings are anagrams, finding mismatches between strings, and comparing sorted string representations.
Early termination
The algorithm stops the moment it finds a mismatch or a string that is too short. You do not finish scanning all columns and then figure out the answer. You return as soon as you have enough information. This pattern is critical in search problems, validation checks, and any problem where you can short-circuit a loop.
Edge cases
Single string. If the array contains exactly one string, that string is the entire common prefix. The inner loop never executes, and you return strs[0].
Empty string in the array. If any string is "", the answer is "" immediately. On the first iteration (i = 0), the check i >= len(s) triggers for the empty string and returns strs[0][:0], which is "".
All identical strings. If every string is the same, the loop runs through all characters of strs[0] without finding a mismatch. The function returns strs[0], which is correct.
No common prefix at all. If the first characters already differ (like ["dog", "racecar", "car"]), the function returns "" on the very first iteration. The check at i = 0 finds that 'd' != 'r' and returns strs[0][:0].
From understanding to recall
You can read this solution and follow every step. But reading is not the same as being able to reproduce it from scratch two weeks from now. The details that slip away are the small ones: iterating over the first string as your reference, the i >= len(s) boundary check, returning strs[0][:i] to slice up to but not including the mismatch position.
Spaced repetition fixes this. Instead of re-reading the full solution, you drill the individual building blocks (column comparison, early return on mismatch) at increasing intervals. Each review takes a minute or two, and after a few reps the pattern is locked in long-term memory. When you sit down for an interview and see a prefix or string comparison problem, the pieces are already there.
Related posts
- Valid Palindrome - Another easy string problem that uses character-by-character comparison
- Longest Substring Without Repeating Characters - A medium string problem that builds on scanning techniques