Last Substring in Lexicographical Order: Two-Pointer Suffix Comparison
You are given a string s and need to return its last substring in lexicographical order. At first glance, this seems like it might require generating all substrings. But there is a key observation that collapses the problem: the lexicographically largest substring of any string is always a suffix. Why? Because you can always extend a substring to the right to make it at least as large (or larger) in lexicographic order. So the problem reduces to finding the lexicographically largest suffix.
This is LeetCode 1163: Last Substring in Lexicographical Order.
Why this problem matters
This problem teaches two valuable skills. First, it trains you to recognize when a problem can be reduced to something simpler. The jump from "find the largest substring" to "find the largest suffix" is the kind of reasoning that shows up across many interview problems. Second, it forces you to develop a linear-time comparison strategy using two pointers on strings, a technique that transfers to other suffix-based problems.
The key insight
Since the answer must be a suffix, you only have n candidates: s[0:], s[1:], s[2:], and so on through s[n-1:]. Comparing all pairs naively would be O(n^2), but you can do better with a two-pointer approach.
Maintain two pointers i and j that represent the starting indices of two candidate suffixes. Use a third variable k to track how many characters have matched between s[i:] and s[j:] so far. At each step, compare s[i+k] and s[j+k]:
- If they are equal, increment
kand keep comparing. - If
s[i+k] > s[j+k], the suffix atjis strictly worse. Skipjforward byk+1positions (all positions betweenjandj+kproduce suffixes that also lose tos[i:]). Resetkto 0. - If
s[i+k] < s[j+k], the suffix atiis strictly worse. Moveitomax(i+k+1, j)so it never falls behindj, then setj = i+1and resetkto 0.
When j+k reaches the end of the string, i holds the starting index of the largest suffix.
The solution
def lastSubstring(s: str) -> str:
i, j, k = 0, 1, 0
n = len(s)
while j + k < n:
if s[i + k] == s[j + k]:
k += 1
elif s[i + k] > s[j + k]:
j = j + k + 1
k = 0
else:
i = max(i + k + 1, j)
j = i + 1
k = 0
return s[i:]
The loop runs until j+k goes past the end of the string. At that point, s[j:] has either been fully compared or eliminated. The surviving candidate at index i is the answer.
The max(i + k + 1, j) in the else branch is important. When s[i:] loses to s[j:], you know that every suffix starting between i and i+k also loses. But i must never go below j, because j is the current winner and the new i needs to be a fresh candidate.
Why is checking suffixes sufficient? Any substring of s that is not a suffix can be extended to the right. Appending more characters from the original string never makes the result lexicographically smaller (it either stays the same or gets larger). So the lexicographically largest substring always extends all the way to the end of s, making it a suffix.
Visual walkthrough
Step 1: Initialize i=0, j=1, k=0 for s="cacb"
Pointer i starts at 0, pointer j starts at 1. k tracks how many characters have matched so far between s[i:] and s[j:].
Step 2: Compare s[i+k]=s[0]='c' vs s[j+k]=s[1]='a'
s[0] > s[1] (c > a), so the suffix starting at j=1 cannot beat i=0. Move j to j+k+1 = 2. Reset k to 0.
Step 3: Compare s[i+k]=s[0]='c' vs s[j+k]=s[2]='c'
They are equal, so we cannot decide yet. Increment k to 1 and keep comparing.
Step 4: Compare s[i+k]=s[1]='a' vs s[j+k]=s[3]='b'
s[1] < s[3] (a < b), so s[j:] is larger. Move i to max(i+k+1, j) = max(2, 2) = 2. Set j = i+1 = 3, k = 0.
Step 5: Compare s[i+k]=s[2]='c' vs s[j+k]=s[3]='b'
s[2] > s[3] (c > b), so j would move to j+k+1 = 4. Since j+k >= n, the loop ends. The answer is s[i:] = s[2:] = "cb".
Notice how pointers never move backward. Each comparison either increments k or jumps one of the pointers forward, guaranteeing linear progress through the string.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Two-pointer suffix comparison | O(n) | O(1) |
Time: Each of the three variables i, j, and k only increases throughout the algorithm. The quantity i + j + k starts at 1 and can reach at most 3n, so the total number of iterations is O(n).
Space: The algorithm uses only three integer variables. The final s[i:] return creates a new string, but the comparison logic itself requires O(1) extra space.
The building blocks
1. Two-pointer comparison on strings
The core mechanic here is using two pointers to race through a string, comparing characters at corresponding offsets. This same pattern appears in problems like merging sorted arrays or finding the longest common prefix. The twist in this problem is the variable k that tracks the matching length, allowing you to skip over positions you have already ruled out.
2. Suffix-as-candidate reasoning
Recognizing that the answer must be a suffix is the crucial reduction step. This kind of reasoning, where you narrow the search space by proving structural constraints, appears in many string problems. For example, in longest duplicate substring problems you also work with suffixes, and in problems like longest common prefix you compare strings from their starting positions.
Edge cases
- Single character. The string itself is the only suffix and the answer. The loop body never executes because
j+k = 1which equalsn. - All identical characters. For a string like
"aaaa", every suffix comparison results in equal characters. The variablekkeeps incrementing untilj+kreachesn, andistays at 0. The full string is the answer. - Descending string. For
"dcba", the first character is the largest, soistays at 0 throughout and the answer is the full string. - Ascending string. For
"abcd", each comparison at position 0 loses to the next position. The pointerikeeps moving right, eventually landing on the last character"d". - Repeated largest character. For
"abcbc", the twoccharacters create candidates at indices 2 and 4. The algorithm correctly compares their suffixes and picks index 2 because"cbc"is larger than"c".
From understanding to recall
The two-pointer suffix comparison has three moving parts: i, j, and k. When you review this problem, focus on the decision logic at each step. What happens when characters match? What happens when i wins? What happens when j wins? Make sure you understand why the jump distances (k+1 positions forward) are safe, skipping only positions that provably cannot be the answer.
Then practice reconstructing the algorithm from scratch. Start with the reduction ("answer is always a suffix"), then set up the two pointers, then write the comparison loop. If you can walk through a small example like "cacb" on paper and arrive at the right answer, the code will follow naturally.
The max(i + k + 1, j) detail is easy to forget. Remind yourself why it is necessary: without it, i could fall behind j, causing you to re-examine positions that have already been eliminated.
Related posts
- Longest Common Prefix - character-by-character string comparison
- Longest Duplicate Substring - suffix-based string problem
- Orderly Queue - lexicographic string manipulation
CodeBricks breaks problems like this into reusable building blocks and schedules them for review at the right intervals. Instead of re-solving the same problem from scratch each time, you practice the patterns that transfer across problems.