Delete Columns to Make Sorted II: Greedy Column Selection
LeetCode 955, Delete Columns to Make Sorted II, is a medium problem that asks you to find the minimum number of columns to delete from an array of equal-length strings so that the remaining columns, read left to right, make the array of strings lexicographically sorted.
The problem
You are given an array of n strings, each of the same length. You can delete entire columns (the same character position from every string). Find the minimum number of columns to delete so that the resulting strings are in non-decreasing lexicographic order.
Input: strs = ["xga", "xfb", "yfa"]
Output: 1
Deleting column 1 gives ["xa", "xb", "ya"], which is sorted. You cannot keep all three columns because "xga" > "xfb" lexicographically (since at column 1, 'g' > 'f').
Why this problem matters
Greedy problems can be tricky because they require you to prove that a local decision leads to a global optimum. This problem is a clean example of that pattern. The greedy choice here is: process columns left to right, and only delete a column if keeping it would break the ordering for some pair of adjacent strings that is not yet "settled" by an earlier column. You never need to reconsider a column you already decided to keep.
The concept of "settled pairs" is central to getting this right. Two adjacent strings become settled once you find a kept column where the upper string has a strictly smaller character than the lower string. After that point, no future column can break their relative ordering, because lexicographic comparison stops at the first difference. This is a subtle but powerful observation that appears in other greedy string problems as well.
Understanding this problem gives you a reusable template: when you need to maintain a sorted invariant across multiple dimensions (here, columns), track which constraints are already satisfied and only worry about the ones that are not.
The brute force approach
The brute force way is to try every possible subset of columns to delete. For m columns, there are 2^m subsets. For each subset, you would construct the resulting strings and check if they are sorted. This runs in O(2^m * n * m) time, which is completely impractical for the constraint limits (up to 200 strings of length 200). You need a fundamentally different strategy.
The key insight
Process columns left to right. Maintain a set of "settled" row pairs where you already know the earlier row is strictly less than the later row based on columns you have kept so far. When considering a new column, check only the unsettled pairs. If the new column would cause any unsettled pair to go out of order (where the earlier row's character is strictly greater than the later row's character), delete that column. Otherwise, keep it and update which pairs become newly settled.
A pair of adjacent rows becomes "settled" once you find a kept column where the upper row has a strictly smaller character. After that, no future column can break the ordering of that pair, because lexicographic comparison stops at the first difference.
Walking through it step by step
Step 1: The initial grid
We have 3 strings of length 3. Process columns left to right and track which adjacent pairs are "settled" (strictly ordered by earlier kept columns).
Step 2: Check column 0 (x, x, y)
Column 0 does not break any pair. Keep it. Row 0 vs row 1: x == x, still tied. Row 1 vs row 2: x < y, now settled.
Step 3: Check column 1 (g, f, f)
For unsettled pair (rows 0,1): g > f. This breaks the ordering. We must delete column 1.
Step 4: Check column 2 (a, b, a)
For unsettled pair (rows 0,1): a < b. This is fine, and now pair 0-1 becomes settled. Pair 1-2 was already settled. Keep column 2.
Step 5: Result
After deleting column 1, the remaining strings are "xa", "xb", "ya", which are in sorted order. Answer: 1 column deleted.
The solution
def minDeletionSize(strs):
n = len(strs)
settled = [False] * (n - 1)
deleted = 0
for j in range(len(strs[0])):
can_keep = True
for i in range(n - 1):
if not settled[i] and strs[i][j] > strs[i + 1][j]:
can_keep = False
break
if not can_keep:
deleted += 1
else:
for i in range(n - 1):
if strs[i][j] < strs[i + 1][j]:
settled[i] = True
return deleted
-
Initialize tracking state. Create a
settledarray of booleans, one per adjacent pair. Initially, no pair is settled. Also initializedeleted = 0to count columns removed. -
Iterate through each column
j. For each column, you make a single decision: keep or delete. -
Check if the column is safe to keep. Loop through each adjacent pair
(i, i+1). If the pair is not yet settled and the character in rowiat columnjis strictly greater than the character in rowi+1, this column would break ordering. Setcan_keep = Falseand break early. -
Delete or keep. If
can_keepisFalse, incrementdeleted. The column is discarded and nothing changes about the settled state. -
Update settled pairs. If you keep the column, loop through the pairs again. Any unsettled pair where
strs[i][j]is strictly less thanstrs[i+1][j]becomes settled. These pairs are now guaranteed to be in order regardless of future columns. -
Return the count. After processing all columns,
deletedholds the minimum number of columns removed.
Complexity analysis
| Metric | Value |
|---|---|
| Time | O(n * m) where n = number of strings, m = string length |
| Space | O(n) for the settled array |
Each column requires at most two passes through the n - 1 adjacent pairs: one to check if the column is safe, and one to update settled status. With m columns, this gives O(n * m) total work. The space is O(n) for the boolean array tracking settled pairs. No additional data structures are needed.
Building blocks
1. Greedy column-by-column processing
The greedy approach processes columns in order and makes an irrevocable decision about each one. This works because columns are independent dimensions of the lexicographic comparison. Keeping a column never makes things worse for pairs that are already settled, and deleting a column never helps settle new pairs. So the optimal strategy is to keep every column you can and only delete the ones that actively break an unsettled pair.
2. Settled pairs tracking
The settled array is the key data structure. It compresses the state of all prior column decisions into a simple boolean per pair. Once a pair is settled (strictly ordered by some kept column), you never need to check it again. This is what makes the greedy approach correct: settled pairs cannot be "unsettled" by future columns, because lexicographic ordering locks in at the first strict difference.
Why is the greedy approach optimal? Suppose you could get a better answer by deleting a column you chose to keep. But keeping that column only settled additional pairs (or left them unchanged). Deleting it would leave those pairs unsettled, potentially requiring more deletions later. So keeping a safe column is always at least as good as deleting it.
Edge cases
- Single string. Always sorted, answer is 0.
- Already sorted strings. No deletions needed. Every column is safe to keep and pairs settle progressively.
- All identical strings. All pairs are "equal" at every column, never settled by strict inequality, but no column causes a violation either. Answer is 0.
- Reverse-sorted strings. May need to delete many columns. The algorithm correctly identifies which columns break the unsettled pairs.
- Two strings that are identical except for one column. That column determines everything. If it is in the right order, keep it and all pairs settle. If not, delete it.
Common mistakes
-
Forgetting the settled concept. A common error is to check every pair at every column, ignoring that some pairs are already guaranteed to be in order. This leads to deleting columns that are actually safe to keep.
-
Confusing strict and non-strict ordering. A pair settles only on strict inequality (
<), not on equality (==). Equal characters leave the pair's ordering undetermined, so you still need to check future columns for that pair. -
Updating settled before checking. You must check the column for violations first, then update settled status only if you decide to keep it. If you update settled prematurely and then delete the column, your settled state is wrong.
-
Not breaking early on violation. Once you find a single unsettled pair where the column breaks ordering, you can immediately decide to delete. Continuing to check more pairs is unnecessary (though not incorrect, just wasteful).
-
Treating this like the simpler Delete Columns I problem. In that version, each column is checked independently. Here, columns interact through the settled state, and you must process them in order.
From understanding to recall
The core of this problem is a single idea: settled pairs. Once you internalize that concept, the algorithm writes itself. Process columns left to right, skip the column if it breaks any unsettled pair, and mark newly settled pairs when you keep a column.
When you review this problem, focus on being able to explain why the greedy approach is correct. The argument is that keeping a safe column never hurts: it can only settle more pairs, which makes future columns easier to keep as well. Deleting a safe column would leave pairs unsettled unnecessarily, potentially forcing more deletions down the road.
Practice writing the solution from scratch. The data structure is minimal (just a boolean array), and the logic has exactly two phases per column: check, then update. Once you can reproduce this pattern without looking, you will recognize it instantly in similar greedy problems.
Related posts
- Delete Columns to Make Sorted - The simpler version where each column is checked independently
- Orderly Queue - Another string ordering problem requiring careful analysis of what operations can achieve
- Longest Common Subsequence - String comparison with a different optimization target