Delete Columns to Make Sorted III: Longest Increasing Subsequence on Columns
LeetCode 960, Delete Columns to Make Sorted III, is a hard problem that asks you to delete the minimum number of columns from an array of equal-length strings so that every row of the remaining columns is individually sorted in non-decreasing order. Unlike the simpler variants (where you check columns independently or maintain lexicographic order across rows), this version requires each row to read as a sorted string after column deletion.
The problem
You are given an array of n strings, each of the same length m. You can delete entire columns. After deletion, every remaining row must be non-decreasing (each character is <= the next). Return the minimum number of columns to delete.
Input: strs = ["babca", "bbazb"]
Output: 3
Keeping columns 1 and 3 gives "ac" and "bz". Both rows are non-decreasing. You need to delete the other 3 columns.
Why this problem matters
This problem is a beautiful example of recognizing a classic algorithm hiding inside an unfamiliar setting. At first glance, it looks like a string manipulation or greedy problem. But the key insight transforms it into a variant of the Longest Increasing Subsequence (LIS), one of the most important DP patterns in competitive programming and interviews.
The "columns as elements" framing teaches you to think abstractly about what LIS really is. LIS does not require numbers. It requires a set of elements with a partial ordering, and you want the longest chain where each element is compatible with the previous one. Once you see that, this problem clicks into place.
Understanding this connection also helps you solve other problems where LIS appears in disguise: scheduling tasks with multi-dimensional constraints, chaining intervals, or nesting envelopes.
The brute force approach
The brute force way is to try every subset of columns to keep. For m columns, there are 2^m subsets. For each subset, check if every row is non-decreasing. This runs in O(2^m * n * m) time, which is far too slow when m can be up to 200.
You need a smarter approach that avoids enumerating subsets explicitly.
The key insight: this is LIS on columns
Think of each column as an "element" in a sequence. You want to find the longest subsequence of columns you can keep such that every row stays sorted. Two columns are "compatible" (column i can precede column j) if and only if every row's character at column i is <= the character at column j.
This is exactly the Longest Increasing Subsequence problem, where the "comparison" between elements checks all rows simultaneously instead of comparing single numbers.
The connection to LIS: define dp[j] as the length of the longest subsequence of sorted columns ending at column j. For each earlier column i, if column i can precede column j (meaning strs[r][i] <= strs[r][j] for every row r), then dp[j] = max(dp[j], dp[i] + 1). The answer is m - max(dp).
Once you see this reduction, the solution is the standard O(n^2) LIS DP, where n here is the number of columns, and each "comparison" costs O(number of rows).
The solution
def minDeletionSize(strs):
n = len(strs[0])
dp = [1] * n
for j in range(1, n):
for i in range(j):
if all(s[i] <= s[j] for s in strs):
dp[j] = max(dp[j], dp[i] + 1)
return n - max(dp)
-
Initialize the DP array. Set
dp[j] = 1for every columnj. Each column by itself is a valid subsequence of length 1. -
Iterate through columns. For each column
j, look at every earlier columni. -
Check compatibility. Column
ican precede columnjifstrs[r][i] <= strs[r][j]for every rowr. Theall()function checks this in one pass. -
Update the DP value. If column
iis compatible,dp[j] = max(dp[j], dp[i] + 1). This means we can extend the best subsequence ending atiby appending columnj. -
Compute the answer. The maximum value in
dpis the most columns we can keep. Subtract from the total number of columns to get the minimum deletions.
Walking through it step by step
Step 1: Define the DP state
dp[j] = length of the longest subsequence of sorted columns that ends with column j. Initialize every dp[j] to 1.
Step 2: Column compatibility
Column i can precede column j if every row satisfies strs[r][i] <= strs[r][j]. Here col 0 (b,b) vs col 3 (c,z): b<=c and b<=z, so col 0 can precede col 3.
Step 3: Fill the DP array
For each column j, check all earlier columns i. If column i can precede column j, update dp[j] = max(dp[j], dp[i] + 1).
Step 4: Compute the answer
max(dp) = 2, so we can keep at most 2 columns. Answer = 5 - 2 = 3 columns deleted.
Step 5: Verify with kept columns
Keeping columns {1,3} gives "ac" and "bz". Both rows are non-decreasing: a<=c and b<=z. Confirmed: delete 3 columns.
Complexity analysis
| Metric | Value |
|---|---|
| Time | O(m^2 * n), where m = string length (number of columns) and n = number of strings |
| Space | O(m) for the dp array |
The outer loop runs m times, the inner loop runs up to m times, and each compatibility check scans all n rows. This gives O(m^2 * n). With m up to 200 and n up to 100, this is at most 200 * 200 * 100 = 4,000,000 operations, which is well within time limits.
The space is O(m) for the dp array. No additional data structures are needed.
Building blocks
1. Longest Increasing Subsequence DP
The core of this problem is the LIS recurrence: for each element, scan all previous elements and extend the best compatible predecessor. The pattern is:
dp[j] = 1
for i in range(j):
if compatible(i, j):
dp[j] = max(dp[j], dp[i] + 1)
This same structure appears in Russian Doll Envelopes (LeetCode 354), Longest String Chain (LeetCode 1048), and Number of Longest Increasing Subsequences (LeetCode 673). The only thing that changes is the definition of "compatible."
2. Multi-dimensional comparison
The compatibility check all(s[i] <= s[j] for s in strs) is a multi-dimensional comparison. Instead of comparing two numbers, you compare two columns across all rows. This pattern of "every dimension must satisfy the constraint" appears in envelope nesting, job scheduling with multiple resources, and other problems where elements have vector-valued attributes.
Why can't you use the O(n log n) binary search optimization for LIS here? That optimization relies on a total ordering where you can binary search a sorted array of tails. Column compatibility is only a partial order (two columns might be incomparable), so the patience sorting trick does not apply. The O(m^2 * n) approach is optimal for this problem.
Edge cases
- Single column. Only one column, nothing to delete. Answer is 0.
- Single string. If there is only one row, column compatibility reduces to single-character comparison. The problem becomes standard LIS on characters.
- Already sorted rows. If every row is already non-decreasing, you can keep all columns.
dp[j]will equalj + 1for everyj, and the answer is 0. - All identical characters. Every column is compatible with every other column, so
max(dp) = mand the answer is 0. - Reverse-sorted rows. In the worst case, no two columns are compatible and you can only keep 1 column. Answer is
m - 1. - Two columns. Either they are compatible (answer 0) or not (answer 1). A good sanity check for your compatibility function.
From understanding to recall
The key to this problem is a single mental leap: recognizing that "minimum columns to delete so each row is sorted" is equivalent to "find the longest compatible subsequence of columns," which is LIS with a custom comparator.
When you review this problem, practice two things. First, articulate the reduction: what are the "elements" (columns), what is the "ordering" (row-wise <=), and what is the answer (total minus LIS length). Second, write the DP from scratch. The recurrence is identical to standard LIS. The only addition is the all() check for compatibility.
Once you can reproduce this reduction and the DP in under two minutes, you will recognize the pattern whenever a problem asks you to find a minimum set of items to remove so that the remaining items satisfy some pairwise ordering constraint.
The LIS DP pattern is one of roughly 60 reusable building blocks that cover hundreds of LeetCode problems. Drilling them individually with spaced repetition is far more effective than grinding random problems and hoping they stick.
Related posts
- Longest Increasing Subsequence - The classic LIS problem with both O(n^2) and O(n log n) solutions
- Delete Columns to Make Sorted - The easy variant where each column is checked independently
- Delete Columns to Make Sorted II - The medium variant using greedy settled-pairs tracking