Skip to content
← All posts

Smallest Subsequence of Distinct Characters: Monotonic Stack Pattern

6 min read
leetcodeproblemmediumstringsstacksgreedy

Given a string s, return the lexicographically smallest subsequence of s that contains all the distinct characters of s exactly once.

This is LeetCode 1081: Smallest Subsequence of Distinct Characters, and it is the same problem as LeetCode 316: Remove Duplicate Letters. It is rated medium, and the core technique it teaches is a greedy monotonic stack that builds the smallest possible result character by character.

Input:  "cbacdcbc"
Output: "acdb"

The output must contain each distinct character (a, b, c, d) exactly once, and among all valid subsequences, "acdb" is the lexicographically smallest.

Input: "cbacdcbc"cbacdcbc01234567Monotonic stack builds the answeracdbbottom \u2192 topResult: "acdb"acdb
The input string "cbacdcbc" has four distinct characters. The monotonic stack greedily builds the lexicographically smallest subsequence: "acdb".

Why this problem matters

This problem sits at the intersection of three important ideas: greedy algorithms, monotonic stacks, and string manipulation. You have seen monotonic stacks in problems like Daily Temperatures and Largest Rectangle in Histogram, where the stack maintains an order and pops when that order breaks. Here, the twist is that you also need to track which characters are already in the stack and whether a popped character can reappear later.

The pattern of "build the smallest possible sequence by greedily popping larger elements when you know they will appear again" shows up in problems like Remove K Digits (LeetCode 402) and Create Maximum Number (LeetCode 321). Mastering it here gives you a template you can adapt to those problems directly.

The key insight

Imagine building the result one character at a time, left to right. For each new character, you ask: "Can I make the result smaller by removing the character currently on top of my stack?" The answer is yes if two conditions hold:

  1. The current character is smaller than the stack's top.
  2. The stack's top character appears again later in the string, so you will not lose it permanently.

If both conditions are true, you pop the top and check again. This greedy removal keeps the result as lexicographically small as possible. You also skip any character that is already in the stack, because including it twice would violate the "exactly once" constraint.

The solution

def smallest_subsequence(s: str) -> str:
    last_occurrence = {c: i for i, c in enumerate(s)}
    stack = []
    in_stack = set()

    for i, c in enumerate(s):
        if c in in_stack:
            continue
        while stack and c < stack[-1] and last_occurrence[stack[-1]] > i:
            in_stack.discard(stack.pop())
        stack.append(c)
        in_stack.add(c)

    return "".join(stack)

Let's walk through each part.

The first line builds a dictionary mapping each character to its last index in s. This is how we know whether a character appears later. If last_occurrence[stack[-1]] > i, the top of the stack will show up again, so popping it is safe.

The in_stack set tracks which characters are currently in the stack. When we encounter a character already in the set, we skip it entirely. This ensures each character appears exactly once.

The while loop is the heart of the algorithm. It pops from the stack as long as three things are true: the stack is not empty, the current character is smaller than the top, and the top character appears later. Each pop removes a character that would make the result larger and that we will get a chance to re-add in a better position.

After the while loop, we push the current character onto the stack and add it to in_stack. By the time we finish iterating through s, the stack contains the answer in order from bottom to top.

This problem is identical to LeetCode 316 (Remove Duplicate Letters). If you have solved one, you have solved both. The key detail that makes this harder than a standard monotonic stack problem is the in_stack set, which prevents duplicate characters and short-circuits the logic for characters you have already placed.

Visual walkthrough

Let's trace through the input "bcabc" step by step. Watch how the stack pops larger characters when a smaller one arrives and the popped characters appear later.

Step 1: Record the last occurrence of each character in "bcabc"

last_occurrence:a → 2b → 3c → 4Indices: b c a b c0 1 2 3 4

Knowing where each character last appears tells us whether it is safe to remove from the stack. If a character appears later, we can afford to pop it now and re-add it in a better position.

Step 2: i=0, char="b". Stack is empty, push "b".

Input:bcabcStack:bin_stack = {b}

Nothing to compare against. Push "b" onto the stack and mark it as in the stack.

Step 3: i=1, char="c". "c" > "b" (top), so just push.

Input:bcabcStack:bcin_stack = {b, c}

"c" is greater than the stack top "b", so the stack stays in order. Push "c".

Step 4: i=2, char="a". Pop "c" (last[c]=4 > 2), pop "b" (last[b]=3 > 2). Push "a".

Input:bcabcPop:cbboth appear later, safe to popStack:ain_stack = {a}

This is the key step. "a" is smaller than "c" and "b", and both appear again later. We safely remove them from the stack to get a smaller result.

Step 5: i=3, char="b". "b" > "a" (top), push "b".

Input:bcabcStack:abin_stack = {a, b}

"b" is not in the stack and is greater than the top. Push it.

Step 6: i=4, char="c". "c" > "b" (top), push "c". Done.

Input:bcabcStack:abcResult: "abc"

All characters processed. The stack holds "abc", the lexicographically smallest subsequence containing all distinct characters.

The critical moment is Step 4. When "a" arrives, both "c" and "b" are on the stack, and both appear later in the string (at indices 4 and 3 respectively). So we pop both, giving "a" the chance to sit at the front of the result. This greedy removal is what produces the smallest possible answer.

Complexity analysis

ApproachTimeSpace
Monotonic StackO(n)O(1)

Time is O(n) because each character is pushed onto the stack at most once and popped at most once. The inner while loop can pop multiple characters in a single iteration of the outer for loop, but across the entire algorithm, the total number of pops cannot exceed n.

Space is O(1) in terms of auxiliary space beyond the output. The stack, the in_stack set, and the last_occurrence dictionary all hold at most 26 entries (one per lowercase English letter). These do not grow with the input size.

The building blocks

1. Greedy monotonic stack

The core pattern is a stack that stays in a particular order (here, increasing) by popping elements that violate the order when it is safe to do so:

while stack and current < stack[-1] and can_remove(stack[-1]):
    stack.pop()
stack.append(current)

This same template appears in Remove K Digits, where you pop digits that are larger than the current one to make the number smaller. It also appears in Create Maximum Number, where you do the opposite (pop smaller digits to make the number larger). The only thing that changes between problems is the comparison direction and the definition of "safe to remove."

2. Last occurrence lookup

Pre-computing the last index of each character lets you answer "will this character appear again?" in O(1):

last_occurrence = {c: i for i, c in enumerate(s)}

This is a common preprocessing step for greedy string problems. You will see it in Partition Labels (LeetCode 763) as well, where knowing the last occurrence of each character helps you determine partition boundaries.

3. Duplicate tracking with a set

The in_stack set prevents the same character from entering the stack twice:

if c in in_stack:
    continue

This is a simple but essential piece. Without it, you would push duplicate characters and produce an invalid result. The set also needs to be updated on pops (in_stack.discard(stack.pop())), which is easy to forget.

Edge cases

Before submitting, verify your solution handles these scenarios:

  • Already sorted like "abc": each character is pushed once and never popped. Result is "abc".
  • Reverse sorted like "cba": "b" does not pop "c" because "c" does not appear later. Same for "a" and "b". Result is "cba".
  • Single character like "a": result is "a".
  • All same character like "aaaa": the first "a" is pushed, and the rest are skipped by the in_stack check. Result is "a".
  • Two characters alternating like "ababab": "a" is pushed first. "b" is pushed next (greater than "a"). The remaining characters are all skipped because both are already in the stack. Result is "ab".

The monotonic stack solution handles all of these without special cases.

From understanding to recall

You have seen how the monotonic stack builds the smallest subsequence: pop larger characters when they appear later, skip characters already in the stack, and push everything else. The algorithm is elegant and the code is short. But under interview pressure, it is easy to forget a detail. Do you pop when the current character is less than or equal to the top, or strictly less? Do you check last_occurrence against i or against i + 1? Do you discard from in_stack before or after popping?

These are recall issues, not understanding issues. Spaced repetition is how you lock them in. You practice writing this solution from scratch at increasing intervals. After a few rounds, the three-part while condition (stack and c < stack[-1] and last_occurrence[stack[-1]] > i) is automatic. You do not think about it. You just write it.

Related posts

CodeBricks breaks this problem into its reusable building blocks: the greedy monotonic stack, the last occurrence lookup, and the duplicate tracking set. You drill each piece independently until it is second nature. When you see a "build the smallest sequence" problem in an interview, the code flows without hesitation.