Number of Unique Good Subsequences: DP Without Leading Zeros
You are given a binary string binary. A subsequence of binary is "good" if it is non-empty, has no leading zeros (except for the subsequence "0" itself), and is unique. Return the number of unique good subsequences modulo 10^9 + 7.
This is LeetCode 1987: Number of Unique Good Subsequences. For example, binary = "101" returns 5. The five unique good subsequences are "0", "1", "10", "11", and "101". Notice that "01" is not included because it has a leading zero.
If you have solved Distinct Subsequences or Decode Ways, you already know how to count subsequences with DP. The twist here is the leading-zero constraint: you cannot just count all subsequences. You need a way to guarantee every counted subsequence starts with "1" (or is exactly "0").
Why this problem matters
Counting subsequences is a classic DP skill, but most problems let you count freely without worrying about what the subsequence looks like. This problem adds a structural constraint: no leading zeros. That single rule forces you to rethink how you build up the count.
The trick is to split your DP into two variables: how many unique subsequences end in "0", and how many end in "1". By tracking endings instead of beginnings, you can extend subsequences safely. Any subsequence that currently ends in "1" must have started with "1" (since it was built from prior valid subsequences). When you see a "0" in the input, you extend existing valid subsequences by appending "0", which preserves the no-leading-zero property. The only special case is the subsequence "0" by itself, which you handle with a boolean flag.
This pattern of splitting DP state by the last character appears in many counting problems on binary strings and sequences with restricted prefixes.
The approach
Define two variables:
ends1: number of unique good subsequences that end with"1"ends0: number of unique good subsequences that end with"0"(excluding the standalone"0")has_zero: a flag for whether"0"appears inbinary(so the standalone subsequence"0"counts)
Process each character in binary left to right:
- If the character is
"1": every existing subsequence (whether it ends in"0"or"1") can be extended by appending"1". Plus, you can start a brand new subsequence"1". Soends1 = ends0 + ends1 + 1. - If the character is
"0": every existing subsequence can be extended by appending"0". But you cannot start a new subsequence with"0"(that would be a leading zero). Instead, you note that the standalone"0"exists by settinghas_zero = true. Soends0 = ends0 + ends1.
The answer is ends0 + ends1 + has_zero, all taken modulo 10^9 + 7.
The key insight: when you see a "0", do not add 1 for a new subsequence starting with "0". That would create a leading zero. Instead, track "0" as a special case with a boolean flag. All other subsequences ending in "0" were built by extending valid subsequences that already started with "1".
Why uniqueness is handled automatically
You might wonder: how does this DP avoid counting duplicate subsequences? The answer is that the recurrence replaces ends0 and ends1 rather than adding to a running total. When you process a "1", the new ends1 is the full set of unique subsequences that end in "1" given all characters seen so far. It does not add to the old ends1. It overwrites it. This is the same technique used in problems like "number of distinct subsequences of a string," where the key to avoiding duplicates is to redefine the count at each step rather than accumulate.
Python solution
def numberOfUniqueGoodSubsequences(binary):
MOD = 10**9 + 7
ends0 = 0
ends1 = 0
has_zero = 0
for ch in binary:
if ch == "1":
ends1 = (ends0 + ends1 + 1) % MOD
else:
ends0 = (ends0 + ends1) % MOD
has_zero = 1
return (ends0 + ends1 + has_zero) % MOD
Time: O(n). One pass through the string with constant work per character.
Space: O(1). Only three variables, regardless of input size.
Step-by-step walkthrough
Let's trace through binary = "101" character by character.
Initialize: ends0 = 0, ends1 = 0, has_zero = false
Before processing any character, there are no subsequences yet.
Step 1: Process char "1"
Char is "1": ends1 = ends0 + ends1 + 1 = 0 + 0 + 1 = 1. We can start a new subsequence "1". ends0 stays 0.
Step 2: Process char "0"
Char is "0": ends0 = ends0 + ends1 = 0 + 1 = 1. We extend "1" to "10". We do NOT start a new subsequence "0" in ends0 (leading zero), but we set has_zero = true to count it separately.
Step 3: Process char "1"
Char is "1": ends1 = ends0 + ends1 + 1 = 1 + 1 + 1 = 3. Existing subsequences "1" and "10" extend to "11" and "101", plus a new "1" (but "1" already exists, so uniqueness is handled by the DP). Final answer: 3 + 1 + 1 = 5.
The final answer is ends1 + ends0 + has_zero = 3 + 1 + 1 = 5.
Walking through the transitions
A few transitions deserve closer attention because they reveal how the recurrence prevents leading zeros and duplicates.
Step 1 (char "1"), ends1 goes from 0 to 1. This is the simplest case. There are no prior subsequences. The + 1 in the formula ends1 = ends0 + ends1 + 1 starts the very first subsequence: "1". The + 1 only appears when the character is "1", never when it is "0". That is what prevents leading zeros.
Step 2 (char "0"), ends0 goes from 0 to 1. The formula ends0 = ends0 + ends1 takes every subsequence ending in "1" (just "1") and extends it with "0" to get "10". There is no + 1 here. You do not create a standalone "0" through ends0. Instead, has_zero flips to 1. This separation is the heart of the algorithm.
Step 3 (char "1"), ends1 goes from 1 to 3. Now ends0 = 1 and ends1 = 1. The formula gives ends1 = 1 + 1 + 1 = 3. Those three subsequences ending in "1" are: the original "1" extended from prior state, "10" extended to "101", and a brand new "1". But wait, "1" already exists. The overwrite ensures we count "1" only once. The three unique subsequences ending in "1" are "1", "11", and "101".
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Brute force (enumerate all 2^n subsequences) | O(2^n) | O(2^n) |
| DP with ends0 and ends1 | O(n) | O(1) |
Where n = length of binary. The DP approach is optimal in both time and space.
Building blocks
This problem is built on the subsequence counting DP pattern with state splitting by last character.
- State: two counters,
ends0andends1, plus a flaghas_zero - Transition (char "1"):
ends1 = ends0 + ends1 + 1(extend all existing + start new) - Transition (char "0"):
ends0 = ends0 + ends1(extend all existing, no new start) - Base case:
ends0 = 0,ends1 = 0,has_zero = 0 - Answer:
ends0 + ends1 + has_zero
The overwrite pattern (replacing ends1 instead of adding to it) is what handles uniqueness. This same technique appears in problems that count distinct subsequences of a single string.
Edge cases to watch for
- All zeros (e.g.,
"000"):ends1stays 0 the entire time.ends0also stays 0 because there are no subsequences to extend.has_zerobecomes 1. The answer is 1 (just the subsequence"0"). - All ones (e.g.,
"111"):has_zerostays 0.ends0stays 0. After processing 3 ones,ends1grows from 0 to 1 to 2 to 3. The answer is 3: the unique good subsequences are"1","11", and"111". Even though there are multiple ways to pick positions, only the distinct values matter. - Single character
"0": The answer is 1. Only the subsequence"0"is valid. - Single character
"1": The answer is 1. Only the subsequence"1"is valid. - String with no zeros:
has_zerostays 0, so you never add the standalone"0". The answer is justends1. - Very long strings: since we take modulo
10^9 + 7at every step, overflow is never an issue in Python (or in languages with modular arithmetic).
From understanding to recall
You now understand why the DP splits into ends0 and ends1, why the + 1 only appears for "1" characters, and why "0" is tracked as a special case. But understanding the logic once is different from reproducing it under pressure.
In an interview, the details that trip people up are: forgetting the has_zero flag, accidentally adding + 1 when the character is "0", or adding to the previous value instead of overwriting. These are small details, but any one of them breaks the solution.
The state-splitting pattern (tracking counts by what the subsequence ends with) is one of roughly 60 reusable building blocks in the CodeBricks system. You practice each block individually with spaced repetition until writing the solution from scratch feels automatic. This problem pairs naturally with distinct subsequences and decode ways. Drilling all three builds the reflex for subsequence-counting DP so you never confuse the transitions.
Related posts
- Distinct Subsequences - 2D DP for counting subsequences of one string that match another. Same counting-by-extension logic, but with a 2D table instead of two variables.
- Decode Ways - Linear DP on a string with conditional transitions. Similar one-pass structure where validity constraints shape the recurrence.
- Longest Palindromic Subsequence - Another subsequence DP problem, this time maximizing length rather than counting. Good contrast for understanding how the objective changes the recurrence.