Reconstruct Original Digits from English: Letter Frequency Trick
LeetCode 423, Reconstruct Original Digits from English, gives you a jumbled string of English letters that spell out digits 0 through 9, and asks you to figure out which digits are hiding in there. The output should be those digits sorted in ascending order. It sounds like a puzzle, and it is. But once you see the trick, the solution is elegant and fast.
The problem
Given a string s containing an out-of-order English representation of digits 0-9, return the digits in ascending order.
Each digit's English word ("zero", "one", "two", ..., "nine") contributes its letters to the string. Your job is to reverse-engineer which digits were used.
Example: s = "owoztneoer" returns "012" because the letters can be rearranged into "zero", "one", "two".
The question boils down to: how do you untangle overlapping letter sets to figure out which digit words are present?
Why this problem matters
This problem teaches you a valuable technique: using unique identifiers to disambiguate overlapping sets. The digit words share many letters ("one" and "zero" both have 'o' and 'e'), so you cannot just count a single letter and know the answer. But some letters only appear in one digit word, and that is the entire key.
This kind of reasoning, finding an invariant that lets you peel apart layers of a problem, shows up in frequency analysis, cryptography puzzles, and constraint-based scheduling. It is a useful mental model beyond just this one LeetCode problem.
The approach
The insight is that certain letters are unique to certain digit words:
'z'only appears in "zero" (0)'w'only appears in "two" (2)'u'only appears in "four" (4)'x'only appears in "six" (6)'g'only appears in "eight" (8)
Once you know how many of those five digits exist, you can subtract their letter contributions and use the remaining counts to identify the rest:
'o'identifies "one" (1) after removing zero, two, and four't'identifies "three" (3) after removing two and eight'f'identifies "five" (5) after removing four's'identifies "seven" (7) after removing six'i'identifies "nine" (9) after removing five, six, and eight
Two phases. Count, subtract, done.
def originalDigits(s):
from collections import Counter
count = Counter(s)
digits = [0] * 10
digits[0] = count['z']
digits[2] = count['w']
digits[4] = count['u']
digits[6] = count['x']
digits[8] = count['g']
digits[1] = count['o'] - digits[0] - digits[2] - digits[4]
digits[3] = count['t'] - digits[2] - digits[8]
digits[5] = count['f'] - digits[4]
digits[7] = count['s'] - digits[6]
digits[9] = count['i'] - digits[5] - digits[6] - digits[8]
result = []
for i in range(10):
result.append(str(i) * digits[i])
return ''.join(result)
Let's break each part down:
- Count the letters.
Counter(s)builds a frequency map of every character in the input string. This is the raw data we work from. - Phase 1: unique letters. For each of the five digits that have a unique letter, just read off the count of that letter. The number of
'z's tells you exactly how many zeros there are. Same for'w'and two,'u'and four, and so on. - Phase 2: subtraction. The remaining five digits share letters with Phase 1 digits. But after subtracting the Phase 1 contributions, a single letter becomes the unique identifier for each remaining digit. For example,
'o'appears in "zero", "one", "two", and "four". After subtracting out the counts from zero, two, and four, whatever'o'count remains must come from "one". - Build the result. Walk digits 0 through 9 in order. For each digit, append it to the result string as many times as it appears.
Step-by-step walkthrough
Let's trace through s = "owoztneoer" to see the algorithm in action.
Step 1: Count letter frequencies in the input string.
s = "owoztneoer"
Build a frequency map of all 10 characters in the input.
Step 2: Extract Phase 1 digits using unique letters.
We find one "zero" and one "two". No "four", "six", or "eight".
Step 3: Extract Phase 2 digits using remaining letters.
digits[1] = count['o'] - digits[0] - digits[2] - digits[4] = 3 - 1 - 1 - 0 = 1
The remaining 'o' count (after subtracting zero, two, four) tells us there is one "one".
Step 4: Build the result in ascending order.
digits = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]. Concatenate: "0" + "1" + "2" = "012".
The entire process is deterministic. No backtracking, no guessing. You count letters once, do some arithmetic, and assemble the answer.
Complexity analysis
| Metric | Value |
|---|---|
| Time | O(n) |
| Space | O(1) |
Time: O(n). We scan the string once to build the frequency map. All the digit extraction is O(1) since it involves a fixed number of lookups and subtractions (exactly 10 digits, each with a constant number of operations).
Space: O(1). The frequency map holds at most 26 entries (one per lowercase letter). The digits array is always size 10. Both are bounded by constants, independent of input size.
Edge cases to watch for
- Single digit.
s = "zero"should return"0". The algorithm handles this naturally since only one unique letter is present. - Repeated digits.
s = "zerozerozero"has three'z's, sodigits[0] = 3, and the result is"000". - All ten digits present. The string contains all letters from "zero" through "nine". The subtraction logic still works because each digit's identifying letter is consumed exactly the right number of times.
- Large inputs. The string could be very long (up to 100,000 characters). The O(n) approach handles this without issues.
- No digits of a certain type. If a digit's count evaluates to 0, it simply contributes nothing to the output string. No special handling needed.
The building blocks
This problem is built from two core building blocks that appear across many other problems:
-
Frequency counting. The same character-counting technique you use in Group Anagrams and Valid Anagram. Build a frequency map, then reason about the counts. Here, instead of comparing two frequency maps, you are decomposing one map into constituent digit words.
-
Unique identifier extraction. The idea of finding a property that uniquely distinguishes one category from the rest. This is similar to how Ransom Note checks supply vs demand on character counts, but here you are layering the extractions in dependency order. Phase 1 digits have no dependencies. Phase 2 digits depend on Phase 1 results. That ordering is the key insight.
From understanding to recall
This problem has a satisfying "aha" moment when you realize certain letters are unique to certain digits. But that moment fades fast. A week from now, you might remember the general idea but forget which letters map to which digits, or forget the exact subtraction formulas for Phase 2.
Spaced repetition helps here. You do not need to memorize the letter-to-digit mapping by rote. What you need is to practice deriving it. Each time you revisit the problem, you write out the digit words, scan for unique letters, and build the subtraction chain. After a few repetitions, the derivation process becomes automatic. You see a "disambiguate overlapping sets" problem and you know to look for unique identifiers first, then peel away layers.
Related posts
- Group Anagrams - Another problem where frequency counting on characters is the core technique, but used for grouping instead of extraction
- Valid Anagram - The simplest frequency counting problem, a great warm-up for understanding character frequency maps
- Ransom Note - A supply-vs-demand frequency problem that uses the same counting mechanics in a different context