Number of Different Integers in a String: Parsing and Sets
You are given a string word that consists of digits and lowercase English letters. You need to replace every non-digit character with a space, then find the number of different integers in the resulting string. Two integers are considered different if their decimal representations without leading zeros are different.
This is LeetCode 1805: Number of Different Integers in a String.
Why this problem matters
Parsing problems like this one test your ability to handle string boundaries, leading zeros, and deduplication all in one pass. These are not just interview skills. They show up constantly in real-world tasks: extracting IDs from log files, tokenizing mixed-format input, cleaning messy data before analysis. Getting comfortable with the two-pointer scanning pattern and canonical normalization will pay off far beyond this single problem.
The key insight
The plan is simple: scan through the string, collect contiguous runs of digit characters, strip leading zeros to get a canonical form, and add each result to a set. The set handles deduplication automatically, so at the end you just return its size.
The tricky part is handling leading zeros correctly. The strings "01" and "1" represent the same integer, so you need a normalization step. Stripping leading zeros works perfectly here, but you also need to handle the edge case where the entire digit run is zeros (like "000"). After stripping all the zeros you get an empty string, so you need a fallback to represent zero.
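The same plan can also be written in a more declarative style with Python's itertools.groupby, which splits the string into consecutive runs of digits and non-digits. This is an alternative to the two-pointer loop used in the solution that follows, not the approach this post builds on. The name num_different_integers_groupby is hypothetical.

```python
from itertools import groupby


def num_different_integers_groupby(word: str) -> int:
    seen = set()
    # groupby yields (key, chars) for each maximal run of consecutive
    # characters sharing the same key; the key here is "is this a digit?".
    for is_digit_run, chars in groupby(word, key=str.isdigit):
        if is_digit_run:
            run = "".join(chars)
            # Canonical form: drop leading zeros, but keep "0" for all-zero runs.
            seen.add(run.lstrip("0") or "0")
    return len(seen)


print(num_different_integers_groupby("a123bc34d8ef34"))  # 3
```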
The solution
```python
def num_different_integers(word: str) -> int:
    integers = set()
    i = 0
    n = len(word)
    while i < n:
        if word[i].isdigit():
            j = i
            while j < n and word[j].isdigit():
                j += 1
            integers.add(word[i:j].lstrip('0') or '0')
            i = j
        else:
            i += 1
    return len(integers)
```
Here is how each piece works:
- Outer loop with i: walks through every character in the string. When it hits a letter, it just moves forward one step.
- Inner loop with j: once i lands on a digit, j advances until it hits a non-digit or the end of the string. This gives you the slice word[i:j] containing the full digit run.
- lstrip('0'): removes all leading zeros from the extracted substring. "00123" becomes "123", and "034" becomes "34".
- or '0': handles the all-zeros case. If lstrip('0') produces an empty string (meaning the original was something like "000"), the or expression falls back to "0".
- Set insertion: adding the normalized string to a set means duplicates are ignored automatically. You never need to check "is this already there?" yourself.
- i = j: after processing a digit run, jump i directly to where j stopped. This avoids re-scanning digits you already processed.
Using lstrip('0') or '0' handles the edge case where the integer is just "0" or "000". After stripping all zeros, you get an empty string, so the or '0' fallback ensures zero is represented correctly.
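A quick way to sanity-check the solution is to call it on a few sample inputs. The function is repeated unchanged so the snippet runs on its own; the expected results in the comments follow directly from the normalization rule described above.

```python
def num_different_integers(word: str) -> int:
    integers = set()
    i = 0
    n = len(word)
    while i < n:
        if word[i].isdigit():
            j = i
            while j < n and word[j].isdigit():
                j += 1
            integers.add(word[i:j].lstrip('0') or '0')
            i = j
        else:
            i += 1
    return len(integers)


print(num_different_integers("a123bc34d8ef34"))   # 3: "123", "34", "8"
print(num_different_integers("leet1234code234"))  # 2: "1234", "234"
print(num_different_integers("a1b01c001"))        # 1: all runs normalize to "1"
```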
Visual walkthrough
Let's trace through the example "a123bc34d8ef34" step by step, watching the set grow as we find each digit run.
num_different_integers("a123bc34d8ef34"):
Step 1: Skip letter 'a', find digit run "123" (indices 1-3)
Scan from index 0. 'a' is not a digit, so skip it. At index 1 we find a digit; advance j until the non-digit at index 4. Extract "123", strip leading zeros (there are none), and add "123" to the set. Set: {"123"}, size 1.
Step 2: Skip letters 'bc', find digit run "34" (indices 6-7)
Continue from index 4. 'b' and 'c' are letters, so skip them. At index 6 digits start. Extract "34" and add it to the set. Set: {"123", "34"}, size 2.
Step 3: Skip letter 'd', find digit run "8" (index 9)
Continue from index 8. 'd' is a letter, so skip it. At index 9 there is a single digit, '8'. Extract it and add it to the set. Set: {"123", "34", "8"}, size 3.
Step 4: Skip letters 'ef', find digit run "34" (indices 12-13)
Continue from index 10. 'e' and 'f' are letters. At index 12 we find "34" again. It has no leading zeros to strip, and "34" is already in the set, so nothing new is added. Set: {"123", "34", "8"}, size 3.
Step 5: End of string, return set size
All characters have been processed. The set contains 3 unique integer strings, "123", "34", and "8", so return 3.
Notice how the second occurrence of "34" at indices 12-13 does not increase the set size. The set already contains "34" from the earlier extraction, so the duplicate is silently ignored. This is exactly why a set is the right data structure here.
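To reproduce this walkthrough yourself, the sketch below reruns the same scan and records each digit run with its index range, its normalized form, and the set size after insertion. The helper name trace_digit_runs is hypothetical, and the end index is reported inclusively to match the step descriptions.

```python
def trace_digit_runs(word: str) -> list[tuple[int, int, str, str, int]]:
    """Record (start, end, run, normalized, set_size) for each digit run."""
    integers = set()
    steps = []
    i, n = 0, len(word)
    while i < n:
        if word[i].isdigit():
            j = i
            while j < n and word[j].isdigit():
                j += 1
            run = word[i:j]
            normalized = run.lstrip("0") or "0"
            integers.add(normalized)
            # j stops one past the run, so j - 1 is the inclusive end index.
            steps.append((i, j - 1, run, normalized, len(integers)))
            i = j
        else:
            i += 1
    return steps


for start, end, run, normalized, size in trace_digit_runs("a123bc34d8ef34"):
    print(f"indices {start}-{end}: run={run!r} normalized={normalized!r} set size={size}")
```

For the example string this prints one line per digit run, for indices 1-3, 6-7, 9-9, and 12-13, with set sizes 1, 2, 3, and 3.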
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Single pass with set | O(n) | O(n) |
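Time: O(n), where n is the length of the string. Each character is tested at most twice, once by the outer pointer i and once by the inner pointer j. Slicing, stripping, and hashing a digit run each take time proportional to the run's length, and the runs do not overlap, so that work also sums to O(n).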
Space: O(n) in the worst case. The set stores normalized digit runs, and because the runs are disjoint pieces of the input, their total length is at most n. For example, a string made entirely of digits produces a single entry of length n.
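To make the "at most twice" bound concrete, this sketch reruns the solution's scan with a counting wrapper around isdigit and compares the number of character tests with 2n. The names count_digit_checks and is_digit are hypothetical, and counting isdigit calls is only a proxy for the work done: it does not measure slicing or hashing.

```python
def count_digit_checks(word: str) -> int:
    """Run the solution's scan and count how many times a character is tested."""
    checks = 0

    def is_digit(c: str) -> bool:
        nonlocal checks
        checks += 1
        return c.isdigit()

    integers = set()
    i, n = 0, len(word)
    while i < n:
        if is_digit(word[i]):          # test by the outer pointer i
            j = i
            while j < n and is_digit(word[j]):  # tests by the inner pointer j
                j += 1
            integers.add(word[i:j].lstrip("0") or "0")
            i = j
        else:
            i += 1
    return checks


for word in ["a123bc34d8ef34", "1a2b3c", "0000000000", "abcdef"]:
    print(word, count_digit_checks(word), "<=", 2 * len(word))
```

On "1a2b3c" the count is exactly 12, or 2n: every digit starts a run and is tested by both pointers, and every letter right after a run is tested once by j when the inner loop stops and again by i.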
The building blocks
1. Two-pointer digit extraction
The pattern of using two pointers to extract contiguous sequences from a mixed string is reusable in many contexts. One pointer marks the start of a segment, the other advances to find the end. After processing the segment, the outer pointer jumps to where the inner pointer stopped.
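```python
def for_each_digit_run(s: str, process) -> None:
    # Hypothetical helper name; process is any callable that handles one digit run.
    n = len(s)
    i = 0
    while i < n:
        if s[i].isdigit():
            j = i  # start of the run; j scans forward to its end
            while j < n and s[j].isdigit():
                j += 1
            process(s[i:j])  # s[i:j] is the full run of digits
            i = j  # resume right after the run
        else:
            i += 1  # not a digit: advance one character
```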
This avoids the overhead of splitting the string or using regex. It processes everything in a single pass with constant extra work per character.
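For comparison, here is the regex-based alternative next to the two-pointer scan. Both helper names are hypothetical. The pattern [0-9]+ is used instead of \d+ because in Python both \d and str.isdigit also accept some non-ASCII digit characters; for input limited to lowercase letters and ASCII digits, as in this problem, the two functions return the same runs. No performance comparison is implied here.

```python
import re


def digit_runs_two_pointer(s: str) -> list[str]:
    runs = []
    i, n = 0, len(s)
    while i < n:
        if s[i].isdigit():
            j = i
            while j < n and s[j].isdigit():
                j += 1
            runs.append(s[i:j])  # full run s[i:j]
            i = j
        else:
            i += 1
    return runs


def digit_runs_regex(s: str) -> list[str]:
    # [0-9]+ matches maximal runs of ASCII digits, scanning left to right.
    return re.findall(r"[0-9]+", s)


for s in ["a123bc34d8ef34", "leet1234code234", "007abc", "xyz"]:
    assert digit_runs_two_pointer(s) == digit_runs_regex(s)
print(digit_runs_regex("a123bc34d8ef34"))  # ['123', '34', '8', '34']
```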
2. Leading zero normalization
When you need a canonical representation of a numeric string, stripping leading zeros is the standard approach. This lets you compare numeric strings by value rather than by raw characters.
```python
normalized = num_str.lstrip('0') or '0'
```
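The or '0' guard matters. Without it, "0" and "000" would normalize to an empty string, which is not a valid representation of zero. For counting alone the empty string would still act as a consistent key, but the guard keeps the canonical form correct anywhere you print, compare, or convert it. This two-part idiom is worth committing to memory because it shows up whenever you need to compare numeric strings.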
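A small check of the idiom, as a sketch: the hypothetical helper normalize applies lstrip('0') or '0', and the nested loop confirms, for a handful of sample strings, that two normalized strings are equal exactly when the integers they encode are equal.

```python
def normalize(num_str: str) -> str:
    # Strip leading zeros; an all-zero string would become "", so fall back to "0".
    return num_str.lstrip("0") or "0"


samples = ["0", "000", "01", "001", "1", "100", "0100"]
for s in samples:
    print(f"{s!r} -> {normalize(s)!r}")

# Normalized strings compare equal exactly when the integer values are equal.
for a in samples:
    for b in samples:
        assert (normalize(a) == normalize(b)) == (int(a) == int(b))
```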
Edge cases
- No digits: the string is all letters, so no integers are extracted and the answer is 0.
- Single digit: "a1b" contains one integer, so return 1.
- Leading zeros: "a01b001c1" has one unique integer (1), because "01", "001", and "1" all normalize to "1".
- All zeros: "000" should count as one integer (0), not zero integers.
- Very long digit sequences: the constraints allow up to 1000 characters, so integer values can exceed the standard integer range in some languages. Python handles arbitrary precision natively, but in Java or C++ you would need to compare normalized strings rather than converting to int.
- Adjacent groups: "1a1" has one unique integer (1), since both groups normalize to the same value.
From understanding to recall
This problem looks simple on the surface, but the devil is in the details. Leading zeros, the all-zeros edge case, and the choice between string comparison and integer conversion are all places where a careless implementation breaks. Solving it once teaches you the pattern. Solving it again a week later, then a month later, locks it in. Spaced repetition helps you remember the lstrip trick and the two-pointer scanning pattern so they become automatic when you encounter a parsing problem in an interview.
Related posts
- Valid Anagram - Another string problem where character frequency and set-based thinking are key
- Group Anagrams - Uses hash maps and string normalization, a related pattern to leading zero handling
- Find the Index of the First Occurrence in a String - String scanning fundamentals
String parsing problems reward precision over cleverness. The algorithm here is not fancy, but getting every edge case right is what separates a correct solution from a buggy one. If you want to build that precision into long-term memory, CodeBricks uses spaced repetition to bring problems back at the right intervals so the patterns stick.