Number of Different Integers in a String: Parsing and Sets

You are given a string word that consists of digits and lowercase English letters. You need to replace every non-digit character with a space, then find the number of different integers in the resulting string. Two integers are considered different if their decimal representations without leading zeros are different.

This is LeetCode 1805: Number of Different Integers in a String.

[Figure: scanning "a123bc34d8ef34" cell by cell, indices 0-13. Unique set: {123, 34, 8} = 3 distinct integers]
Green cells are digit characters. Blue cells are letters. The string contains integers: 123, 34, 8, 34. Since 34 appears twice, the unique count is 3.

Why this problem matters

Parsing problems like this one test your ability to handle string boundaries, leading zeros, and deduplication all in one pass. These are not just interview skills. They show up constantly in real-world tasks: extracting IDs from log files, tokenizing mixed-format input, cleaning messy data before analysis. Getting comfortable with the two-pointer scanning pattern and canonical normalization will pay off far beyond this single problem.

The key insight

The plan is simple: scan through the string, collect contiguous runs of digit characters, strip leading zeros to get a canonical form, and add each result to a set. The set handles deduplication automatically, so at the end you just return its size.

The tricky part is handling leading zeros correctly. The strings "01" and "1" represent the same integer, so you need a normalization step. Stripping leading zeros works perfectly here, but you also need to handle the edge case where the entire digit run is zeros (like "000"). After stripping all the zeros you get an empty string, so you need a fallback to represent zero.
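As a rough sketch of the plan, here is a literal translation of the problem statement: replace non-digits with spaces, split, normalize, and count. The function name count_by_split is made up for illustration, and this is an alternative formulation rather than the solution presented in this post.

```python
def count_by_split(word: str) -> int:
    # Replace every non-digit character with a space, as the statement describes.
    spaced = "".join(ch if ch.isdigit() else " " for ch in word)
    # split() with no arguments collapses runs of spaces and drops empty pieces.
    tokens = spaced.split()
    # Strip leading zeros; fall back to "0" when a token was all zeros.
    return len({tok.lstrip("0") or "0" for tok in tokens})


print(count_by_split("a123bc34d8ef34"))  # 3
```

This version builds an intermediate string and a token list, so it trades a little extra memory for brevity.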

The solution

def num_different_integers(word: str) -> int:
    integers = set()
    i = 0
    n = len(word)

    while i < n:
        if word[i].isdigit():
            j = i
            while j < n and word[j].isdigit():
                j += 1
            integers.add(word[i:j].lstrip('0') or '0')
            i = j
        else:
            i += 1

    return len(integers)

Here is how each piece works:

  • Outer loop with i: walks through every character in the string. When it hits a letter, it just moves forward one step.
  • Inner loop with j: once i lands on a digit, j advances until it hits a non-digit or the end of the string. This gives you the slice word[i:j] containing the full digit run.
  • lstrip('0'): removes all leading zeros from the extracted substring. "00123" becomes "123", and "034" becomes "34".
  • or '0': handles the all-zeros case. If lstrip('0') produces an empty string (meaning the original was something like "000"), the or expression falls back to "0".
  • Set insertion: adding the normalized string to a set means duplicates are ignored automatically. You never need to check "is this already there?" yourself.
  • i = j: after processing a digit run, jump i directly to where j stopped. This avoids re-scanning digits you already processed.

Using lstrip('0') or '0' handles the edge case where the integer is just "0" or "000". After stripping all zeros, you get an empty string, so the or '0' fallback ensures zero is represented correctly.

Visual walkthrough

Let's trace through the example "a123bc34d8ef34" step by step, watching the set grow as we find each digit run.

Tracing num_different_integers("a123bc34d8ef34"):

Step 1: Skip letter 'a', find digit run "123" (indices 1-3)

a123bc34d8ef34
Extracted: "123" | Normalized: "123"
Set: {"123"} | Size: 1

Scan from index 0. 'a' is not a digit, skip. At index 1, we find digits. Advance j until non-digit at index 4. Extract "123", strip leading zeros (none), add "123" to set.

Step 2: Skip letters 'bc', find digit run "34" (indices 6-7)

a123bc34d8ef34
Extracted: "34" | Normalized: "34"
Set: {"123", "34"} | Size: 2

Continue from index 4. 'b' and 'c' are letters, skip. At index 6, digits start. Extract "34", add to set. Set now has two entries.

Step 3: Skip letter 'd', find digit run "8" (index 9)

a123bc34d8ef34
Extracted: "8" | Normalized: "8"
Set: {"123", "34", "8"} | Size: 3

Continue from index 8. 'd' is a letter, skip. At index 9, single digit '8'. Extract and add to set. Set now has three entries.

Step 4: Skip letters 'ef', find digit run "34" (indices 12-13)

a123bc34d8ef34
Extracted: "34" | Normalized: "34"
Set: {"123", "34", "8"} | Size: 3

Continue from index 10. 'e' and 'f' are letters, skip. At index 12, digits "34" appear again. There are no leading zeros to strip, and "34" is already in the set, so no new entry is added.

Step 5: End of string, return set size

a123bc34d8ef34
Set: {"123", "34", "8"} | Size: 3

All characters processed. The set contains {"123", "34", "8"}. Return 3.

The set has 3 unique integer strings: "123", "34", and "8". The duplicate "34" was already present.

Notice how the second occurrence of "34" at indices 12-13 does not increase the set size. The set already contains "34" from the earlier extraction, so the duplicate is silently ignored. This is exactly why a set is the right data structure here.
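The walkthrough can be reproduced in code. The sketch below records what was extracted, its normalized form, and the set size after each digit run. The helper name trace_digit_runs is hypothetical; the scanning logic mirrors the solution shown earlier.

```python
def trace_digit_runs(word: str) -> list:
    """Return (extracted, normalized, set_size) for each digit run, in scan order."""
    seen = set()
    steps = []
    i, n = 0, len(word)
    while i < n:
        if word[i].isdigit():
            j = i
            # Advance j to the first non-digit (or the end of the string).
            while j < n and word[j].isdigit():
                j += 1
            extracted = word[i:j]
            normalized = extracted.lstrip("0") or "0"
            seen.add(normalized)  # a duplicate leaves the set unchanged
            steps.append((extracted, normalized, len(seen)))
            i = j
        else:
            i += 1
    return steps


for extracted, normalized, size in trace_digit_runs("a123bc34d8ef34"):
    print(f"extracted={extracted!r} normalized={normalized!r} size={size}")
```

The last two recorded steps both report a size of 3, which is the duplicate "34" being absorbed by the set.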

Complexity analysis

Approach               Time    Space
Single pass with set   O(n)    O(n)

Time: O(n) where n is the length of the string. Each character is visited at most twice, once by the outer pointer i and once by the inner pointer j.

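Space: O(n) for the set of integer strings in the worst case. The set stores normalized digit runs, and their combined length can never exceed n. A single digit run spanning the whole string, for example, yields one entry of length n. Many short runs do not make things worse: single-digit runs can produce at most 10 distinct entries, since duplicates collapse.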

The building blocks

1. Two-pointer digit extraction

The pattern of using two pointers to extract contiguous sequences from a mixed string is reusable in many contexts. One pointer marks the start of a segment, the other advances to find the end. After processing the segment, the outer pointer jumps to where the inner pointer stopped.

i = 0
while i < n:
    if s[i].isdigit():
        j = i
        while j < n and s[j].isdigit():
            j += 1
        process(s[i:j])
        i = j
    else:
        i += 1

This avoids the overhead of splitting the string or using regex. It processes everything in a single pass with constant extra work per character.
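One way to make the pattern reusable is to wrap it in a generator, so the caller decides what "process" means. This is a minimal sketch; the name digit_runs is an assumption for illustration and does not appear in the original solution.

```python
from typing import Iterator


def digit_runs(s: str) -> Iterator[str]:
    """Yield each maximal run of consecutive digit characters in s, left to right."""
    i, n = 0, len(s)
    while i < n:
        if s[i].isdigit():
            j = i
            while j < n and s[j].isdigit():
                j += 1
            yield s[i:j]  # the full digit run, leading zeros included
            i = j         # jump past the run instead of re-scanning it
        else:
            i += 1


print(list(digit_runs("a01b001c1")))  # ['01', '001', '1']
```

The generator yields raw runs only, so normalization and deduplication stay separate steps.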

2. Leading zero normalization

When you need a canonical representation of a numeric string, stripping leading zeros is the standard approach. This lets you compare numeric strings by value rather than by raw characters.

normalized = num_str.lstrip('0') or '0'

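The or '0' guard keeps the canonical form meaningful. Without it, "0" and "000" would normalize to an empty string. In this particular problem the count would still come out the same, because every all-zero run collapses to the same empty string, but the stored value would no longer be a valid numeral, which matters as soon as you print, parse, or compare the results. This two-part idiom is worth committing to memory because it shows up whenever you need to compare numeric strings.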
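To see the idiom on a few inputs, here is a small sketch. The helper name normalize is hypothetical. Note that lstrip only touches the left side, so trailing zeros survive.

```python
def normalize(num_str: str) -> str:
    # Drop leading zeros; an all-zero string strips to "", so fall back to "0".
    return num_str.lstrip("0") or "0"


for raw in ["00123", "034", "0", "000", "100"]:
    print(repr(raw), "->", repr(normalize(raw)))
```

For digit-only strings of this size, the result matches str(int(raw)), but the string version never builds a number and ports directly to languages with fixed-width integers.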

Edge cases

  • No digits: the string is all letters, so no integers are extracted and the answer is 0
  • Single digit: "a1b" contains one integer, return 1
  • Leading zeros: "a01b001c1" has one unique integer (1), because "01", "001", and "1" all normalize to "1"
  • All zeros: "000" should count as one integer (0), not zero integers
  • Very long digit sequences: the constraints allow up to 1000 characters, so integer values can exceed standard int range in some languages. Python handles arbitrary precision natively, but in Java or C++ you would need to compare strings rather than converting to int
  • Adjacent groups: "1a1" has one unique integer (1), since both groups normalize to the same value
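The cases listed above can be checked mechanically. The sketch below uses a compact regex-based stand-in, count_distinct (a hypothetical name, and not the two-pointer solution from this post), so that the block runs on its own. The expected values come straight from the list.

```python
import re


def count_distinct(word: str) -> int:
    # Stand-in implementation: [0-9]+ matches each maximal run of digits.
    runs = re.findall(r"[0-9]+", word)
    return len({run.lstrip("0") or "0" for run in runs})


cases = {
    "abc": 0,            # no digits
    "a1b": 1,            # single digit
    "a01b001c1": 1,      # leading zeros all normalize to "1"
    "000": 1,            # all zeros count as the integer 0
    "1a1": 1,            # two groups with the same value
    "a" + "9" * 999: 1,  # very long run, compared as a string
}

for word, expected in cases.items():
    assert count_distinct(word) == expected, (word[:20], expected)
print("all edge cases pass")
```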

From understanding to recall

This problem looks simple on the surface, but the devil is in the details. Leading zeros, the all-zeros edge case, and the choice between string comparison and integer conversion are all places where a careless implementation breaks. Solving it once teaches you the pattern. Solving it again a week later, then a month later, locks it in. Spaced repetition helps you remember the lstrip trick and the two-pointer scanning pattern so they become automatic when you encounter a parsing problem in an interview.

String parsing problems reward precision over cleverness. The algorithm here is not fancy, but getting every edge case right is what separates a correct solution from a buggy one. If you want to build that precision into long-term memory, CodeBricks uses spaced repetition to bring problems back at the right intervals so the patterns stick.