Skip to content
← All posts

String to Integer (atoi): State Machine Parsing

8 min read
leetcodeproblemmediumarrays

LeetCode 8, String to Integer (atoi), is one of those problems that sounds trivial until you start coding it. Convert a string to a 32-bit signed integer. Simple, right? Then you start handling leading whitespace, optional signs, non-digit characters, and integer overflow, and suddenly your clean two-liner is a mess of if-statements.

The trick is to stop thinking of it as a bunch of special cases and start thinking of it as a state machine. Once you see the states, the code writes itself.

The problem

Implement a function that converts a string to a 32-bit signed integer. The rules:

  1. Skip any leading whitespace.
  2. Read an optional '+' or '-' sign. If no sign is present, assume positive.
  3. Read digits until you hit a non-digit character or the end of the string.
  4. Convert the digits to an integer. If no digits were read, return 0.
  5. Clamp the result to the 32-bit signed integer range: [-2**31, 2**31 - 1].

A few examples:

  • "42" returns 42. Straightforward.
  • " -42" returns -42. Skip whitespace, read sign, read digits.
  • "4193 with words" returns 4193. Stop at the space.
  • "" returns 0. No digits.
  • "words and 987" returns 0. The first non-whitespace character is not a digit or sign.
skipskip-sign4digit2digitastopbstopcstopwhitespacesigndigitsstop
Parsing " -42abc". Each character falls into a state: skip whitespace, read sign, collect digits, stop at non-digit. Result: -42.

The state machine approach

Instead of writing a chain of if-else blocks for each rule, think of the parser as moving through four states:

  1. START - we are skipping leading whitespace. If we see a space, stay here. If we see a sign or digit, transition.
  2. SIGNED - we just read a '+' or '-'. The only valid next character is a digit. Anything else means we stop.
  3. DIGIT - we are collecting digits. Keep going as long as we see digits. Anything else triggers a stop.
  4. STOP - we are done. Return whatever we have accumulated so far.

The transitions look like this:

Current StateSpaceSign (+/-)Digit (0-9)Anything else
STARTSTARTSIGNEDDIGITSTOP
SIGNEDSTOPSTOPDIGITSTOP
DIGITSTOPSTOPDIGITSTOP
STOPSTOPSTOPSTOPSTOP

That is the entire logic. No nested ifs. No special casing. You just look up the current state and the character type, and the table tells you what to do.

You do not need to literally build a transition table in code (though you can). The important insight is recognizing that there are only four states and that each character type maps cleanly to a next state. Even a simple sequential approach follows this same pattern implicitly.

The sequential implementation

Most people do not literally code a state machine with a dictionary of transitions. They write it sequentially, handling each phase in order. This is totally fine for interviews and is actually what most interviewers expect.

def my_atoi(s):
    INT_MAX = 2**31 - 1
    INT_MIN = -(2**31)

    i = 0
    n = len(s)

    # Phase 1: skip whitespace
    while i < n and s[i] == ' ':
        i += 1

    # Phase 2: read optional sign
    sign = 1
    if i < n and s[i] in ('+', '-'):
        sign = -1 if s[i] == '-' else 1
        i += 1

    # Phase 3: read digits
    result = 0
    while i < n and s[i].isdigit():
        digit = int(s[i])

        # Overflow check before multiplying
        if result > (INT_MAX - digit) // 10:
            return INT_MAX if sign == 1 else INT_MIN

        result = result * 10 + digit
        i += 1

    return sign * result

That is the full solution. Walk through the string once, left to right, processing each phase exactly once. The index i acts as our position in both the string and the state machine.

Visual walkthrough

Let's trace through the input " -42abc" step by step. The pointer i moves left to right through the string, and the result accumulates as we go.

Step 1: Skip leading whitespace.

result = 0
-42abci

i=0 and i=1 are spaces. Skip them both. i advances to 2.

Step 2: Read the sign character.

sign = -1
-42abci

chars[2] = '-'. Record sign = -1. Advance i to 3.

Step 3: Read first digit '4'.

result = 4
-42abci

chars[3] = '4'. result = 0 * 10 + 4 = 4. Advance i.

Step 4: Read second digit '2'.

result = 42
-42abci

chars[4] = '2'. result = 4 * 10 + 2 = 42. Advance i.

Step 5: Hit non-digit 'a'. Stop.

return -42
-42abci

chars[5] = 'a'. Not a digit. Stop reading. Return sign * result = -1 * 42 = -42.

Notice how each phase is handled in order: whitespace, then sign, then digits. As soon as we hit 'a' (a non-digit), we stop immediately. We never even look at 'b' or 'c'.

Overflow handling: the tricky part

The overflow check is where most people get stuck. You need to detect that adding the next digit would push the result past 2**31 - 1 (which is 2147483647) before it actually overflows.

The key line is:

if result > (INT_MAX - digit) // 10:
    return INT_MAX if sign == 1 else INT_MIN

Why this works: you want to check if result * 10 + digit > INT_MAX. Rearranging, that is result > (INT_MAX - digit) / 10. By doing integer division on the right side, you avoid any overflow in the check itself.

Let's say result = 214748365 and digit = 0. Then (2147483647 - 0) // 10 = 214748364. Since 214748365 > 214748364, we know the next multiplication would overflow, so we clamp immediately.

A common mistake is checking after the multiplication. If you compute result * 10 + digit first and then compare, you have already overflowed (in languages like C++ or Java). Python handles big integers natively, but the problem specifically asks you to clamp to 32-bit range, so you still need the pre-check to return the correct clamped value.

The explicit state machine version

If you want to show off clean design (or if the interviewer specifically asks for a state machine), here is the explicit version:

def my_atoi(s):
    INT_MAX = 2**31 - 1
    INT_MIN = -(2**31)

    state = "start"
    sign = 1
    result = 0

    transitions = {
        "start":  {"space": "start",  "sign": "signed", "digit": "digit", "other": "stop"},
        "signed": {"space": "stop",   "sign": "stop",   "digit": "digit", "other": "stop"},
        "digit":  {"space": "stop",   "sign": "stop",   "digit": "digit", "other": "stop"},
        "stop":   {"space": "stop",   "sign": "stop",   "digit": "stop",  "other": "stop"},
    }

    def char_type(c):
        if c == ' ':
            return "space"
        if c in ('+', '-'):
            return "sign"
        if c.isdigit():
            return "digit"
        return "other"

    for c in s:
        state = transitions[state][char_type(c)]

        if state == "signed":
            sign = -1 if c == '-' else 1
        elif state == "digit":
            digit = int(c)
            if result > (INT_MAX - digit) // 10:
                return INT_MAX if sign == 1 else INT_MIN
            result = result * 10 + digit
        elif state == "stop":
            break

    return sign * result

Both versions produce identical results. The explicit state machine is more extensible (add a new state by adding a row to the table), but the sequential version is more concise and what you will typically write under time pressure.

Edge cases worth knowing

Empty string. No characters to read. result stays 0. Return 0.

Only whitespace. " " skips all spaces, finds no sign or digits. Return 0.

Sign with no digits. "-" or "+". We read the sign, but the digit loop never executes because the next character (if any) is not a digit. Return 0.

Leading zeros. "0000042" works correctly. 0 * 10 + 0 = 0, repeated, then 0 * 10 + 4 = 4, then 4 * 10 + 2 = 42.

Overflow positive. "2147483648" is one past INT_MAX. The overflow check catches it and returns 2147483647.

Overflow negative. "-2147483649" overflows in the negative direction. Clamped to -2147483648.

Multiple signs. "+-12" reads '+' as the sign, then sees '-' which is not a digit, so it stops. Return 0.

Complexity analysis

Time: O(n). We walk through the string once, left to right. Each character is examined exactly once.

Space: O(1). A few integer variables. No auxiliary data structures.

ApproachTimeSpace
Sequential parsingO(n)O(1)
Explicit state machineO(n)O(1)

Both approaches are equivalent in complexity. The state machine version has slightly more constant overhead from the dictionary lookups, but it is negligible.

The building blocks

String to Integer (atoi) is built from two fundamental bricks that appear across many string and number manipulation problems.

Sequential state parsing

The idea of walking through a string and transitioning between phases (whitespace, sign, digits, stop) is a pattern you will see again and again. Parsing problems almost always boil down to: what state am I in, and what does this character mean in that state?

Other problems that use sequential state parsing:

  • Valid Number (LeetCode 65) uses the same state machine idea but with more states for decimals and exponents.
  • Decode String (LeetCode 394) parses digits to build a repeat count, then processes bracket contents.
  • Basic Calculator (LeetCode 224) reads signs, digits, and parentheses in a stateful pass.

Pre-multiplication overflow check

The technique of checking result > (INT_MAX - digit) // 10 before multiplying is a reusable trick for any problem where you build a number digit by digit and need to detect overflow before it happens.

Other problems that use this pattern:

  • Reverse Integer (LeetCode 7) reverses digits and must detect overflow.
  • Multiply Strings (LeetCode 43) builds results digit by digit.

If you can write the atoi solution from scratch, including the whitespace skip, sign handling, digit accumulation, and overflow check, you have both building blocks locked in. The sequential state parsing pattern is the same skeleton used in much harder parsing problems. The overflow check pattern transfers directly to Reverse Integer.

Common mistakes

Forgetting the overflow check. Python will not crash on big integers, but the problem requires clamping. If you skip the check, you return the mathematically correct but out-of-range value.

Checking overflow after multiplication. In Python this is functionally harmless, but it shows the interviewer you do not understand the pattern. In C++ or Java, this is an actual bug.

Handling sign inside the digit loop. The sign should be read exactly once, before the digit loop starts. If you process it inside the digit loop, strings like "12-34" will be parsed incorrectly.

Not stopping at non-digit characters. Some people try to skip non-digits and keep looking for more digits. The problem says to stop at the first non-digit after the digit sequence.

Treating the sign as a digit. '+' and '-' are not digits. If you use s[i].isdigit() in the sign-reading phase, you will skip the sign entirely.

Reading about it is not enough

String to Integer (atoi) is a medium-difficulty problem, but the actual algorithm is straightforward. The difficulty comes from the edge cases: overflow, missing digits, multiple signs, whitespace-only input. These are the details that evaporate from memory two weeks after you read the solution.

The two building blocks here (sequential state parsing and pre-multiplication overflow check) are small. They take a few minutes each to drill. But once they are in your long-term memory, you can reassemble the full atoi solution from scratch. And when you hit Valid Number or Reverse Integer, you are not starting from zero. You are combining bricks you already own.

Related posts