Stone Game III: Optimal Play with Suffix DP

May 30, 20265 min read

leetcodeproblemharddynamic-programming

Alice and Bob play a game with a row of stones, where each stone has a value (possibly negative). They take turns, with Alice going first. On each turn, the current player takes 1, 2, or 3 stones from the front of the remaining row. Both players play optimally to maximize their own total score. Return "Alice" if Alice scores more, "Bob" if Bob scores more, or "Tie" if they tie.

This is LeetCode 1406: Stone Game III. Unlike Stone Game II where the number of piles you can take grows dynamically, here it is always 1, 2, or 3. That fixed branching factor is what makes the problem solvable in O(n) time. The challenge is recognizing the DP state and understanding how negative stone values create non-obvious optimal strategies.

Stone Game III with stoneValue = [1, 2, 3, 7]. Each player takes 1, 2, or 3 stones from the front on their turn. Both play optimally. The DP tracks score advantage at each position.

Why greedy fails

You might think: just always take the highest-value stones you can. But negative stone values break any greedy approach. Consider stoneValue = [1, 2, 3, 7]. If Alice greedily takes 3 stones to grab 1 + 2 + 3 = 6, Bob takes the remaining 7 and wins. If Alice takes just 1 stone (value 1), Bob faces [2, 3, 7] and the game plays out differently. The problem is that sometimes you want to take fewer stones to force your opponent into a bad position, or take more stones to "eat" negative values before your opponent can dodge them.

Greedy has no way to evaluate these downstream effects. You need a DP that considers every possible future game state.

The key insight: suffix DP with minimax

The game state at any point is just the index i of the next stone to be taken. That is it. No other parameter matters, because the take count is always 1, 2, or 3 regardless of what happened before.

Define dp[i] as the score advantage the current player can achieve starting at index i. "Advantage" means the current player's total minus the opponent's total, assuming both play optimally from index i onward.

When the current player takes k stones (where k is 1, 2, or 3):

They gain stoneValue[i] + stoneValue[i+1] + ... + stoneValue[i+k-1].
The opponent then faces position i + k and achieves advantage dp[i + k].
The current player's advantage is (sum of k stones) - dp[i + k], because the opponent's advantage flips to a disadvantage for the current player.

The recurrence is:

dp[i] = max over k=1,2,3 of (sum(stones[i..i+k-1]) - dp[i+k])

We fill the DP from right to left. The base case is dp[n] = 0 (no stones left, no advantage). At the end, dp[0] tells us Alice's advantage. If dp[0] > 0, Alice wins. If dp[0] is negative, Bob wins. If dp[0] == 0, it is a tie.

Practice game theory DP with spaced repetition

The solution

def stoneGameIII(stoneValue):
    n = len(stoneValue)
    dp = [0] * (n + 1)

    for i in range(n - 1, -1, -1):
        dp[i] = float('-inf')
        total = 0
        for k in range(1, 4):
            if i + k > n:
                break
            total += stoneValue[i + k - 1]
            dp[i] = max(dp[i], total - dp[i + k])

    if dp[0] > 0:
        return "Alice"
    elif dp[0] < 0:
        return "Bob"
    else:
        return "Tie"

Step-by-step walkthrough

Let's trace through stoneValue = [1, 2, 3, 7] to see how the DP fills from right to left.

Step 1: Compute dp[3] (last stone)

dp[3] = stones[3] - dp[4] = 7 - 0 = 7

Only one stone left. The current player takes it. Score advantage = 7 - 0 = 7.

Step 2: Compute dp[2] (starting at index 2)

Take 1: 3 - dp[3] = 3 - 7 = -4

Take 2: (3+7) - dp[4] = 10 - 0 = 10 ← best

Take 1 stone: advantage = 3 - 7 = -4. Take 2 stones: advantage = (3+7) - 0 = 10. Best = 10.

Step 3: Compute dp[1] (starting at index 1)

Take 1: 2 - dp[2] = 2 - 10 = -8

Take 2: (2+3) - dp[3] = 5 - 7 = -2

Take 3: (2+3+7) - dp[4] = 12 - 0 = 12 ← best

Three options. Taking all 3 remaining stones gives advantage 12, which is the best.

Step 4: Compute dp[0] (starting at index 0)

Take 1: 1 - dp[1] = 1 - 12 = -11

Take 2: (1+2) - dp[2] = 3 - 10 = -7

Take 3: (1+2+3) - dp[3] = 6 - 7 = -1 ← best

Alice starts here. All three options give a negative advantage. The best is -1 (take 3 stones).

dp[0] = -1< 0

Alice: 6

Bob: 7

dp[0] is negative, so Alice scores less than Bob. Return "Bob". Alice takes [1, 2, 3] = 6, Bob takes [7] = 7.

The subtraction total - dp[i + k] is the core of the minimax reasoning. When the current player takes k stones, their immediate gain is total. The opponent then plays from i + k and achieves advantage dp[i + k]. From the current player's perspective, that opponent advantage becomes a disadvantage, so we subtract it.

Since dp[0] = -1, Alice's advantage is negative. Bob wins. With optimal play, Alice takes stones [1, 2, 3] for 6 points, and Bob takes [7] for 7 points.

Complexity analysis

Metric	Value
Time	O(n)
Space	O(n)

Time: O(n). We compute dp[i] for each index from n-1 down to 0. At each index, we try at most 3 values of k, so each state takes O(1) work. Total: O(n) states times O(1) per state.

Space: O(n). The DP array stores one value per index. You can optimize to O(1) space by keeping only three previous DP values, since dp[i] depends only on dp[i+1], dp[i+2], and dp[i+3].

Building blocks

This problem is built on two fundamental patterns:

Suffix DP. Processing the array from right to left and defining states as "what happens from index i onward" is a classic suffix DP pattern. Unlike prefix DP where you build up from the start, suffix DP naturally captures "remaining game" semantics. The same idea shows up in problems like Best Time to Buy and Sell Stock (tracking best future price) and any problem where the decision at position i depends on the best outcome from i+1 onward.

Minimax with advantage tracking. Instead of tracking both players' scores separately, we define dp[i] as the advantage of the current player. The subtraction score - dp[i+k] encodes both "I gain" and "my opponent will play optimally against me." This reduces the two-player game to a single-player optimization. The same pattern appears in Predict the Winner and Stone Game, and once you internalize it, the entire family of two-player game problems becomes approachable.

Drill game theory DP with spaced repetition

Edge cases to watch for

All negative values: e.g., [-1, -2, -3]. Both players are forced to take negative stones. The player who takes fewer total negative value wins. dp[0] correctly handles this.
Single stone: [5]. Alice takes it. dp[0] = 5 > 0, return "Alice".
Exactly three stones: [1, 2, 3]. Alice can take all three in one move. She always does this if it is optimal, but the DP still checks all options.
Tie scenario: [1, 2, 3, -9, 5, 1]. Certain value distributions lead to equal scores. The DP returns 0 and we output "Tie".
Large negative followed by large positive: [-1000, 1000]. Alice is forced to take at least one stone. Taking 1 gives her -1000 and Bob gets 1000. Taking both gives her 0. The DP correctly picks taking both.

From understanding to recall

You have just traced through the suffix DP approach to Stone Game III. The recurrence dp[i] = max(sum - dp[i+k]) is simple, but will you remember three things a week from now? First, that dp[i] tracks advantage (not raw score). Second, that you fill from right to left. Third, that the subtraction handles the minimax implicitly.

The fixed branching factor of 3 is what makes this O(n) instead of the O(n^3) that Stone Game II requires. Recognizing which variant you are in, and how the time complexity changes, matters for interviews.

The suffix DP and minimax advantage patterns are two of roughly 60 reusable building blocks that cover hundreds of LeetCode problems. Drilling them individually with spaced repetition turns recognition into reflex.

Stone Game II - Variable take count with M parameter
Stone Game - The original interval DP game
Predict the Winner - Same minimax DP pattern

Try CodeBricks free