Stone Game II: Minimax Dynamic Programming

May 27, 20266 min read

leetcodeproblemmediumarraysdynamic-programming

Alice and Bob play a game with piles of stones arranged in a row. On each turn, the current player takes X piles from the front of the remaining row, where 1 X 2*M. After taking X piles, M updates to max(M, X). Alice goes first with M = 1. Both players play optimally. Return the maximum number of stones Alice can collect.

This is LeetCode 1140: Stone Game II, and it combines two patterns that show up across game theory DP: suffix sum precomputation and minimax recursion over a changing parameter. Unlike the original Stone Game where players pick from either end, here the number of piles you can take grows as the game progresses.

Stone Game II with piles = [2, 7, 9, 4, 4]. Alice goes first with M=1 and can take 1 or 2 piles from the front. The suffix sums help compute the remaining stones at each position.

Why greedy fails

Your first thought might be: take as many piles as possible each turn, or always take the piles with the most stones. But greedy breaks quickly. Consider piles = [2, 7, 9, 4, 4]. If Alice greedily takes 2 piles (the maximum allowed with M=1) to grab 2 + 7 = 9 stones, Bob can then grab 9 with X=1, and Alice finishes with [4, 4]. Alice gets 17, Bob gets 9. But what if Alice only takes 1 pile? Then M stays at 1, and Bob can only take 1 or 2 piles from [7, 9, 4, 4]. The cascading effect of M means you cannot evaluate any single move without considering the entire future game tree.

The key insight: suffix sums and memoized recursion

The game state is fully described by two things: the current index i (which pile we are up to) and the current value of M. At state (i, M), the current player can take X piles for any X from 1 to 2*M.

Here is the critical observation. If you know the total stones remaining from index i onward (the suffix sum), and you know how many stones the opponent will collect from the remaining state, then the current player's stones equal the suffix sum minus the opponent's stones.

Define dp(i, M) as the maximum stones the current player can collect starting from index i with the given M. The suffix sum from index i is suffix[i]. When the current player takes X piles:

They consume piles i through i + X - 1.
The opponent then faces state (i + X, max(M, X)) and plays optimally, collecting dp(i + X, max(M, X)) stones.
The current player's total from this point is suffix[i] - dp(i + X, max(M, X)).

The current player tries all valid X values and picks the best:

dp(i, M) = max over X in 1..2*M of (suffix[i] - dp(i + X, max(M, X)))

Base case: when i + 2*M >= n, the current player can take all remaining piles. dp(i, M) = suffix[i].

The answer is dp(0, 1), which gives Alice's maximum stones when she starts at index 0 with M = 1.

Practice game theory DP with spaced repetition

The solution

def stoneGameII(piles):
    n = len(piles)
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + piles[i]

    memo = {}

    def dp(i, m):
        if i >= n:
            return 0
        if i + 2 * m >= n:
            return suffix[i]
        if (i, m) in memo:
            return memo[(i, m)]

        best = 0
        for x in range(1, 2 * m + 1):
            best = max(best, suffix[i] - dp(i + x, max(m, x)))

        memo[(i, m)] = best
        return best

    return dp(0, 1)

Step-by-step walkthrough

Let's trace through piles = [2, 7, 9, 4, 4] to see how the recursion plays out.

Step 1: Alice's turn (M=1, can take 1 or 2 piles)

Option A: take 1 pile (X=1), score = 2, new M = max(1,1) = 1

Option B: take 2 piles (X=2), score = 9, new M = max(1,2) = 2

Alice evaluates both options. Taking 2 piles [2, 7] = 9 stones leads to a better outcome (total 17) than taking 1 pile [2] = 2 stones (total 10). Alice takes 2.

Step 2: Bob's turn (M=2, remaining = [9, 4, 4])

Bob takes 1 pile (X=1), score = 9, new M = max(2,1) = 2

Bob can take 1 to 4 piles. With 3 piles left and M=2, Bob can take up to all 3 (since 2*2=4 and there are only 3). Bob takes 1 pile [9] = 9 stones, leaving Alice with [4, 4].

Step 3: Alice's turn (M=2, remaining = [4, 4])

Alice takes 2 piles (X=2), score = 8, new M = max(2,2) = 2

Alice can take 1 to 4 piles. With 2 piles left, she takes both. Alice gains 4 + 4 = 8 stones.

Step 4: Why this is optimal

Alice: 9 + 8 = 17

Bob: 9 = 9

The DP explores all possible X values at every state (index, M). At each step, the current player picks the X that maximizes their own stones, knowing the opponent will also play optimally.

Answer: 17

Alice collects 17 stones with optimal play. The suffix sum at each index tells us the total remaining stones, and the DP picks the best X at each state (index, M).

The subtraction suffix[i] - dp(i + X, max(M, X)) is the heart of the solution. You take whatever is left (suffix[i]) and subtract what your opponent will optimally collect after your move. This is the same minimax principle from Predict the Winner, adapted for a variable number of takes per turn.

Complexity analysis

Metric	Value
Time	O(n^3)
Space	O(n^2)

Time: O(n^3). The state space is O(n^2) since i ranges from 0 to n and M can grow up to n. At each state, you try up to 2*M values of X, which in the worst case is O(n). So the total work is O(n^2) states times O(n) transitions per state.

Space: O(n^2). The memo table stores one value per (i, M) pair. The suffix sum array adds O(n), which is dominated by the memo table. Recursion depth is at most O(n).

Building blocks

This problem is built on two fundamental patterns:

Suffix sum precomputation. Computing suffix[i] = total stones from index i to the end lets you express each player's total as a simple subtraction: "everything remaining minus what the opponent collects." Without suffix sums, you would need to sum slices of the piles array inside the recursion, adding an unnecessary O(n) factor per state. This same precomputation shows up in range sum queries, subarray sum problems, and anywhere you need quick access to the total of a suffix or prefix.

Minimax with memoization. The recurrence suffix[i] - dp(i + X, max(M, X)) captures the adversarial structure. The current player maximizes, and the subtraction encodes the fact that the opponent also plays optimally. This is the same core idea behind Predict the Winner and Stone Game, extended here with the variable M parameter. Once you internalize "my score = total remaining - opponent's best," the entire family of two-player game DP problems uses the same skeleton.

Drill minimax DP with spaced repetition

Edge cases to watch for

Single pile: piles = [5]. Alice takes the only pile. dp(0, 1) = 5.
Two piles: piles = [1, 2]. With M=1, Alice can take 1 or 2 piles. She takes both for 3 stones.
All equal piles: e.g., piles = [3, 3, 3, 3]. The order of takes still matters because M changes. Alice takes 1 pile, M stays 1, Bob takes 1 or 2, and so on.
Large M early: if Alice takes X = 2 on her first move (when M=1), M jumps to 2. Bob can then take up to 4 piles. The exponential growth of 2*M means games end faster than you might expect.
Decreasing piles: piles = [10, 8, 6, 4, 2]. Greedy (take as many as possible) happens to work here, but only the DP is correct in general.

From understanding to recall

You have just walked through the suffix-sum-plus-minimax approach to Stone Game II. The recurrence makes sense now, but will you remember the state definition (i, M) and the transition suffix[i] - dp(i + X, max(M, X)) a week from now?

The tricky parts are the M update rule (max(M, X), not just X), the base case (i + 2*M >= n means take everything), and remembering to precompute suffix sums so each state is O(1) to evaluate. Spaced repetition helps you lock in these details until writing the solution from scratch feels automatic.

The suffix sum precomputation and minimax memoization patterns are two of roughly 60 reusable building blocks that cover hundreds of LeetCode problems. Drilling them individually with spaced repetition beats random problem grinding every time.

Stone Game - The original two-player game with interval DP, where players pick from either end
Predict the Winner - Same minimax DP pattern with the "score difference" trick
Burst Balloons - Interval DP that shares the "fill by increasing length" structure

Try CodeBricks free