Stone Game: Game Theory and Dynamic Programming

May 24, 20266 min read

leetcodeproblemmediummathdynamic-programming

Alex and Lee play a game with piles of stones. There is an even number of piles arranged in a row, and each pile has a positive number of stones. On each turn, a player takes the entire pile from either the left end or the right end of the row. Alex goes first. The player with the most stones at the end wins. Return true if Alex can win the game (both players play optimally).

This is LeetCode 877: Stone Game, and it sits at the intersection of game theory and dynamic programming. The twist? There is a one-line math proof that Alex always wins. But the DP approach is what you actually need to learn, because it generalizes to harder variants like Predict the Winner where the math shortcut disappears.

Piles = [5, 3, 4, 5]. Each turn, a player picks from the left or right end. Alex goes first and always wins with optimal play.

Why this problem matters

Stone Game is a gateway to interval DP and game theory minimax, two patterns that show up across a whole family of problems. Once you can think in terms of "what advantage can the current player squeeze out of a subarray?", you unlock problems like Predict the Winner, Burst Balloons, and Minimum Cost to Merge Stones.

The problem also teaches an important meta-lesson: sometimes the optimal approach for an interview is not the cleverest mathematical trick. The O(1) math proof is elegant, but interviewers often want to see the DP because it demonstrates a transferable skill. Know both.

The math insight: Alex always wins

Before we write any DP, consider this observation. The piles have even length. Number them by index: 0, 1, 2, ..., n-1. Split them into two groups:

Even-indexed piles: indices 0, 2, 4, ...
Odd-indexed piles: indices 1, 3, 5, ...

Because the total number of piles is even, one group must have a strictly larger sum than the other (all pile values are positive, so the total is odd). Alex, going first, can always choose to take every pile from whichever group is larger.

How? If Alex wants all even-indexed piles, they start by taking from the left (index 0). That forces Lee to choose between two odd-indexed piles. No matter what Lee picks, Alex's next choice will again be an even-indexed pile. Alex controls the parity.

Alex can always force a win by choosing all even-indexed or all odd-indexed piles. Since the two groups have different sums (all values are positive), Alex picks the group with the larger sum. This means the answer is always true.

That means the answer is always return True. One line, O(1) time, O(1) space. But this only works because the problem guarantees an even number of piles with all positive values. Change either constraint, and the trick breaks. The DP approach handles the general case.

The approach: interval DP

Define dp[i][j] as the maximum score advantage the current player can achieve when choosing from piles[i] through piles[j].

"Current player" means whoever's turn it is. Both players use the same optimal strategy, so the recurrence is symmetric. When it is your turn on piles[i..j], you have two choices:

Pick left (piles[i]): you gain piles[i], then your opponent faces piles[i+1..j] and plays optimally. Their best advantage from that subarray is dp[i+1][j]. Since their gain is your loss, your net advantage is piles[i] - dp[i+1][j].
Pick right (piles[j]): same logic. Your net advantage is piles[j] - dp[i][j-1].

You pick whichever gives you the larger advantage:

dp[i][j] = max(piles[i] - dp[i+1][j], piles[j] - dp[i][j-1])

Base case: dp[i][i] = piles[i] (one pile left, you take it).

If dp[0][n-1] > 0, the first player (Alex) wins.

The solution

def stoneGame(piles):
    n = len(piles)
    dp = [[0] * n for _ in range(n)]

    for i in range(n):
        dp[i][i] = piles[i]

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            dp[i][j] = max(
                piles[i] - dp[i + 1][j],
                piles[j] - dp[i][j - 1],
            )

    return dp[0][n - 1] > 0

Practice interval DP with spaced repetition

Visual walkthrough

Let's trace the DP table for piles = [5, 3, 4, 5]. We fill diagonals from shorter subarrays to longer ones.

Base case: dp[i][i] = piles[i] (single pile, current player takes it)

When only one pile remains, the current player takes it all. dp[0][0]=5, dp[1][1]=3, dp[2][2]=4, dp[3][3]=5.

Length 2: dp[i][j] = max(piles[i] - dp[i+1][j], piles[j] - dp[i][j-1])

dp[0][1] = max(5 - dp[1][1], 3 - dp[0][0]) = max(5-3, 3-5) = 2. Take 5, opponent gets 3. Similarly dp[1][2] = 1 and dp[2][3] = 1.

Length 3: dp[0][2] = max(5 - dp[1][2], 4 - dp[0][1]) = max(4, 2) = 4

dp[0][2]: take piles[0]=5, opponent faces dp[1][2]=1, net = 4. Or take piles[2]=4, opponent faces dp[0][1]=2, net = 2. Best is 4. dp[1][3] = max(3-1, 5-1) = 4.

Full table: dp[0][3] = max(5 - dp[1][3], 5 - dp[0][2]) = max(1, 1) = 1

dp[0][3] = max(5 - 4, 5 - 4) = 1. Both picks give the same advantage. Since dp[0][3] > 0, Alex wins. The first player always wins when the pile count is even.

The table fills diagonal by diagonal: first the main diagonal (length 1), then the next diagonal up (length 2), and so on until we reach dp[0][3]. Each cell depends on the cell below it and the cell to its left, both of which sit on a shorter diagonal and are already computed.

Complexity analysis

Metric	Value
Time	O(n^2)
Space	O(n^2)

The DP table has n^2 / 2 cells (upper triangle), and each cell takes O(1) to compute. Total time: O(n^2). The table itself uses O(n^2) space.

For this specific problem (even-length piles, all positive), you can also solve it in O(1) time and O(1) space with return True. But the DP solution is the one worth drilling, because it applies to the general case where the math shortcut does not exist.

Building blocks

This problem is built on two fundamental patterns:

Interval DP. The state dp[i][j] represents a contiguous subarray, and you fill the table by increasing interval length. The same structure appears in Burst Balloons (LeetCode 312), where dp[i][j] is the max coins from popping balloons in a range, and in Minimum Cost to Merge Stones (LeetCode 1000). Once you internalize the "fill by diagonal length" loop structure, all interval DP problems share the same skeleton.

Game theory minimax. The "your gain minus opponent's best" recurrence is the minimax pattern. You maximize your advantage, and the subtraction handles the fact that your opponent is also playing optimally. This same idea appears in Predict the Winner (the generalized version of this problem that allows odd-length arrays and zeros) and in Can I Win (LeetCode 464).

Drill the interval DP pattern

Edge cases to watch for

Two piles: dp[0][1] = max(piles[0] - piles[1], piles[1] - piles[0]). Alex picks the bigger pile and wins.
All equal piles: e.g., [3, 3, 3, 3]. Alex and Lee tie in total stones, but the problem guarantees distinct totals (all positive, even count), so Alex still wins.
Large disparity at one end: e.g., [2, 1, 1, 100]. Alex takes 100 from the right. Greedy happens to work here, but the DP handles cases where greedy fails.
Greedy traps: [3, 7, 2, 3]. Greedy says take 3 from either end. But taking the left 3 gives Alex access to 7 later. The DP correctly navigates these multi-step dependencies.

From understanding to recall

You have just seen two approaches to Stone Game: the elegant math proof and the general-purpose interval DP. The math proof is satisfying, but the DP is what matters for your interview toolkit. Interval DP shows up in too many problems to skip.

The tricky part is the recurrence: piles[i] - dp[i+1][j] versus piles[j] - dp[i][j-1]. That subtraction encodes the minimax logic, and it is easy to mix up which index moves. Spaced repetition helps you lock in the table structure, the fill order (by diagonal length), and the sign convention until writing it from scratch feels automatic.

The interval DP and game theory minimax patterns are two of roughly 60 reusable building blocks that cover hundreds of LeetCode problems. Drilling them individually with spaced repetition beats random problem grinding every time.

House Robber - Another DP problem with optimal substructure
Predict the Winner - The generalized version of this game theory problem
Coin Change - Uses similar DP table construction

Try CodeBricks free