Probability of Two Boxes Having the Same Number of Distinct Balls: Combinatorial Backtracking

May 31, 20267 min read

leetcodeproblemhardmathdynamic-programmingbacktracking

You are given 2n balls of k distinct colors. The array balls has length k, where balls[i] is the number of balls of color i. You must distribute all the balls into two boxes so that each box contains exactly n balls. The distribution is done uniformly at random over all valid ways to split the balls. Return the probability that both boxes contain the same number of distinct colors.

This is LeetCode 1467: Probability of a Two Boxes Having the Same Number of Distinct Balls.

With balls = [1, 1], every way to split 2 balls into two boxes of size 1 gives each box exactly 1 distinct color. The probability is 1.0.

Why this problem matters

This problem is a masterclass in combining combinatorics with backtracking. On the surface, it looks like a pure probability question. But underneath, you need to enumerate all ways to partition balls by color, count arrangements using multinomial coefficients, and filter for the equal-distinct-count constraint. Each of those pieces is a building block that shows up in many other problems.

The problem also teaches you an important lesson about counting. You cannot simply enumerate raw distributions and count them equally, because different splits correspond to different numbers of distinguishable arrangements. A split that gives 2 of 3 red balls to Box 1 has more arrangements than a split that gives 0 of 3. The multinomial coefficients capture this weighting, and understanding why they matter is the core challenge.

Finally, this is one of the few LeetCode problems where the answer is a floating-point probability rather than an integer count. That forces you to think carefully about what "total" and "favorable" mean in a combinatorial context, and it connects algorithmic problem-solving to discrete probability.

The key insight

The key insight is to decompose the problem color by color. For each color i with balls[i] balls, you choose how many go to Box 1 (call it j, ranging from 0 to balls[i]), and the remaining balls[i] - j go to Box 2. You backtrack over all colors, building up a split one color at a time.

At each leaf of the backtracking tree, you have a complete assignment: you know exactly how many balls of each color are in each box. You check two things. First, does Box 1 have exactly n balls total? If not, skip this split. Second, do both boxes have the same number of distinct colors? A color is "present" in a box if at least one ball of that color is assigned to it.

To count the number of distinct arrangements that correspond to a given split, you multiply binomial coefficients: for each color i, the number of ways to choose which j of the balls[i] balls go to Box 1 is C(balls[i], j). The product of these over all colors gives the total arrangements for this split. The total number of arrangements across all valid (equal-size) splits is C(2n, n). The probability is the sum of arrangement counts for favorable splits divided by C(2n, n).

The solution

from math import comb

def get_probability(balls):
    n = sum(balls) // 2
    k = len(balls)
    total_ways = comb(sum(balls), n)

    favorable = 0

    def backtrack(i, box1_count, box2_count, box1_distinct, box2_distinct, ways):
        nonlocal favorable
        if box1_count > n or box2_count > n:
            return
        if i == k:
            if box1_count == n and box1_distinct == box2_distinct:
                favorable += ways
            return
        for j in range(balls[i] + 1):
            d1 = 1 if j > 0 else 0
            d2 = 1 if balls[i] - j > 0 else 0
            backtrack(
                i + 1,
                box1_count + j,
                box2_count + balls[i] - j,
                box1_distinct + d1,
                box2_distinct + d2,
                ways * comb(balls[i], j),
            )

    backtrack(0, 0, 0, 0, 0, 1)
    return favorable / total_ways

The function processes one color at a time. For color i, it tries giving j balls to Box 1 (from 0 up to balls[i]), with the rest going to Box 2. The variable ways accumulates the product of binomial coefficients along the path. At a leaf node (when i == k), if Box 1 has exactly n balls and both boxes have the same distinct color count, the current ways value gets added to favorable.

The early return when box1_count > n or box2_count > n is a critical pruning step. It cuts off branches where one box is already overfull, which prevents exploring many useless subtrees.

The pruning check if box1_count > n or box2_count > n: return can dramatically reduce the search space. Without it, you would explore every combination of splits and only filter at the leaves. With it, you prune entire subtrees the moment a box exceeds its capacity. For inputs like balls = [6, 6, 6, 6, 6, 6], this pruning is the difference between seconds and minutes.

Practice backtracking patterns with spaced repetition

Visual walkthrough

The steps below show how the algorithm processes balls = [2, 1, 1], which has 4 total balls and n = 2 balls per box.

Step 1Enumerate splits per color

Step 2Backtrack over all distributions

Step 3Filter by equal size and count distinct colors

Step 4Compute probability with multinomial coefficients

The backtracking tree has 3 * 2 * 2 = 12 leaves, one for each combination of per-color splits. Of those, only the leaves where Box 1 has exactly 2 balls are valid. Among the valid leaves, we check which ones have the same number of distinct colors in both boxes, weight them by their multinomial coefficients, and sum them up.

Complexity analysis

Approach	Time	Space
Backtracking	O(prod(balls[i]+1))	O(k)

Time complexity is O(prod(balls[i] + 1)) in the worst case. For each color i, the algorithm tries balls[i] + 1 choices, so the total number of leaves is the product of (balls[i] + 1) over all colors. In practice, the pruning on box sizes eliminates a large fraction of these leaves. The constraint sum(balls) <= 48 and k <= 8 means the worst case is bounded and manageable.

Space complexity is O(k) for the recursion stack. The backtracking goes at most k levels deep (one per color), and each level stores a constant number of variables. No auxiliary data structures are needed beyond the input.

The building blocks

1. Multinomial coefficients via binomial products

ways * comb(balls[i], j)

Each split of balls[i] balls into j for Box 1 and balls[i] - j for Box 2 has C(balls[i], j) distinguishable arrangements. The total arrangements for a complete split is the product of these binomial coefficients across all colors. This is the multinomial coefficient in disguise: it counts the number of ways to assign labeled balls to boxes, given a fixed per-color distribution.

2. Backtracking with pruning

def backtrack(i, box1_count, box2_count, box1_distinct, box2_distinct, ways):
    if box1_count > n or box2_count > n:
        return
    if i == k:
        if box1_count == n and box1_distinct == box2_distinct:
            favorable += ways
        return
    for j in range(balls[i] + 1):
        backtrack(i + 1, ...)

The backtracking skeleton is familiar: iterate over choices, recurse, and check the goal at the base case. The twist is the pruning on box1_count and box2_count. You carry forward partial sums so you can bail out early when a box is guaranteed to overflow. This transforms a potentially expensive enumeration into a tractable one.

Drill backtracking building blocks

Edge cases

All balls are the same color. If k = 1, both boxes will have exactly 1 distinct color (the same one). The probability is always 1.0.
One ball per color. If every balls[i] = 1, then each color goes entirely to one box. You need to count the number of ways to split k colors into two groups of size n = k/2 such that both groups have the same size (which they always do here). The answer is always 1.0 in this scenario.
A color has zero balls. The constraints guarantee balls[i] >= 1, so this does not arise. But if it did, the color would contribute 0 distinct colors to both boxes and would not affect the result.
Large counts for a single color. When one color dominates (for example, balls = [6, 1, 1]), most splits will put the bulk of that color in one box. The pruning ensures you do not waste time on impossible splits.

From understanding to recall

The hardest part of this problem to remember is not the backtracking itself. It is the way you weight each split by its multinomial coefficient. Under pressure, it is tempting to treat all splits as equally likely, but that gives the wrong answer. Each split corresponds to a different number of distinguishable permutations, and you must account for that.

Practice the three-phase construction. First, compute the total ways as C(2n, n). Second, write the backtracking loop that tries 0..balls[i] balls of color i for Box 1. Third, multiply comb(balls[i], j) into the running product at each level. If you can reconstruct those three pieces from memory, the rest of the solution falls into place.

The connection between multinomial coefficients and backtracking is worth drilling on its own. Try applying it to problems where you count arrangements of colored objects, such as distributing candy or counting distinct permutations of a multiset. Once the mental model of "choose per group, multiply binomial coefficients" becomes automatic, this problem stops being hard.

Permutations - Backtracking to enumerate all orderings, the foundation for understanding arrangement counting
Combination Sum - Exploring all ways to split a budget across choices, a closely related backtracking structure
Count All Valid Pickup and Delivery Options - Combinatorial counting with constraints, another problem where multinomial reasoning is essential

This problem is a beautiful synthesis of combinatorics and search. The backtracking gives you structure, the multinomial coefficients give you accuracy, and the pruning gives you speed. Once you see how those three pieces fit together, you have a template that works for a whole family of probability and counting problems.

Try CodeBricks free