Range Sum Query 2D Immutable: 2D Prefix Sums Explained
You are given a 2D matrix of integers, and you need to answer many queries asking for the sum of elements inside a rectangular subregion. Each query should run in constant time. This is LeetCode #304: Range Sum Query 2D - Immutable, a medium problem that introduces one of the most useful techniques in competitive programming and interviews: 2D prefix sums with inclusion-exclusion.
The problem
You are given an m x n matrix of integers. Implement a class NumMatrix that supports:
NumMatrix(matrix): initializes the object with the integer matrix.sumRegion(row1, col1, row2, col2): returns the sum of all elements inside the rectangle defined by its upper-left corner(row1, col1)and lower-right corner(row2, col2), inclusive.
matrix = [
[3, 0, 1, 4, 2],
[5, 6, 3, 2, 1],
[1, 2, 0, 1, 5],
[4, 1, 0, 1, 7],
[1, 0, 3, 0, 5]
]
sumRegion(2, 1, 4, 3) # 2+0+1 + 1+0+1 + 0+3+0 = 8
The query asks for the sum of all values in rows 2 through 4, columns 1 through 3. If you add them up manually, the answer is 8. The challenge is doing this efficiently when there are thousands of queries on the same matrix.
The brute force
The simplest approach is to iterate over every cell in the rectangle for each query and sum them up. For a query spanning r rows and c columns, that costs O(r * c) per call. If the matrix is large and there are many queries, this adds up fast.
In an interview, always mention the brute force first. It shows you understand the problem before optimizing. For this problem, say: "I could loop over the rectangle each time, but that is O(m * n) per query. I want O(1) per query after some precomputation."
The key insight: 2D prefix sums and inclusion-exclusion
You have probably seen 1D prefix sums before: build an array where P[i] is the sum of all elements from index 0 to i - 1, then compute any range sum [l, r] as P[r + 1] - P[l] in O(1).
The 2D version extends this idea to a matrix. Define P[i][j] as the sum of all elements in the submatrix from (0, 0) to (i - 1, j - 1). To build it, use the same inclusion-exclusion principle but in two dimensions:
P[i][j] = matrix[i-1][j-1] + P[i-1][j] + P[i][j-1] - P[i-1][j-1]
You add the value from above, add the value from the left, then subtract the overlap (the top-left corner that got counted twice). This is inclusion-exclusion applied during construction.
To answer a query sumRegion(r1, c1, r2, c2), you apply inclusion-exclusion again:
sum = P[r2+1][c2+1] - P[r1][c2+1] - P[r2+1][c1] + P[r1][c1]
Start with the full rectangle from (0, 0) to (r2, c2). Subtract the rows above the query region. Subtract the columns to the left of the query region. But now you have subtracted the top-left corner twice, so add it back once.
The extra row and column of zeros in the prefix sum matrix eliminate all boundary checks. Without them, you would need special cases for when r1 = 0 or c1 = 0. The padding makes the formula universal.
Step-by-step walkthrough
Let's build the prefix sum matrix for our example and then answer the query sumRegion(2, 1, 4, 3).
Step 1: The original matrix
This is the input. We want O(1) rectangular region sums.
Step 2: Create a (rows+1) x (cols+1) prefix sum matrix, filled with zeros
The extra row and column of zeros handle boundary cases cleanly. No special-case code needed.
Step 3: Fill the prefix sum row by row
P[i][j] = matrix[i-1][j-1] + P[i-1][j] + P[i][j-1] - P[i-1][j-1]. This is inclusion-exclusion in reverse: add top, add left, subtract the overlap.
Step 4: Query sumRegion(2, 1, 4, 3) using inclusion-exclusion
P[5][4] = 38 (full rectangle)
- P[2][4] = 24 (top rows)
- P[5][1] = 14 (left columns)
+ P[2][1] = 8 (add back overlap)
38 - 24 - 14 + 8 = 8
P[5][4] - P[2][4] - P[5][1] + P[2][1] = 38 - 24 - 14 + 8 = 8. Four lookups, constant time.
The key observations:
- Building the prefix sum matrix takes O(m * n), visiting each cell once.
- Answering any rectangular query takes exactly four lookups and three arithmetic operations. That is O(1) regardless of the rectangle size.
- The extra row and column of zeros at index 0 mean we never need to check whether we are at a boundary.
The code
class NumMatrix:
def __init__(self, matrix):
m, n = len(matrix), len(matrix[0])
self.P = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
self.P[i][j] = (
matrix[i - 1][j - 1]
+ self.P[i - 1][j]
+ self.P[i][j - 1]
- self.P[i - 1][j - 1]
)
def sumRegion(self, row1, col1, row2, col2):
return (
self.P[row2 + 1][col2 + 1]
- self.P[row1][col2 + 1]
- self.P[row2 + 1][col1]
+ self.P[row1][col1]
)
-
Constructor: Create a
(m+1) x (n+1)prefix sum matrixPinitialized to zeros. The extra row and column handle the boundary. Then fill it row by row using the 2D prefix sum recurrence: current cell value plus the prefix from above, plus the prefix from the left, minus the overlap. -
sumRegion: Apply the inclusion-exclusion formula.
P[row2+1][col2+1]gives the sum of the entire rectangle from(0, 0)to(row2, col2). Subtract the part above the query (P[row1][col2+1]), subtract the part to the left (P[row2+1][col1]), and add back the top-left corner that was subtracted twice (P[row1][col1]). -
Both the constructor loop and the query formula use 1-indexed positions in
Pthat map to 0-indexed positions in the original matrix. This offset-by-one is what makes the zero-padded approach work cleanly.
Complexity analysis
Time: O(m * n) for precomputation during initialization. O(1) per sumRegion query.
Space: O(m * n) for the prefix sum matrix.
| Approach | Precompute | Query | Space |
|---|---|---|---|
| Brute force | O(1) | O(m * n) | O(1) |
| 2D prefix sums | O(m * n) | O(1) | O(m * n) |
The trade-off is clear: spend O(m * n) space and precomputation time once, then answer unlimited queries in constant time. When the number of queries is large (which the problem guarantees), this is a massive win.
Common mistakes
-
Off-by-one errors in the prefix sum indices. The prefix matrix has dimensions
(m+1) x (n+1), notm x n. When fillingP[i][j], the corresponding matrix element ismatrix[i-1][j-1]. When querying,sumRegion(r1, c1, r2, c2)maps toP[r2+1][c2+1], notP[r2][c2]. Getting any of these offsets wrong produces silent wrong answers. -
Forgetting to subtract the overlap. The recurrence
P[i][j] = matrix[i-1][j-1] + P[i-1][j] + P[i][j-1] - P[i-1][j-1]requires the subtraction ofP[i-1][j-1]because the regions from above and from the left overlap. Leaving it out double-counts the top-left rectangle. -
Mixing up the query formula signs. The query is
P[r2+1][c2+1] - P[r1][c2+1] - P[r2+1][c1] + P[r1][c1]. The two subtractions remove the top strip and the left strip, and the addition corrects for the corner that was subtracted twice. Swapping the signs or the indices is the most common bug. -
Trying to modify the matrix after initialization. This problem specifies that the matrix is immutable. If the matrix could change, you would need a different data structure like a 2D Binary Indexed Tree (Fenwick Tree). Make sure you confirm with the interviewer whether updates are required.
The most dangerous bug is getting the prefix sum indices off by one. Draw a small 2x2 example on paper and manually verify that your formula gives the correct answer for all four possible single-cell queries before coding the full solution.
The building blocks
2D prefix sum with inclusion-exclusion
def build_2d_prefix(matrix):
m, n = len(matrix), len(matrix[0])
P = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
P[i][j] = (
matrix[i - 1][j - 1]
+ P[i - 1][j]
+ P[i][j - 1]
- P[i - 1][j - 1]
)
return P
def query_2d(P, r1, c1, r2, c2):
return (
P[r2 + 1][c2 + 1]
- P[r1][c2 + 1]
- P[r2 + 1][c1]
+ P[r1][c1]
)
This pattern is a direct extension of 1D prefix sums. Once you internalize the inclusion-exclusion formula, it applies to any problem that asks for rectangular region sums on a static matrix. Problems like "count the number of ones in a submatrix" or "find the submatrix with the maximum sum" build directly on this technique.
Think of 2D prefix sums as the matrix equivalent of 1D prefix sums. If you can solve Range Sum Query (1D) in your sleep, the 2D version is just applying the same idea twice, once for rows and once for columns, with an overlap correction.
Edge cases
- Single cell query:
sumRegion(r, c, r, c)should returnmatrix[r][c]. The formula handles this correctly without any special case. - Full matrix query:
sumRegion(0, 0, m-1, n-1)returns the total sum. This is justP[m][n]. - Single row or single column: the formula degenerates to a 1D prefix sum query, but the 2D formula still works.
- Matrix with negative values: the prefix sum approach works with any integers. Negative values do not break inclusion-exclusion.
- 1x1 matrix: the prefix sum matrix is 2x2 with one meaningful entry. Everything works as expected.
From understanding to recall
You now understand how 2D prefix sums work and why inclusion-exclusion gives O(1) queries. But there is a gap between understanding the diagram and writing the code from scratch in an interview. The index offsets alone are enough to trip you up under pressure.
The way to close that gap is deliberate practice. Write the prefix sum construction and query formula from memory. Check against the correct version. Repeat at increasing intervals. After a few reps, the formula P[r2+1][c2+1] - P[r1][c2+1] - P[r2+1][c1] + P[r1][c1] becomes muscle memory, and you stop second-guessing whether it is r2+1 or r2.
Related posts
- Maximum Subarray - The 1D version of prefix sum optimization
- Maximal Square - Another 2D DP problem on matrices
- Set Matrix Zeroes - Matrix manipulation requiring careful state tracking
2D prefix sums are one of those techniques that feel tricky the first time and obvious the tenth time. The only way to get from one to the other is repetition.