Valid Sudoku: Hash Set Validation
You are given a 9x9 Sudoku board represented as a 2D array of characters. Each cell contains a digit from '1' to '9' or a '.' for empty. Determine whether the board is valid. A valid board means that each row, each column, and each of the nine 3x3 sub-boxes contains no duplicate digits. The board does not need to be solvable. Only the filled cells need to be checked.
That is LeetCode 36: Valid Sudoku. For example, the standard partially-filled board shown below is valid because no row, column, or box repeats a digit. But if you changed one cell so that a column contained two 8s, the board would be invalid.
Why this problem matters
Valid Sudoku is a clean test of set-based duplicate detection applied to structured data. Instead of checking a single list for duplicates, you are checking 27 groups simultaneously (9 rows, 9 columns, 9 boxes). The challenge is not algorithmic complexity. It is organizing your bookkeeping so that every cell is validated against exactly the right three groups.
The box indexing formula, (r // 3, c // 3), is the key detail that trips people up. Once you see how floor division maps any cell to its box, the rest of the solution falls into place.
The key insight
For each filled cell at position (r, c) with digit d, you need to verify three things:
dhas not appeared before in rowrdhas not appeared before in columncdhas not appeared before in the 3x3 box containing(r, c)
You can track all of this with sets. Maintain one set per row, one set per column, and one set per box. As you scan the board, check whether d is already in any of the three relevant sets. If it is, the board is invalid. If not, add d to all three sets and move on.
The only tricky part is identifying which box a cell belongs to. A 9x9 board has nine 3x3 boxes arranged in a 3x3 grid. The box containing cell (r, c) is at position (r // 3, c // 3). Row 0 through 2 all have r // 3 = 0. Row 3 through 5 all have r // 3 = 1. Same logic for columns. This maps every cell to one of nine box indices: (0,0), (0,1), (0,2), (1,0), ..., (2,2).
The solution
def is_valid_sudoku(board: list[list[str]]) -> bool:
rows = [set() for _ in range(9)]
cols = [set() for _ in range(9)]
boxes = [[set() for _ in range(3)] for _ in range(3)]
for r in range(9):
for c in range(9):
d = board[r][c]
if d == ".":
continue
if d in rows[r] or d in cols[c] or d in boxes[r // 3][c // 3]:
return False
rows[r].add(d)
cols[c].add(d)
boxes[r // 3][c // 3].add(d)
return True
Let's walk through the logic.
Setup: Create 9 row sets, 9 column sets, and a 3x3 grid of box sets. Each starts empty.
Scan: Iterate through every cell in the board. Skip empty cells (dots). For each filled cell, check whether its digit already exists in the corresponding row set, column set, or box set. If any check fails, return False immediately.
Record: If the digit passes all three checks, add it to all three sets. This ensures future cells in the same row, column, or box will detect a conflict if they share this digit.
Result: If you scan all 81 cells without finding a duplicate, the board is valid. Return True.
Visual walkthrough
Let's trace through the algorithm on a partially filled board. Watch how the sets grow as each cell is processed.
Step 1: Scan row 0. Collect filled digits into the row-0 set.
rows[0]: {5, 3, 7}
Row 0 contains 5, 3, and 7. No duplicates. Add each to the set for row 0.
Step 2: Scan column 1. Collect filled digits into the col-1 set.
cols[1]: {3, 9, 6}
Column 1 contains 3, 9, and 6. No duplicates. Each digit is checked against the col-1 set before being added.
Step 3: Scan box (0, 0). Rows 0-2, columns 0-2.
boxes[0,0]: {5, 3, 6, 9, 8}
The top-left 3x3 box has five filled cells: 5, 3, 6, 9, 8. No duplicates in this box.
Step 4: Box index formula. Cell (4, 4) belongs to box (4//3, 4//3) = box (1, 1).
boxes[1,1]: {6, 8, 3, 2}
The center box (rows 3-5, cols 3-5) is identified by floor-dividing the row and column by 3. Cell (4,4) is empty, so it is skipped.
Step 5: All 81 cells scanned. No duplicates in any row, column, or box.
Every filled digit passed all three checks. The board is valid. Return True.
The key thing to notice is that every filled cell is checked against exactly three sets before being added to them. The box indexing formula ensures that cells in the same 3x3 region share the same box set, regardless of which specific row or column they are in.
You can also encode all three checks into a single set by storing tuples like (r, d) for rows, (d, c) for columns, and (r // 3, c // 3, d) for boxes. This reduces the code to one set and three membership checks per cell. Both approaches have the same complexity.
Complexity analysis
| Metric | Value |
|---|---|
| Time | O(1) |
| Space | O(1) |
Time: The board is always 9x9, so you scan exactly 81 cells. Each cell involves at most three O(1) set lookups and three O(1) set insertions. The total work is bounded by a constant, making it O(1).
Space: You maintain at most 27 sets, each holding at most 9 elements. That is at most 243 entries total. Since the board size is fixed, the space usage is O(1).
If the board were n x n instead of 9x9, the complexity would be O(n^2) time and O(n^2) space. But Sudoku boards are always 9x9, so both are constant.
The building blocks
This problem is built on two reusable pieces that CodeBricks drills independently:
Set-based duplicate detection
The core pattern of using a set to detect whether you have seen a value before:
seen = set()
for item in collection:
if item in seen:
return True
seen.add(item)
return False
This O(n) pattern replaces the O(n^2) brute force of comparing every pair. You will use it in Contains Duplicate, Two Sum (with a hash map variant), Longest Consecutive Sequence, and dozens of other problems. The idea is always the same: check membership first, then insert.
In Valid Sudoku, you apply this pattern 27 times in parallel, once for each row, column, and box. The structure changes, but the check-then-add logic is identical.
Box indexing with floor division
The formula (r // 3, c // 3) maps any cell in a 9x9 grid to its 3x3 box:
box_row = r // 3
box_col = c // 3
Floor division groups consecutive indices into buckets. Rows 0, 1, 2 all map to bucket 0. Rows 3, 4, 5 map to bucket 1. Rows 6, 7, 8 map to bucket 2. The same logic applies to columns. This gives you nine distinct (box_row, box_col) pairs, one for each sub-box.
This indexing trick generalizes beyond Sudoku. Anytime you need to partition a grid into fixed-size blocks, floor division by the block size gives you the block coordinates. You will see it in image processing (dividing an image into tiles), spatial hashing (mapping coordinates to grid cells), and any problem that groups elements by region.
Edge cases
- Empty board: All cells are dots. No filled cells means no duplicates. Return
True. - Single filled cell: Only one digit on the board. It cannot conflict with anything. Return
True. - Full valid board: All 81 cells filled with a valid Sudoku solution. Every set ends up with exactly the digits 1 through 9. Return
True. - Duplicate in a row: Two identical digits in the same row. Caught by the
d in rows[r]check. - Duplicate in a column: Two identical digits in the same column. Caught by the
d in cols[c]check. - Duplicate in a box only: Two identical digits in the same 3x3 box but different rows and columns. Only caught by the
d in boxes[r // 3][c // 3]check. This is the case people miss when they only validate rows and columns.
Common mistakes
1. Forgetting box validation. Checking only rows and columns misses duplicates within a 3x3 box. All three constraints must be checked.
2. Wrong box index formula. Using r % 3 instead of r // 3 gives you the position within a box, not the box index. Floor division groups cells into boxes. Modulo gives the offset within a box. Make sure you use //.
3. Not skipping empty cells. If you forget the if d == "." check, you will add "." to every set and immediately flag it as a duplicate in the second row (since row 0 already added ".").
4. Using integers instead of strings. The board contains characters ('5', not 5). If you convert to int, it works fine, but mixing types (comparing '5' to 5) will silently fail the membership check.
From understanding to recall
You have walked through the solution and understand why set-based validation works. The logic is clean: for each cell, check three sets and add to three sets. But under interview pressure, the details can slip. Which set do you index with r? Which with c? Is it r // 3 or r % 3?
Spaced repetition fixes this. You practice writing the three-set setup, the scan loop with the triple membership check, and the box indexing formula from scratch. Do it today, again in two days, again in five. After a few reps, you see "valid Sudoku" and the nine-row, nine-column, nine-box set structure flows out automatically.
Set-based duplicate detection and box indexing are two of roughly 60 reusable building blocks that cover hundreds of LeetCode problems. Learning these individually and drilling them with spaced repetition is far more effective than grinding random problems and hoping the patterns stick.
Related posts
- Set Matrix Zeroes - Another matrix problem where the challenge is organizing bookkeeping across rows and columns
- Spiral Matrix - Classic matrix traversal that tests your ability to manage boundaries and directions in a 2D grid
CodeBricks breaks Valid Sudoku into its set-based duplicate detection and box indexing building blocks, then drills them independently with spaced repetition. You type the triple-set setup, the scan loop with membership checks, and the floor-division formula from scratch until the pattern is automatic. When a validation or partitioning problem shows up in your interview, you do not think about it. You just write it.