Skip to content
← All posts

Making A Large Island: Union-Find Grid Strategy

6 min read
leetcodeproblemhardarraysgraphmatrix

You are given an n x n binary grid. You may change at most one 0 to a 1. Return the size of the largest island after this operation. An island is a group of 1s connected horizontally or vertically.

This is LeetCode 827: Making A Large Island, and it is one of the best hard problems for learning how to combine DFS labeling with a greedy scan. The brute force approach of flipping every 0 and running a fresh BFS each time would be far too slow. The trick is preprocessing the grid so that each flip query can be answered in constant time.

The problem

Given an n x n binary grid of 0s and 1s, you may flip at most one 0 to 1. After the flip, return the size of the largest connected island. Two cells are connected if they share an edge (up, down, left, right). If the grid is already all 1s, the answer is n * n. Constraints: 1 <= n <= 500.

Before: two islands (size 3, size 4)1100100000110011flipAfter: flip (1,1) to 1, total = 81100110000110011
Flipping one 0 to 1 at position (1,1) bridges the blue island (size 3) and green island (size 4) into a single island of size 8.

The visual above shows a 4x4 grid with two separate islands. The blue island has 3 cells and the green island has 4 cells. By flipping the 0 at position (1,1) to a 1, both islands merge into one connected component of size 8. The goal is to find which single flip produces the biggest possible island.

The key insight

Instead of re-running DFS for every 0 cell, preprocess the grid in a single pass. Label each island with a unique ID and record its size. Then for each 0 cell, look at its four neighbors, collect the distinct island IDs, and sum their sizes plus one. This turns an O(n^4) brute force into an O(n^2) solution.

The two phases are:

  1. Label phase: DFS from every unvisited 1. Assign each connected component a unique integer ID (starting at 2 to avoid confusion with the original 0/1 values). Track the size of each component in a dictionary.
  2. Query phase: For each 0 cell, check the four neighbors. Gather the set of distinct island IDs touching this cell. Sum their sizes and add 1 for the flipped cell itself. Track the maximum.

The solution

def largest_island(grid: list[list[int]]) -> int:
    n = len(grid)
    island_size = {}
    island_id = 2

    def dfs(r, c, uid):
        if r < 0 or r >= n or c < 0 or c >= n:
            return 0
        if grid[r][c] != 1:
            return 0
        grid[r][c] = uid
        return 1 + dfs(r+1, c, uid) + dfs(r-1, c, uid) + dfs(r, c+1, uid) + dfs(r, c-1, uid)

    for r in range(n):
        for c in range(n):
            if grid[r][c] == 1:
                size = dfs(r, c, island_id)
                island_size[island_id] = size
                island_id += 1

    if not island_size:
        return 1

    max_island = max(island_size.values())

    for r in range(n):
        for c in range(n):
            if grid[r][c] == 0:
                seen = set()
                total = 1
                for dr, dc in [(-1,0),(1,0),(0,-1),(0,1)]:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] > 1:
                        uid = grid[nr][nc]
                        if uid not in seen:
                            seen.add(uid)
                            total += island_size[uid]
                max_island = max(max_island, total)

    return max_island

The DFS phase overwrites each 1 with the island's unique ID, so the original grid becomes the label map. This avoids needing a separate visited array. During the query phase, the seen set prevents double-counting when two neighbors belong to the same island. For example, if a 0 cell has two neighbors both labeled 3, we only count island 3's size once.

Visual walkthrough

Here is the algorithm broken into steps. Each card shows one phase of the process.

Step 1: Label each island with a unique ID

DFS from each unvisited 1. Assign island_id = 2, 3, 4, ... to each component.

We start IDs at 2 because the grid already uses 0 and 1. Each DFS call marks every cell in one connected component with the same ID.

Step 2: Record the size of each island

island_size = {2: 3, 3: 4}

While labeling, count cells per island. Store the mapping from island ID to its size in a dictionary.

Step 3: Iterate over every 0 cell

for r, c in all cells where grid[r][c] == 0:

Each 0 is a candidate to flip. We want to know: if we turn this 0 into a 1, how large is the resulting island?

Step 4: Check the 4 neighbors of the 0 cell

neighbors = set of island IDs adjacent to (r, c)

Look up, down, left, right. Collect the distinct island IDs touching this cell. Using a set avoids double-counting when two neighbors belong to the same island.

Step 5: Sum the sizes of distinct neighboring islands + 1

total = 1 + sum(island_size[id] for id in neighbors)

The +1 accounts for the flipped cell itself. We add the sizes of all unique islands this cell would connect.

Step 6: Track the maximum across all 0 cells

max_island = max(max_island, total)

Update the running maximum after checking each 0 cell. The answer is the largest total found.

Step 7: Handle the all-1s edge case

if no 0 exists: return n * n

If the entire grid is land, there is nothing to flip. The answer is simply the total number of cells.

The labeling phase runs in O(n^2) since each cell is visited at most once by DFS. The query phase also runs in O(n^2) because each 0 cell does constant work (checking at most 4 neighbors and looking up sizes in a dictionary). The entire algorithm is two passes over the grid.

Complexity analysis

AspectValueWhy
TimeO(n^2)One DFS pass to label all cells, one scan to check all 0 cells. Each cell visited at most twice.
SpaceO(n^2)The island_size dictionary holds at most O(n^2) entries. DFS recursion stack can go up to O(n^2) in the worst case.
Grid modificationIn-placeWe overwrite 1s with island IDs, avoiding a separate visited array.

The building blocks

This problem combines two patterns that appear frequently in grid problems.

1. Connected component labeling with DFS

The technique of assigning a unique ID to each connected region in a grid is the foundation of this solution. You scan every cell, and when you find an unvisited 1, you run DFS to flood-fill the entire component with a label. This is the same pattern used in Number of Islands, but extended to track sizes.

for r in range(n):
    for c in range(n):
        if grid[r][c] == 1:
            size = dfs(r, c, island_id)
            island_size[island_id] = size
            island_id += 1

The key detail: start IDs at 2 (not 0 or 1) so they do not collide with the grid's original values.

2. Neighbor aggregation with deduplication

For each candidate cell, you check its neighbors and aggregate information from distinct groups. The seen set ensures you do not double-count an island that touches the candidate from multiple directions. This neighbor-check-and-aggregate pattern shows up in many grid problems where you need to combine information from surrounding cells.

Edge cases

All 1s: If the grid contains no 0 cells, there is nothing to flip. The answer is n * n. The code handles this because max_island starts as the largest existing island, and the query loop simply never executes.

All 0s: If the grid is entirely water, flipping one cell gives an island of size 1. The code handles this because island_size will be empty (no DFS runs), and we return 1 before reaching the query phase.

Single cell grid: A 1x1 grid with value 1 returns 1. A 1x1 grid with value 0 returns 1 after flipping. Both cases work correctly with the existing logic.

Island touching itself around a 0: When a single island wraps around a 0 cell, two or more neighbors of that 0 share the same island ID. The seen set prevents counting that island's size multiple times. Without deduplication, the answer would be too large.

From understanding to recall

You have seen how labeling islands with unique IDs and recording their sizes lets you answer "what if I flip this zero?" in constant time per cell. The two-phase structure, label then query, is clean and elegant. But during an interview, you need to write it without hesitation.

The details that trip people up: starting IDs at 2, using a set to deduplicate neighbor islands, handling the all-1s case separately. Spaced repetition drills these specifics into muscle memory. After a few rounds of writing the DFS labeling loop and the neighbor aggregation from scratch, the whole solution flows out naturally.

Related posts

  • Number of Islands - The essential grid DFS problem that teaches the flood-fill traversal used in the labeling phase of this solution
  • Max Area of Island - Extends Number of Islands by tracking component sizes during DFS, which is exactly the preprocessing step here
  • Surrounded Regions - Another grid DFS problem that modifies cells in place, similar to how we overwrite 1s with island IDs

The Making A Large Island pattern combines island labeling with a smart query phase. CodeBricks breaks this into its component skills, DFS flood fill, component size tracking, and neighbor aggregation with deduplication, then drills each one independently with spaced repetition. When this problem appears in your interview, the two-phase approach will feel automatic.