Skip to content
← All posts

Minimize Malware Spread: Union-Find Component Analysis

6 min read
leetcodeproblemhardgraphunion-find

LeetCode Minimize Malware Spread (problem 924) asks you to think carefully about how infections propagate through a network. You are given an n x n adjacency matrix representing a graph, plus a list of initially infected nodes. You must remove exactly one node from the infected list so that the total number of infected nodes after the malware finishes spreading is minimized. If there is a tie, return the node with the smallest index.

The problem

You have a network of n nodes connected by edges (given as an adjacency matrix graph). Some nodes in the list initial start infected. Malware spreads from infected nodes to all reachable nodes through edges. You must pick one node from initial to "remove" (effectively immunize it). After removal, the malware spreads from the remaining infected nodes. Return the node whose removal results in the fewest total infected nodes.

Component A (size 4)Component B (size 2)C (size 1)0infected123saved!4infected56infected
Removing node 0 saves the entire component A (4 nodes). Node 0 is the only infected node in its component, so removing it prevents all spread there. Components B and C have their own infected nodes regardless.

The brute force approach

The naive strategy is to try removing each infected node one at a time, then simulate the spread (via BFS or DFS) from all remaining infected nodes, and count the total infections. Pick the removal that yields the smallest count.

This works, but it is expensive. For each node you remove, you need to run a full traversal of the graph. With m initially infected nodes and n total nodes, the brute force runs in O(m * n^2) time. When the graph is large and there are many infected nodes, this becomes impractical.

The key realization is that you do not need to simulate spread at all. The structure of connected components tells you everything.

The key insight

Malware spreads through entire connected components. If a component contains even one infected node, every node in that component will eventually be infected. So the question becomes: which infected node, when removed, prevents the most nodes from being reached?

The answer is only interesting for components that contain exactly one infected node. If a component has two or more infected nodes, removing just one of them does not help because the other infected node will still spread to the entire component. But if a component has exactly one infected node, removing that node saves the entire component.

So the algorithm is:

  1. Find all connected components (using Union-Find)
  2. For each component, count how many initially-infected nodes it contains
  3. Among components with exactly one infected node, find the one with the largest size
  4. Return that infected node (smallest index if there is a tie)

The core insight: only components with exactly one infected node can be "saved" by removing it. Among those, pick the largest component. Union-Find makes this efficient.

Step-by-step walkthrough

Step 1: Build connected components with Union-Find

root=0, size=4root=4, size=2root=6, size=10123456

Process every edge in the adjacency matrix. Union connected nodes. After all unions, nodes in the same component share a root.

Step 2: Count infected nodes per component

infected count: 1infected count: 1infected count: 10123456

initial = [0, 4, 6]. For each infected node, find its root and increment the count for that component.

Step 3: Identify components with exactly one infected node

1 infected = removable1 infected = removable1 infected = removable0123456

All three components have exactly one infected node, so removing any of them would save that entire component from infection.

Step 4: Choose the node whose removal saves the most nodes

size 4 = BESTsize 2size 1remove 00123456

Removing node 0 saves 4 nodes (component A). Removing node 4 saves 2. Removing node 6 saves 1. Answer: node 0.

The code

def minMalwareSpread(graph, initial):
    n = len(graph)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for i in range(n):
        for j in range(i + 1, n):
            if graph[i][j]:
                union(i, j)

    infected_set = set(initial)
    comp_infected_count = {}
    for node in initial:
        root = find(node)
        comp_infected_count[root] = comp_infected_count.get(root, 0) + 1

    best_node = min(initial)
    best_saved = 0
    for node in initial:
        root = find(node)
        if comp_infected_count[root] == 1:
            if size[root] > best_saved or (size[root] == best_saved and node < best_node):
                best_saved = size[root]
                best_node = node

    return best_node

The solution has three phases. First, you build the Union-Find structure by iterating over the upper triangle of the adjacency matrix and unioning connected nodes. Second, you count how many infected nodes belong to each component by finding the root of each node in initial. Third, you scan the infected nodes looking for the one whose component has exactly one infected node and the largest size.

The find function uses path compression (pointing each node directly at its grandparent) to keep the tree flat. The union function uses union by size, always attaching the smaller tree under the larger one. Together these give nearly O(1) amortized operations.

If no component has exactly one infected node, removing any node from initial does not reduce the total infections. In that case, you return the smallest index in initial, which the initialization best_node = min(initial) handles automatically.

Complexity analysis

ApproachTimeSpace
Union-FindO(n^2 * alpha(n))O(n)
Brute force BFSO(m * n^2)O(n)

The dominant cost is iterating over the adjacency matrix to build the Union-Find structure, which takes O(n^2) iterations. Each union/find operation is O(alpha(n)), the inverse Ackermann function, which is effectively constant. The second pass over initial is O(m) where m is the number of initially infected nodes. Space is O(n) for the parent and size arrays.

The building blocks

Union-Find (Disjoint Set Union)

Union-Find is the backbone of this solution. It lets you group nodes into components and query component membership in near-constant time. The same structure appears in Number of Provinces, where you count how many distinct connected components exist, and in Redundant Connection, where you detect the edge that would create a cycle. Here, you use it not just to find components but also to track their sizes.

Component-level counting

After building the Union-Find, you perform a counting pass: for each infected node, increment a counter for its component root. This "count per group" pattern is common in Union-Find problems. In Accounts Merge, you group accounts by their root and then collect all emails belonging to each group. The idea is the same: use the root as a dictionary key to aggregate information about the component.

Edge cases

  • All infected nodes in the same component. Every node in initial maps to the same root. The component has more than one infected node, so no removal helps. Return min(initial).
  • Each infected node in its own isolated component. Every component has exactly one infected node. Pick the one with the largest component size, or smallest index on tie.
  • A single infected node. There is only one choice: remove that node. It saves its entire component.
  • Multiple components tied for largest size. When two infected nodes each have a component of the same size, return the node with the smaller index. The comparison node < best_node handles this.
  • Disconnected graph with isolated nodes. Isolated nodes form components of size 1. If an isolated node is infected, removing it saves exactly 1 node (itself).

From understanding to recall

The trick to remembering this solution is to think in terms of components, not individual nodes. Once you realize that malware spreads to an entire component, the problem reduces to: "which component can I save?" And the answer is always the largest component that has exactly one infected node inside it.

When you review this problem, focus on three mental steps: (1) build components, (2) count infected per component, (3) pick the best single-infected component. If you can reproduce that reasoning chain from memory, the code writes itself. The Union-Find is just the tool that makes step 1 efficient.

Practice writing the three-phase structure: union loop, counting loop, selection loop. Each phase is short and independent. Once you internalize that separation, you will not confuse this problem with other graph problems that require BFS or DFS simulation.

Related posts

  • Number of Provinces - The foundational Union-Find problem for counting connected components
  • Accounts Merge - Union-Find with component-level aggregation, the same counting pattern used here
  • Redundant Connection - Using Union-Find to detect when adding an edge would merge two components