Minimize Malware Spread II: Graph Component Analysis

May 25, 20267 min read

leetcodeproblemhardgraphunion-find

LeetCode Minimize Malware Spread II (problem 928) is a harder variant of its predecessor. You are given an n x n adjacency matrix representing a graph and a list of initially infected nodes. Malware spreads from every infected node to all reachable nodes. You must choose one infected node to completely remove from the graph (deleting the node and all its edges). Return the node whose removal minimizes the final number of infected nodes. If there is a tie, return the smallest index.

The critical difference from problem 924: here you remove the node entirely, including its edges. This changes the problem fundamentally because removing a node can disconnect parts of the graph that were previously connected through it.

Build Union-Find on clean nodes only. Node 0 uniquely reaches component {1,2} (saves 2). Node 4 uniquely reaches {3} and {5} (saves 2 total). Tie broken by smallest index: answer is node 0.

Why this problem matters

This problem teaches you to think about graph connectivity from the perspective of "what happens when a node disappears." In real systems, this models scenarios like identifying which compromised server to isolate from a network to minimize damage. The technique of building a Union-Find on a subset of nodes (clean nodes only) and then analyzing which external nodes (infected) can reach each component is a powerful pattern that shows up in many graph problems.

The brute force approach

The naive approach: for each infected node, temporarily remove it and all its edges from the graph. Then simulate BFS/DFS from all remaining infected nodes. Count how many nodes get infected. Pick the removal that yields the smallest total.

def minMalwareSpread(graph, initial):
    n = len(graph)
    initial_set = set(initial)
    best_node = min(initial)
    best_count = n + 1

    for removed in initial:
        infected = set(initial) - {removed}
        visited = set(infected)
        queue = list(infected)
        while queue:
            node = queue.pop(0)
            for neighbor in range(n):
                if neighbor == removed:
                    continue
                if graph[node][neighbor] and neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        if len(visited) < best_count or (len(visited) == best_count and removed < best_node):
            best_count = len(visited)
            best_node = removed

    return best_node

This runs a full BFS for each infected node you consider removing, giving O(m * n^2) time where m is the number of initially infected nodes. For large graphs with many infected nodes, this is too slow.

The key insight

Instead of simulating spread, think about which clean nodes each infected node can uniquely infect. Build a Union-Find structure on clean nodes only, completely ignoring infected nodes and their edges. This gives you the connected components of the "healthy" subgraph.

Then for each infected node, look at which clean components it connects to (via edges in the original graph). A clean component is "saveable" by removing an infected node only if that infected node is the sole infected source reaching that component. If two or more infected nodes can reach the same clean component, removing just one of them does not help because the other will still infect it.

So the algorithm becomes:

Build Union-Find on clean nodes only
For each infected node, find which clean component roots it reaches
For each clean component, track how many distinct infected nodes reach it
For each infected node, sum up the sizes of components it uniquely reaches (where it is the only infected source)
Return the infected node with the largest total savings (smallest index on tie)

The core difference from Minimize Malware Spread I: there you keep all nodes and just "immunize" one. Here you remove the node entirely. That means you must build your Union-Find on clean nodes only, then check which infected nodes connect to which clean components.

Walking through it step by step

Step 1: Separate infected and clean nodes

Infected nodes (0, 4) are set aside. Clean nodes (1, 2, 3, 5) will form our Union-Find structure.

Step 2: Build Union-Find on clean nodes only (ignore infected)

Only union clean-to-clean edges. Edge 1-2 is the only clean-clean edge. Components: {1,2}, {3}, {5}.

Step 3: For each infected node, find reachable clean components

Node 0 connects to nodes 1 and 2 (both in component {1,2}). Node 4 connects to nodes 3 and 5 (components {3} and {5}).

Step 4: Count how many infected nodes reach each component

Component {1,2}: reached only by node 0 (count=1). Component {3}: only by node 4 (count=1). Component {5}: only by node 4 (count=1).

Step 5: Tally savings for each infected node

Node 0 uniquely reaches component of size 2. Node 4 uniquely reaches components of sizes 1+1=2. Tie: return smallest index = 0.

Practice this problem with spaced repetition

The optimized solution

def minMalwareSpread(graph, initial):
    n = len(graph)
    initial_set = set(initial)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for i in range(n):
        if i in initial_set:
            continue
        for j in range(i + 1, n):
            if j in initial_set:
                continue
            if graph[i][j]:
                union(i, j)

    comp_sources = {}
    for infected_node in initial:
        seen_roots = set()
        for neighbor in range(n):
            if neighbor in initial_set:
                continue
            if graph[infected_node][neighbor]:
                root = find(neighbor)
                seen_roots.add(root)
        for root in seen_roots:
            if root not in comp_sources:
                comp_sources[root] = set()
            comp_sources[root].add(infected_node)

    savings = {}
    for root, sources in comp_sources.items():
        if len(sources) == 1:
            node = next(iter(sources))
            savings[node] = savings.get(node, 0) + size[root]

    best_node = min(initial)
    best_saved = 0
    for node in initial:
        saved = savings.get(node, 0)
        if saved > best_saved or (saved == best_saved and node < best_node):
            best_saved = saved
            best_node = node

    return best_node

The solution has four phases:

Build Union-Find on clean nodes. Iterate over all pairs of nodes, skipping any node in the initial infected set. For every edge between two clean nodes, union them together. This produces the connected components of the subgraph with all infected nodes removed.
Map each infected node to the clean components it reaches. For each infected node, scan its neighbors. For each clean neighbor, find its root in the Union-Find. Collect the set of distinct roots this infected node touches. Then record, for each component root, which infected nodes can reach it.
Identify uniquely reachable components. For each component root, check how many distinct infected sources reach it. If exactly one infected node reaches it, that component's size is added to that infected node's "savings" total.
Pick the winner. Scan through all infected nodes, find the one with the highest savings (smallest index on tie).

Complexity analysis

Metric	Value
Time	O(n^2)
Space	O(n)

Building the Union-Find requires iterating over the adjacency matrix: O(n^2). For each infected node, scanning its neighbors is O(n), and with m infected nodes that is O(m * n). Since m is at most n, the total is O(n^2). Each union/find operation is O(alpha(n)), effectively constant. Space is O(n) for the parent array, size array, and the component-to-sources mapping.

Building blocks

1. Union-Find on a subset

The key structural insight is building Union-Find on clean nodes only. This is slightly different from the standard "union everything" pattern. You skip infected nodes entirely during the union phase, effectively deleting them from the graph before analyzing connectivity.

parent = list(range(n))
size = [1] * n

for i in range(n):
    if i in initial_set:
        continue
    for j in range(i + 1, n):
        if j in initial_set:
            continue
        if graph[i][j]:
            union(i, j)

This gives you the components that would exist if all infected nodes were removed. Each component represents a group of clean nodes that are connected to each other without passing through any infected node.

2. Component analysis with source tracking

After building the clean components, you need to know which infected nodes can reach each component. This is the "reverse mapping" step: for each infected node, look at its clean neighbors and record which component roots it touches.

comp_sources = {}
for infected_node in initial:
    seen_roots = set()
    for neighbor in range(n):
        if neighbor in initial_set:
            continue
        if graph[infected_node][neighbor]:
            root = find(neighbor)
            seen_roots.add(root)
    for root in seen_roots:
        if root not in comp_sources:
            comp_sources[root] = set()
        comp_sources[root].add(infected_node)

A component that is reached by exactly one infected node can be saved by removing that node. A component reached by multiple infected nodes cannot be saved by any single removal.

This "unique source" pattern appears in many graph problems. The idea: if a resource (clean component) has exactly one dependency (infected node), removing that dependency frees the resource. If it has multiple dependencies, no single removal helps.

Edge cases

Two infected nodes both reach the same clean component. That component cannot be saved by removing either one alone. It contributes zero savings to both.
An infected node with no clean neighbors. Removing it saves zero clean nodes. It still might be the answer if no other removal helps either (return smallest index).
All nodes are infected. The clean subgraph is empty. No removal saves anything. Return min(initial).
A single infected node. Removing it saves all clean nodes it can reach. Since it is the only source for every reachable component, you sum up all reachable clean component sizes.
Disconnected clean components reachable by the same infected node. Each component is analyzed independently. The infected node gets credit for all components it uniquely reaches.

Build your pattern library with CodeBricks

From understanding to recall

The mental model for this problem is: "remove infected nodes, see what clean islands remain, then check which infected node is the sole bridge to each island." If an island has only one infected neighbor, that neighbor is its single point of failure. Removing it saves the entire island.

When reviewing this problem, focus on the difference from problem 924. In 924, you keep the node in the graph and just immunize it, so you build Union-Find on all nodes and count infected per component. In 928, you remove the node and its edges entirely, so you must build Union-Find on clean nodes only and then analyze which infected nodes reach which components from the outside.

The four-phase structure (build clean UF, map sources, filter unique, pick best) is worth committing to memory. Each phase is short and has a clear purpose. If you can recall those four steps, the implementation follows naturally.

Minimize Malware Spread - The original version with slightly different removal rules
Number of Islands - Foundation graph traversal and connected components
Accounts Merge - Union-Find for grouping connected entities

Mastering graph problems like this one requires more than reading solutions once. You need to revisit the pattern after a few days, then a week, then a month. CodeBricks uses spaced repetition to schedule reviews at the optimal intervals so you build lasting recall of these techniques.

Try CodeBricks free