Minimize Hamming Distance After Swap Operations: Union-Find Grouping

June 3, 20267 min read

leetcodeproblemmediumarraysunion-find

You are given two integer arrays source and target, both of length n. You are also given an array allowedSwaps where each allowedSwaps[i] = [ai, bi] indicates that you are allowed to swap source[ai] and source[bi]. Return the minimum Hamming distance of source and target after performing any number of swap operations on source. The Hamming distance is the number of positions where the corresponding values differ.

This is LeetCode 1722: Minimize Hamming Distance After Swap Operations, a medium-difficulty problem that combines Union-Find with multiset counting.

source = [1, 2, 3, 4], target = [2, 1, 4, 5], allowedSwaps = [[0,1],[2,3]]. Union-Find groups indices {0,1} and {2,3}. Within each group, values can be freely rearranged. Only value 5 has no match in its group.

Why this problem matters

Union-Find (also known as Disjoint Set Union) is one of the most versatile data structures in algorithm design, and this problem is a clean demonstration of why. The trick is recognizing that swap operations are transitive. If you can swap indices 0 and 1, and you can swap indices 1 and 2, then you can rearrange the values at indices 0, 1, and 2 in any order you want. That transitivity is exactly what Union-Find captures.

Problems that involve grouping elements by some equivalence relation, then doing something independently within each group, are a natural fit for Union-Find. Once you build that mental model, a whole family of problems becomes accessible: accounts merge, smallest string with swaps, and similar connected-component problems all follow the same pattern.

The key insight

If you can swap indices a and b, and you can swap indices b and c, then through a chain of swaps you can rearrange values at indices a, b, and c into any order. This means the allowed swaps define connected components via Union-Find. Within each component, you have complete freedom to rearrange the source values however you like.

Given that freedom, the best you can do within a component is to match as many source values to target values as possible. The number of matches is the size of the intersection of the source multiset and the target multiset for that component. Any target value not covered by a matching source value is a mismatch that you cannot fix, no matter how you rearrange.

The algorithm:

Build a Union-Find structure from the allowed swaps.
Group indices by their connected component.
For each component, count the source values and target values using a Counter.
The number of matches within a component is the overlap between the source and target multisets (computed via Counter intersection).
The number of mismatches for the component is its size minus the number of matches.
Sum up the mismatches across all components.

The solution

from collections import Counter, defaultdict

def minimum_hamming_distance(source: list[int], target: list[int], allowed_swaps: list[list[int]]) -> int:
    n = len(source)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in allowed_swaps:
        union(a, b)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)

    result = 0
    for indices in groups.values():
        src_count = Counter(source[i] for i in indices)
        tgt_count = Counter(target[i] for i in indices)
        matches = sum((src_count & tgt_count).values())
        result += len(indices) - matches

    return result

The find function uses path compression (the line parent[x] = parent[parent[x]]) to keep the tree flat, which makes repeated lookups nearly constant time. The union function merges two sets by pointing one root to the other.

After building the Union-Find, we group all indices by their root representative. For each group, we build two Counter objects: one for the source values at those indices and one for the target values. The & operator on Counter objects computes the minimum count for each key, which is exactly the multiset intersection. That tells us how many values can be matched. The remaining values in the group are unavoidable mismatches.

Notice that we never actually perform any swaps. We do not need to. Once we know which indices belong to the same group, we know we can arrange them optimally, so we just count whether the right values are available.

The path compression in find (setting parent[x] = parent[parent[x]]) is called "path halving" and makes the amortized cost of each operation nearly O(1). For this problem, full path compression (making every node point directly to the root) works too, but path halving is simpler to write and has the same asymptotic behavior. Combined with union by rank, Union-Find operations run in O(alpha(n)) amortized time, where alpha is the inverse Ackermann function, which is effectively constant for all practical input sizes.

Practice Union-Find patterns with spaced repetition

Visual walkthrough

Let us trace through the algorithm with source = [1, 2, 3, 4], target = [2, 1, 4, 5], and allowedSwaps = [[0,1],[2,3]]. The key question at each step is: within each connected component, how many target values can we satisfy with the source values we have?

Step 1: Build Union-Find from allowed swaps

Swap [0,1] merges indices 0 and 1 into one component. Swap [2,3] merges indices 2 and 3. We end up with two connected components: {0,1} and {2,3}.

Step 2: Collect source values per component

For each component, gather the source values at those indices. Component A has source[0]=1 and source[1]=2. Component B has source[2]=3 and source[3]=4.

Step 3: Compare source and target multisets per component

Component A: source has {1,2} and target needs {2,1}. Both values match perfectly. Component B: source has {3,4} and target needs {4,5}. Value 4 matches, but 5 is not in the source set.

Step 4: Sum mismatches across all components

Component A contributes 0 mismatches and Component B contributes 1 mismatch. The minimum Hamming distance is 0 + 1 = 1.

The walkthrough shows that the algorithm never needs to figure out exactly which swaps to perform. It only needs to know which indices are connected and whether the right values exist in each group. This is a powerful simplification: instead of searching through swap sequences, you reduce the problem to counting.

Complexity analysis

Approach	Time	Space
Union-Find + Counter	O(n * alpha(n))	O(n)

Time: Building the Union-Find takes O(n * alpha(n)) where alpha is the inverse Ackermann function, effectively O(n). Grouping indices and building Counters is O(n). The Counter intersection for each group is proportional to the group size, and all group sizes sum to n. Overall, the time is O(n * alpha(n)), which is practically linear.

Space: The parent array uses O(n). The groups dictionary and Counter objects together use O(n) total, since each index and value appears in exactly one group.

The building blocks

1. Union-Find with path compression

parent = list(range(n))

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb

The Union-Find data structure tracks which elements belong to the same set. Each element starts as its own root. The find function walks up the parent chain to find the root, compressing the path along the way. The union function connects two sets by making one root point to the other. This is the foundational building block for any problem that involves merging groups or finding connected components without an explicit graph traversal.

2. Counter intersection for multiset matching

src_count = Counter(source[i] for i in indices)
tgt_count = Counter(target[i] for i in indices)
matches = sum((src_count & tgt_count).values())
mismatches = len(indices) - matches

Python's Counter class supports the & operator, which computes the minimum of corresponding counts. This is exactly multiset intersection: for each value, it tells you how many copies exist in both multisets. The sum of the intersection gives the total number of values that can be matched. This pattern appears whenever you need to compare two collections and find their overlap, accounting for duplicate values correctly.

Drill Union-Find building blocks with spaced repetition

Edge cases

No allowed swaps: The Hamming distance is simply the count of positions where source[i] differs from target[i], since you cannot rearrange anything.
All indices connected: You can rearrange source freely, so the answer is the number of values in target that cannot be satisfied by the multiset of source values.
source equals target: The answer is 0 regardless of swaps, because every position already matches.
Duplicate values: The Counter intersection handles duplicates correctly. If source has two 3s in a group and target needs three 3s, only two can be matched.

From understanding to recall

The core pattern to internalize here is the "Union-Find grouping" template. When a problem gives you pairs that define some equivalence or connectivity, build a Union-Find, group elements by component, and then solve each component independently. The specific "solve" step varies by problem (here it is multiset matching), but the grouping step is always the same.

Practice rebuilding this solution in two phases. First, write the Union-Find and group indices by root. Second, write the per-group comparison using Counter intersection. Each phase is short and self-contained. Once you can write both from memory, you will recognize this pattern instantly in related problems like Smallest String With Swaps or Accounts Merge.

Accounts Merge - Union-Find to group connected items and merge their data
Number of Islands - Connected components on a grid, solvable with Union-Find or DFS
Smallest String With Swaps - The same Union-Find grouping pattern applied to string characters

This problem is a clean example of how Union-Find transforms a seemingly complex rearrangement problem into a simple counting exercise. The moment you realize that connected indices can be rearranged freely, the entire solution reduces to grouping and comparing multisets. Build that intuition with deliberate practice, and the pattern will serve you well across many graph and array problems.

Try CodeBricks free