Smallest String With Swaps: Union-Find for Connected Components
You are given a string s and a list of pairs of indices. Each pair [i, j] means you can swap the characters at positions i and j any number of times, in any order. Return the lexicographically smallest string you can produce.
This is LeetCode 1202: Smallest String With Swaps, and it is one of the cleanest examples of how Union-Find turns a seemingly complex string manipulation problem into a connected components exercise.
Why this problem matters
At first glance, this looks like a sorting problem with constraints. You might think about simulating swaps or running some kind of bubble sort limited to certain positions. That path leads to exponential complexity and dead ends.
The real structure here is a graph. Each index is a node, each swap pair is an edge, and connected components tell you which groups of characters can be freely rearranged. Once you see it that way, the problem collapses into "sort each group independently." Union-Find is the perfect tool for finding those groups efficiently.
This problem teaches you to recognize when a set of pairwise relationships implies larger group structures, a pattern that shows up across graph, string, and array problems.
The key insight
If you can swap indices a and b, and you can swap indices b and c, then you can rearrange the characters at a, b, and c in any order you want. The reasoning: swap a and b to move a character from a to b, then swap b and c to move it to c. By chaining swaps, any permutation within the group is reachable.
This means every connected component of indices is an independent group where you have full freedom to rearrange. To get the lexicographically smallest string, just sort the characters within each component and place them back at the component's indices in sorted order.
The solution
def smallestStringWithSwaps(s, pairs):
n = len(s)
parent = list(range(n))
rank = [0] * n
def find(x):
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
def union(x, y):
rx, ry = find(x), find(y)
if rx == ry:
return
if rank[rx] < rank[ry]:
rx, ry = ry, rx
parent[ry] = rx
if rank[rx] == rank[ry]:
rank[rx] += 1
for i, j in pairs:
union(i, j)
groups = {}
for i in range(n):
root = find(i)
if root not in groups:
groups[root] = []
groups[root].append(i)
result = list(s)
for indices in groups.values():
chars = sorted(result[i] for i in indices)
for i, c in zip(sorted(indices), chars):
result[i] = c
return ''.join(result)
Let's walk through each part.
First, we initialize Union-Find with each index as its own parent. Then we process every swap pair with union(i, j), which merges the components of the two indices.
After all unions, we group indices by their root. Each group contains all indices that are transitively connected through swap pairs. Within each group, we extract the characters, sort them, and assign them back to the group's indices (also sorted). This guarantees each position gets the smallest available character.
Path compression (parent[x] = parent[parent[x]]) and union by rank keep the Union-Find operations in near-O(1) amortized time. Without these optimizations, find can degrade to O(n) per call on skewed trees. With them, the entire Union-Find phase is effectively linear.
Visual walkthrough
Let's trace through s = "dcab" with pairs = [[0,3],[1,2]]. Watch how Union-Find groups the indices, and how sorting within each group produces the smallest string.
Step 1: Identify the string and swap pairs
s = "dcab", pairs = [[0,3],[1,2]]. Each pair means we can swap those two indices any number of times.
Step 2: Build Union-Find to group connected indices
Union(0,3) merges indices 0 and 3. Union(1,2) merges indices 1 and 2. After processing all pairs, the parent array reveals two components.
Step 3: Group characters by their connected component
Indices 0 and 3 share root 0, so their characters are grouped: ['d','b']. Indices 1 and 2 share root 1, so their characters are grouped: ['c','a'].
Step 4: Sort characters within each component
Component {0,3}: ['d','b'] sorted becomes ['b','d']. Component {1,2}: ['c','a'] sorted becomes ['a','c']. Sorting gives us the lexicographically smallest arrangement per group.
Step 5: Build the result string
Place each sorted character at its original index. Index 0 gets 'b', index 1 gets 'a', index 2 gets 'c', index 3 gets 'd'. Result: "bacd".
The final result "bacd" is the lexicographically smallest string achievable. Indices 0 and 3 were free to exchange their characters (d and b), so b goes to position 0 and d goes to position 3. Indices 1 and 2 were free to exchange (c and a), so a goes to position 1 and c goes to position 2.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Union-Find + sort | O(n log n) | O(n) |
Time: Processing all pairs takes O(p * alpha(n)) where p is the number of pairs and alpha is the inverse Ackermann function (effectively constant). Grouping indices is O(n). Sorting within groups takes O(n log n) in the worst case (if all indices land in one component). Total: O(n log n + p).
Space: The parent and rank arrays are O(n). The groups dictionary holds all n indices. The result array is O(n). Total: O(n).
The building blocks
1. Union-Find data structure
The core template for Union-Find with path compression and union by rank:
parent = list(range(n))
rank = [0] * n
def find(x):
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
def union(x, y):
rx, ry = find(x), find(y)
if rx == ry:
return
if rank[rx] < rank[ry]:
rx, ry = ry, rx
parent[ry] = rx
if rank[rx] == rank[ry]:
rank[rx] += 1
This template is reusable across dozens of problems. The find and union functions stay the same. What changes is how you iterate over the input and what you do with the resulting components.
2. Group and sort by component
After building the Union-Find, collect indices by root and sort characters within each group:
groups = {}
for i in range(n):
root = find(i)
if root not in groups:
groups[root] = []
groups[root].append(i)
result = list(s)
for indices in groups.values():
chars = sorted(result[i] for i in indices)
for i, c in zip(sorted(indices), chars):
result[i] = c
The key detail: sort both the indices and the characters, then zip them together. The smallest character goes to the smallest index within the group, the next smallest character to the next index, and so on. This greedy assignment produces the lexicographically smallest arrangement.
Edge cases
- No pairs: no swaps are possible. The string is already the answer. Each index is its own component, no sorting happens.
- All indices in one component: every index is transitively connected. You can sort the entire string. The result is just
sorted(s). - Duplicate characters: sorting handles duplicates naturally. If two indices share the same character, their relative order does not affect the result.
- Single character string:
n = 1, no pairs can exist. Returnsas-is. - Pairs that form a chain:
[[0,1],[1,2],[2,3]]connects all indices 0 through 3 into one component. Union-Find correctly merges them all through transitivity. - Redundant pairs: the same pair appearing multiple times, or pairs that connect already-connected indices. Union-Find handles these with a no-op when
find(x) == find(y).
From understanding to recall
You have seen how Union-Find groups indices by connectivity and how sorting within each group produces the lexicographically smallest string. The logic is clean: union the pairs, group by root, sort each group, place characters back.
But writing it from scratch under time pressure is a different challenge. The path compression line, the rank comparison in union, the grouping loop, the zip of sorted indices with sorted characters. Each of these is a small detail that is easy to forget or get wrong in an interview.
Spaced repetition closes that gap. You write the Union-Find template and the group-and-sort logic from memory at increasing intervals. After a few rounds, the code flows automatically. You do not need to re-derive the algorithm. You recognize the connected component structure, reach for Union-Find, and write the solution without hesitation.
Related posts
- Number of Provinces - The classic Union-Find introduction, counting connected components in an adjacency matrix
- Accounts Merge - Union-Find applied to grouping email addresses that share an account, then sorting within groups
- Redundant Connection - Using Union-Find to detect cycle-creating edges, building on the same find/union template
CodeBricks breaks Smallest String With Swaps into its Union-Find and group-sort building blocks, then drills them independently with spaced repetition. You type each piece from scratch until the pattern is automatic. When a connected component problem shows up in your interview, you do not hesitate. You just write it.