GCD Sort of an Array: Union-Find with Prime Factorization
GCD Sort of an Array is LeetCode 1998. You are given an integer array nums, and you can swap nums[i] and nums[j] if gcd(nums[i], nums[j]) > 1. Return true if you can sort the array in non-decreasing order using any number of swaps, or false otherwise.
A few examples:
[7, 21, 3]returnstrue. 7 and 21 share factor 7, 21 and 3 share factor 3. Through transitive swaps, you can rearrange the array to[3, 7, 21].[5, 2, 6, 2]returnsfalse. 5 shares no common factor with any other element, so it is stuck at index 0. The sorted order needs 2 at index 0.[8, 9, 4, 2, 3]returnstrue. All elements are connected through shared prime factors and can be rearranged into sorted order.
The key insight is that swap eligibility is transitive. If a can swap with b, and b can swap with c, then through a series of swaps, a, b, and c can all reach any position among themselves. This is exactly the structure that union-find was built for.
The approach
Checking every pair for gcd > 1 would be O(n^2), which is too slow when n can be up to 30,000. Instead, you use the same trick from Largest Component Size by Common Factor: union each number with its prime factors rather than comparing pairs.
Here is the plan:
- Factorize each number into its prime factors using trial division.
- Union each number with its primes. Two numbers sharing a prime factor end up in the same union-find component automatically.
- Sort a copy of the array. This gives you the target position for each element.
- Check each index. For every index
i, verify thatnums[i]andsorted_nums[i]are in the same union-find component. If they are,nums[i]can reach its sorted position through transitive swaps. If any index fails this check, returnfalse.
That is the entire algorithm. The union-find groups tell you which elements can freely rearrange among themselves. If every element can reach where it needs to go, the array is sortable.
Think of union-find components as "swap neighborhoods." Any element can move to any position within its neighborhood through a chain of valid swaps. The array is sortable if and only if every element's target position is in the same neighborhood.
Python solution
def gcdSort(nums):
parent = {}
rank = {}
def find(x):
if x not in parent:
parent[x] = x
rank[x] = 0
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
def union(a, b):
ra, rb = find(a), find(b)
if ra == rb:
return
if rank[ra] < rank[rb]:
ra, rb = rb, ra
parent[rb] = ra
if rank[ra] == rank[rb]:
rank[ra] += 1
def prime_factors(n):
factors = []
d = 2
while d * d <= n:
if n % d == 0:
factors.append(d)
while n % d == 0:
n //= d
d += 1
if n > 1:
factors.append(n)
return factors
for num in nums:
for p in prime_factors(num):
union(num, p)
sorted_nums = sorted(nums)
for a, b in zip(nums, sorted_nums):
if find(a) != find(b):
return False
return True
Visual walkthrough
Let's trace the algorithm on nums = [7, 21, 3]. We factorize each number, union it with its primes, then check whether each element can reach its sorted position.
Step 1: Factorize each number and union with its primes
7 has prime factor {7}. Union(7, prime 7) is a self-link. 21 = 3 * 7. 3 is prime. Start processing.
Step 2: Process 7. Union 7 with prime 7 (no-op)
7 is its own prime. Union(7, 7) does nothing. Component: {7}.
Step 3: Process 21. Union 21 with prime 3 and prime 7
Union(21, 3) links 21 to 3. Union(21, 7) merges {21, 3} with {7}. Component: {7, 21, prime 3}.
Step 4: Process 3. Union 3 with prime 3
Union(3, prime 3). Prime 3 is already in the {7, 21} component. Now all three are connected.
Step 5: Compare to sorted array
Sorted = [3, 7, 21]. At each index, nums[i] and sorted[i] share the same root. Answer = true.
After processing all numbers, every element shares the same root. The sorted array is [3, 7, 21]. At each index, the original value and the sorted value have the same root, so the answer is true.
Complexity analysis
| Metric | Value | Why |
|---|---|---|
| Time | O(n * sqrt(M) + n log n) | Factorizing each of n numbers costs O(sqrt(M)) per number where M is the max value. Sorting costs O(n log n). Union-find operations are nearly O(1) each. |
| Space | O(n + M) | The dictionary-based union-find stores entries for each number and each prime factor encountered. |
The dominant cost is either the factorization pass or the sort, depending on the input. For typical constraints (n up to 30,000 and values up to 100,000), this runs comfortably within time limits.
Building blocks
GCD Sort is built from two reusable bricks that appear across many problems.
1. Union-Find with lazy initialization
When nodes span a wide numeric range or are not known ahead of time, use a dictionary-based union-find. Nodes are created on first access. This avoids allocating a massive array and naturally handles mixed node types, since both the original numbers and their prime factors live in the same structure.
parent = {}
rank = {}
def find(x):
if x not in parent:
parent[x] = x
rank[x] = 0
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
This same lazy union-find pattern appears in Largest Component Size by Common Factor, Accounts Merge, and any problem where the node space is dynamic.
2. Prime factorization by trial division
Extract the unique prime factors of a number in O(sqrt(n)) time. The inner loop divides out all copies of each prime, so each factor is collected exactly once.
def prime_factors(n):
factors = []
d = 2
while d * d <= n:
if n % d == 0:
factors.append(d)
while n % d == 0:
n //= d
d += 1
if n > 1:
factors.append(n)
return factors
This is the standard approach when values are not too large (up to around 10^5 or 10^6). For larger values, you might switch to a sieve-based factorization, but trial division is sufficient here.
The "union through shared attributes" pattern is the real technique to internalize. Instead of comparing all pairs (O(n^2)), you union each element with its attributes. Elements sharing an attribute end up in the same component automatically. This reduces grouping problems from quadratic to nearly linear time.
Edge cases
All elements are the same. For example, nums = [5, 5, 5]. The array is already sorted. Every pair has gcd = 5, so they are all connected. Return true.
Element is 1. The number 1 has no prime factors greater than 1, so it cannot swap with anything. If 1 is already in its correct sorted position, that is fine. But if 1 needs to move, the answer is false. For instance, nums = [3, 1] returns false because 1 is stuck at index 1 but needs to be at index 0.
Already sorted. If the array is already in order, each nums[i] == sorted_nums[i], and find(a) == find(a) is always true. Return true without needing any swaps.
Two large primes. If nums = [99991, 99989], these are both prime and share no common factor. If the array needs them in the opposite order, the answer is false.
All even numbers. Every element shares the factor 2, so they all end up in one component. The array can always be sorted. Return true.
Duplicate values. Duplicates share all the same prime factors and end up in the same component. They do not cause issues because nums[i] == sorted_nums[i] for duplicates that are already in place, and duplicates that need to move share the same root.
From understanding to recall
You have seen how prime factorization feeds into union-find and how the final comparison against the sorted array determines the answer. The algorithm is clean: factorize, union, sort, compare roots. But there are details that slip away. The lazy dictionary-based union-find, the trial division loop with its inner while to divide out repeated primes, the final zip comparison. These are small pieces, but under interview pressure, any one of them can trip you up.
The good news is that both building blocks, union-find and prime factorization, are reusable across many problems. If you drill them individually with spaced repetition, GCD Sort becomes an assembly problem rather than a derivation problem. You pull out two bricks you already own, connect them, and add the sorted-comparison check. That is a five-minute solution instead of a thirty-minute struggle.
Related posts
- Largest Component Size by Common Factor - The closest sibling problem, using the same prime factorization and union-find technique to find component sizes
- Number of Provinces - A clean introduction to union-find for counting connected components
- Accounts Merge - Union-find applied to grouping elements by shared attributes, the same "union through shared properties" pattern