Largest Component Size by Common Factor: Union-Find with Prime Factorization
LeetCode 952, Largest Component Size by Common Factor, is a hard problem that combines two powerful ideas: union-find and prime factorization. You are given an array of positive integers and need to find the largest group of numbers where every pair in the group is connected through a chain of shared common factors greater than 1.
The problem
You are given an integer array nums. Two values nums[i] and nums[j] are connected if they share a common factor greater than 1, meaning gcd(nums[i], nums[j]) > 1. Connections are transitive: if a connects to b and b connects to c, then a, b, and c are all in the same component. Return the size of the largest connected component.
Input: nums = [4, 6, 15, 35]
Output: 4
Here is why: 4 and 6 share factor 2. 6 and 15 share factor 3. 15 and 35 share factor 5. Even though 4 and 35 do not share a direct common factor, they are connected transitively through 6 and 15. All four numbers end up in a single component.
Why this problem matters
This problem teaches you how to use union-find on an implicit graph where the edges are not given directly. Instead of being told which pairs are connected, you have to discover connections through a mathematical property (shared prime factors). This pattern shows up whenever you need to group elements by some transitive relationship that is expensive to check pairwise.
The prime factorization trick is the real star here. Instead of comparing every pair of numbers (which would be O(n^2)), you factorize each number and union it with its prime factors. This reduces the problem to O(n * sqrt(max)) time, which handles arrays of 20,000 numbers with values up to 100,000 without breaking a sweat.
Once you learn this "union through shared attributes" technique, you can apply it to problems like grouping strings by shared characters, connecting accounts by shared emails, or linking nodes by shared properties. The underlying idea is always the same: instead of comparing all pairs, union each element with the attributes it possesses.
The brute force approach
The most direct approach is to check every pair of numbers for a common factor. If gcd(nums[i], nums[j]) > 1, union them. Then count the size of the largest component.
from math import gcd
def largestComponentSize_brute(nums):
n = len(nums)
parent = list(range(n))
rank = [0] * n
def find(x):
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
def union(a, b):
ra, rb = find(a), find(b)
if ra == rb:
return
if rank[ra] < rank[rb]:
ra, rb = rb, ra
parent[rb] = ra
if rank[ra] == rank[rb]:
rank[ra] += 1
for i in range(n):
for j in range(i + 1, n):
if gcd(nums[i], nums[j]) > 1:
union(i, j)
from collections import Counter
count = Counter(find(i) for i in range(n))
return max(count.values())
This works, but the O(n^2) pairwise comparison is too slow when n can be up to 20,000. We need a way to avoid checking every pair.
The key insight
Instead of asking "do these two numbers share a factor?", flip the question: "which numbers share this prime factor?" If you factorize each number into its primes, then all numbers that share a given prime belong in the same component.
The trick is to union each number with its prime factors directly. You extend the union-find structure to include both the numbers and the prime factors as nodes. For each number, you factorize it and union the number with each of its primes. Numbers that share a prime will naturally end up in the same component because they both got unioned with that prime.
Union each number with its prime factors instead of comparing pairs. Two numbers sharing a prime factor will end up in the same union-find component automatically, without ever being compared to each other.
Walking through it step by step
Let's trace the algorithm on nums = [4, 6, 15, 35]. For each number, we find its prime factors and union the number with each factor.
Step 1: Process 4 (primes: {2})
4 is linked to prime 2. Component: {4, 2}.
Step 2: Process 6 (primes: {2, 3})
4 and 6 are now connected through prime 2. Component: {4, 6, 2, 3}.
Step 3: Process 15 (primes: {3, 5})
6 and 15 connected through prime 3. Component: {4, 6, 15, 2, 3, 5}.
Step 4: Process 35 (primes: {5, 7})
15 and 35 connected through prime 5. All numbers in one component!
Step 5: Count component sizes
Largest component has 4 numbers. Answer = 4.
After processing all four numbers, every one of them has the same root in the union-find structure. The largest (and only) component contains all 4 numbers.
The solution
from collections import Counter
def largestComponentSize(nums):
parent = {}
rank = {}
def find(x):
if x not in parent:
parent[x] = x
rank[x] = 0
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
def union(a, b):
ra, rb = find(a), find(b)
if ra == rb:
return
if rank[ra] < rank[rb]:
ra, rb = rb, ra
parent[rb] = ra
if rank[ra] == rank[rb]:
rank[ra] += 1
def prime_factors(n):
factors = []
d = 2
while d * d <= n:
if n % d == 0:
factors.append(d)
while n % d == 0:
n //= d
d += 1
if n > 1:
factors.append(n)
return factors
for num in nums:
for p in prime_factors(num):
union(num, p)
count = Counter(find(num) for num in nums)
return max(count.values())
-
Lazy initialization. The union-find uses a dictionary instead of an array. Nodes are created on first access. This lets us handle both the original numbers and their prime factors without pre-allocating a massive array.
-
Prime factorization. For each number, we extract its prime factors using trial division up to sqrt(n). This gives us the set of primes that define the number's connections.
-
Union with primes. For each number, we union it with every one of its prime factors. If two numbers share the prime 3, they both get unioned with 3, which puts them in the same component.
-
Path compression. The
findfunction uses path halving (parent[x] = parent[parent[x]]) to keep the tree flat. This ensures near-constant time per operation. -
Union by rank. The shorter tree always attaches under the taller one, preventing degenerate chains.
-
Count component sizes. After all unions, we call
findon each original number (not on the primes) and count how many numbers share each root. The maximum count is the answer.
Complexity analysis
| Metric | Value |
|---|---|
| Time | O(n * sqrt(max(nums))) |
| Space | O(n + max(nums)) |
The time complexity comes from factorizing each of the n numbers. Trial division for a number v takes O(sqrt(v)) time. Each union and find operation is effectively O(1) amortized with path compression and union by rank. The space stores the union-find parent and rank for each number and each prime factor encountered, which is bounded by O(n + max(nums)) in the worst case.
Building blocks
1. Union-Find with lazy initialization
When nodes are not known ahead of time or span a wide range, use a dictionary-based union-find instead of an array-based one. Nodes are created the first time they are accessed.
parent = {}
rank = {}
def find(x):
if x not in parent:
parent[x] = x
rank[x] = 0
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
This pattern avoids allocating an array of size max(nums) and naturally handles mixed node types (both original numbers and prime factors live in the same structure).
2. Prime factorization by trial division
Extract the unique prime factors of a number in O(sqrt(n)) time. This is the standard approach when you need the prime decomposition and the numbers are not too large.
def prime_factors(n):
factors = []
d = 2
while d * d <= n:
if n % d == 0:
factors.append(d)
while n % d == 0:
n //= d
d += 1
if n > 1:
factors.append(n)
return factors
The inner while loop divides out all copies of each factor, so the outer loop only encounters each prime once. After the loop, if n > 1, the remaining value is a prime factor larger than sqrt of the original number.
The "union through shared attributes" pattern works for any transitive grouping problem. Instead of comparing all pairs O(n^2), union each element with its attributes. Elements sharing an attribute end up in the same component automatically. This reduces time from O(n^2) to O(n * cost_of_attribute_extraction).
Edge cases
- Single element:
nums = [1]. The number 1 has no prime factors, so it forms its own component of size 1. Make sure your factorization handles 1 correctly (it should return an empty list of factors). - All primes with no overlap:
nums = [2, 3, 5, 7, 11]. Every number is its own prime and shares no factors with any other. Each forms a component of size 1. Answer is 1. - All even numbers:
nums = [2, 4, 6, 8, 10]. Every number shares the factor 2. One giant component. Answer is 5. - Duplicate values:
nums = [6, 6, 6]. All three 6s share the same prime factors. They all union with primes 2 and 3, so they end up in one component. Answer is 3. - Number 1 in the array: 1 has no prime factors greater than 1, so it cannot connect to any other number. It always forms a lone component of size 1.
- Large primes: Two large primes like 99991 and 99989 share no factors. They remain in separate components even though they are close in value.
Common mistakes
-
Comparing all pairs instead of using prime factors. The O(n^2) approach times out on large inputs. The prime factorization trick is essential for this problem's constraints.
-
Forgetting to handle the number 1. The number 1 has no prime factors. If your factorization function returns an empty list, 1 never gets unioned with anything, which is correct. But if your code assumes every number has at least one prime factor, it may crash or produce wrong results.
-
Using an array-based union-find with indices. Since you are unioning numbers with their primes (which can be up to 100,000), a dictionary-based approach is cleaner than pre-allocating a huge array. If you do use an array, make sure it is large enough.
-
Counting primes in the component size. When counting the largest component, only count the original numbers from
nums, not the prime factor nodes. If you count prime nodes too, you will overcount. -
Not dividing out repeated factors. When factorizing, you need unique primes. If 12 = 2 * 2 * 3, you should union 12 with 2 and 12 with 3 (not union 12 with 2 twice). Make sure your factorization loop divides out all copies of each prime.
From understanding to recall
You have traced the union-find steps and seen how prime factorization turns a pairwise comparison into a linear scan. The "union through shared attributes" pattern is clean and powerful. But will you remember to use prime factorization as the bridge between numbers and their groups when you see this problem again in three weeks?
The details that matter are subtle: using a dictionary-based union-find, factorizing with trial division, unioning with primes instead of comparing pairs, and counting only original numbers in the final tally. These are the things that slip away without practice.
Spaced repetition helps you lock in these details. By revisiting the problem at increasing intervals, you move from "I remember reading about this" to "I can write this from scratch in five minutes." The pattern becomes automatic, not something you have to re-derive each time.
Related posts
- Union Find Pattern - Introduction to union-find with path compression and union by rank
- Number of Provinces - Another connected components problem using union-find
- Accounts Merge - Union-find applied to grouping by shared elements
CodeBricks breaks Largest Component Size by Common Factor into its core building blocks: dictionary-based union-find, prime factorization by trial division, and the "union through attributes" pattern. You practice each piece individually with spaced repetition until writing the full solution is automatic. When a hard union-find problem appears in your interview, you do not freeze. You just write it.