Graph Connectivity With Threshold: Sieve Meets Union-Find

June 1, 20268 min read

leetcodeproblemhardgraphunion-findmath

You are given n cities numbered from 1 to n and an integer threshold. Two cities share a direct connection if they have a common factor strictly greater than threshold. You are also given a list of queries, where each query is a pair of cities. For each query, return true if the two cities are connected (directly or through a chain of connections), and false otherwise.

This is LeetCode 1627: Graph Connectivity With Threshold, and it is one of the cleanest examples of combining number theory with graph connectivity. The naive approach of checking every pair of cities for a common factor is too slow. The key insight is to flip the problem around: instead of checking pairs, iterate over factors and connect their multiples.

n=6, threshold=1. Nodes sharing a common factor greater than 1 are connected. Components: {1} (isolated), {2,3,4,6} (share factors 2 and 3), {5} (isolated, prime with no multiples in range).

Why this problem matters

This problem sits at the intersection of two fundamental patterns: the Sieve of Eratosthenes and Union-Find. On its own, Union-Find is a standard tool for dynamic connectivity problems. On its own, sieve-style iteration is a standard tool for number theory problems. But combining them is what makes this problem interesting, and it shows up more often than you might expect. Any time you need to group numbers by shared divisibility properties, this sieve-plus-union pattern applies.

Understanding this combination also deepens your intuition for when brute force over pairs is unnecessary. Instead of checking all O(n^2) pairs, you iterate over factors and their multiples, which runs in O(n log n) total. That harmonic series bound is the same one that makes the Sieve of Eratosthenes efficient, and recognizing it is a skill that transfers to many other problems.

The approach

The core idea: for each factor f from threshold + 1 up to n, union all multiples of f that fall within the range [1, n]. If two numbers share a common factor greater than threshold, they will both appear as multiples of that factor, and the union operation will place them in the same connected component.

After processing all factors, answering a query is just checking whether two nodes share the same root in the Union-Find structure.

The algorithm:

Initialize a Union-Find (disjoint set) structure with nodes 1 through n.
For each factor f from threshold + 1 to n, iterate through multiples f, 2f, 3f, ... up to n. Union consecutive multiples (or union each multiple with f itself).
For each query [a, b], check if find(a) == find(b).

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n + 1))
        self.rank = [0] * (n + 1)

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

def areConnected(n, threshold, queries):
    uf = UnionFind(n)

    for f in range(threshold + 1, n + 1):
        multiple = f * 2
        while multiple <= n:
            uf.union(f, multiple)
            multiple += f

    return [uf.find(a) == uf.find(b) for a, b in queries]

Let's walk through what each piece does.

The UnionFind class uses path compression (the two-step parent[x] = parent[parent[x]] trick) and union by rank. These two optimizations together bring the amortized cost of each find and union call down to nearly O(1), specifically O(alpha(n)) where alpha is the inverse Ackermann function.

The main loop iterates factor f from threshold + 1 to n. For each factor, it walks through every multiple of f starting at 2f and unions each multiple with f. This ensures that all multiples of f end up in the same connected component. You do not need to union every pair of multiples with each other. Unioning each multiple with f is enough because union is transitive: if f is connected to 2f and f is connected to 3f, then 2f and 3f are in the same component.

The query loop is a simple list comprehension. For each pair [a, b], it checks whether find(a) equals find(b).

You might wonder why we union each multiple with f rather than with the previous multiple. Both approaches work. Unioning with f keeps the code simpler and produces the same components. The key invariant is that all multiples of f end up sharing a root.

Practice this problem with spaced repetition

Step-by-step walkthrough

Initial: Create a parent array for Union-Find.

Each node is its own parent. No connections yet.

Factor 2: Union all multiples of 2.

Multiples of 2 in [2, 6]: 2, 4, 6. Union(2,4), then Union(4,6).

Component formed: {2, 4, 6}. Nodes 1, 3, 5 remain isolated.

Factor 3: Union all multiples of 3.

Multiples of 3 in [3, 6]: 3, 6. Union(3,6).

Node 3 joins the {2, 4, 6} component via 6. Component is now {2, 3, 4, 6}.

Factor 4: Union all multiples of 4.

Multiples of 4 in [4, 6]: only 4. One multiple means no union needed.

No change. 4 is already in {2, 3, 4, 6}.

Factors 5 and 6: No new unions.

Factor 5 has only one multiple (5). Factor 6 has only one multiple (6). Neither produces a union.

Final components: {1}, {2, 3, 4, 6}, {5}. Node 5 is prime with no multiples in range.

Answer queries: Check if two nodes share a root.

Query (2, 3): find(2) = find(3)? Yes. Query (1, 5): find(1) = find(5)? No.

Each query is O(alpha(n)) with path compression and union by rank.

In the example above with n=6 and threshold=1, the sieve iterates through factors 2, 3, 4, 5, and 6. Factor 2 connects nodes 2, 4, and 6 into one component. Factor 3 connects node 3 to node 6, which pulls 3 into the existing component. Factors 4, 5, and 6 each have at most one multiple in range, so they produce no new unions. The final components are , , and .

Notice how the sieve naturally handles transitive connections. Nodes 2 and 3 do not share factor 2 or factor 3 directly as a common factor. But 6 is a multiple of both 2 and 3, so the union operations through 6 connect them indirectly. This is exactly the kind of chain the problem asks about.

Complexity analysis

Approach	Time	Space
Brute force (check all pairs for GCD)	O(n^2 * log(n))	O(n)
Sieve + Union-Find	O(n * log(n) * alpha(n) + Q * alpha(n))	O(n)

Time for the sieve approach: the outer loop runs from threshold + 1 to n. For each factor f, the inner loop visits n/f multiples. The total number of union operations is the sum of n/f for f from threshold + 1 to n, which is bounded by O(n * log(n)). This is the harmonic series bound. Each union operation costs O(alpha(n)), giving O(n * log(n) * alpha(n)) for the preprocessing phase. Answering Q queries costs O(Q * alpha(n)). Since alpha(n) is effectively constant for all practical inputs, the total is essentially O(n * log(n) + Q).

Space is O(n) for the Union-Find parent and rank arrays.

Build your pattern library with CodeBricks

Edge cases to watch for

Threshold of 0: every pair of numbers shares factor 1, but the threshold requires a factor strictly greater than 0. Since every integer greater than or equal to 1 has factor 1, this means all nodes are connected when threshold is 0. The sieve starts at f = 1, and every number is a multiple of 1, so the union loop connects everything.
Threshold greater than or equal to n: no factor in the range (threshold + 1 to n) exists. The sieve loop does not execute, and every node stays isolated. All queries return false unless a == b.
Query where a == b: a node is always connected to itself. find(a) == find(a) is always true.
Node 1 with high threshold: node 1 has no multiples other than itself in the range when threshold >= 1. It can only join a component if some factor f has multiples that include 1, but since multiples start at 2f, node 1 never gets unioned with anything. It remains isolated unless threshold = 0.
Large n with small threshold: the sieve runs efficiently because the harmonic series sum converges to O(n * log(n)) regardless of how small the threshold is.

The building blocks

1. Union-Find with path compression and union by rank

The foundational data structure for dynamic connectivity:

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n + 1))
        self.rank = [0] * (n + 1)

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

This is the same Union-Find template you use in "Number of Islands" (with grid coordinates), "Accounts Merge" (with email strings), and any problem where you need to group elements and later check membership. The two-line path compression (parent[x] = parent[parent[x]], then move up) is the iterative version of full path compression. It achieves nearly the same amortized bound and avoids recursion stack limits.

2. Sieve-style factor iteration

The pattern of iterating over all multiples of each number:

for f in range(start, n + 1):
    multiple = f * 2
    while multiple <= n:
        process(f, multiple)
        multiple += f

This is the same loop structure as the Sieve of Eratosthenes, but instead of marking composites, you are performing union operations. The total number of iterations across all values of f is bounded by the harmonic series: n/2 + n/3 + n/4 + ... + n/n, which is O(n * log(n)). You will see this pattern in problems involving divisors, GCD grouping, and prime factorization on ranges.

From understanding to recall

You have seen how the sieve and Union-Find combine: iterate over factors, union their multiples, then answer queries with find. The logic is elegant and the code is compact. But reproducing it under pressure requires more than understanding.

The details that trip people up are specific. Do you start the factor loop at threshold + 1 or threshold? Do you start the multiple loop at f or 2f? Do you union each multiple with f or with the previous multiple? These are small choices, but getting any one of them wrong means a wrong answer or a TLE.

Spaced repetition closes the gap between understanding and recall. You practice writing the Union-Find template, the sieve iteration loop, and the query resolution from memory at increasing intervals. After a few rounds, the pattern becomes automatic. You see "connect numbers that share a common factor" and the sieve-plus-union template flows out without hesitation.

Course Schedule - Graph connectivity through topological sort, another way to reason about reachability
Number of Islands - Union-Find applied to grid connectivity, the foundational connected components problem
Accounts Merge - Union-Find with string keys, grouping elements by shared membership

CodeBricks breaks Graph Connectivity With Threshold into its sieve iteration and Union-Find building blocks, then drills them independently with spaced repetition. You type each piece from scratch until the pattern is automatic. When a connectivity problem shows up in your interview, you do not think about it. You just write it.

Try CodeBricks free