Skip to content
← All posts

Longest Common Subpath: Binary Search with Rolling Hash

7 min read
leetcodeproblemhardarraysbinary-searchhash-map

Longest Common Subpath is LeetCode 1923. There are n cities numbered from 0 to n - 1. You are given several friends' paths, where each path is an array of city numbers representing the route that friend traveled. A subpath is a contiguous subsequence of a path. Your job is to find the length of the longest subpath that is shared by all friends' paths.

For example, with n = 5 and paths [[0,1,2,3,4], [2,3,4,0], [4,0,1,2,3,4]], the longest common subpath is [2,3,4], which appears as a contiguous segment in every friend's route.

Friend 101234[2, 3, 4]Friend 22340[2, 3, 4]Friend 3401234[2, 3, 4]
n=5 cities (0 to 4). Three friends' paths each contain the subpath [2, 3, 4], making it the longest common subpath (length 3).

Why this problem matters

This problem forces you to combine two techniques that are each powerful on their own: binary search on the answer and rolling hash. You have already seen binary search on the answer in problems like Koko Eating Bananas, and rolling hash in problems like Longest Duplicate Substring. Here, you need both at the same time, applied across multiple arrays instead of a single string.

The challenge is not just knowing both techniques. It is knowing how to wire them together: the binary search decides which length to try, the rolling hash decides whether that length is feasible across all paths. Getting comfortable with this kind of composition is exactly what separates "I know the patterns" from "I can solve hard problems."

The key insight

If all friends share a common subpath of length L, then they also share a common subpath of length L - 1. Why? Take the shared subpath of length L and drop the last element. You now have a shared subpath of length L - 1 that still appears in every friend's path. This monotonicity means you can binary search on the answer length.

For each candidate length L, you need a way to check whether all paths contain at least one common subpath of that length. A brute force approach would extract every subpath of length L from every path and compare them, but that is far too slow when paths are long.

Rolling hash solves this. For each path, you compute a hash for every contiguous window of length L in O(1) per window (after an O(L) setup). You collect the hash sets for each path and check whether any hash appears in all of them. If yes, length L is feasible. If no, it is not.

The monotonicity check: "If a common subpath of length L exists, does one of length L - 1 also exist?" If the answer is always yes, you can binary search on L. This is the same reasoning behind binary search on the answer in many other problems.

The solution

def longestCommonSubpath(n, paths):
    MOD1 = (1 << 61) - 1
    MOD2 = (1 << 31) - 1
    BASE1 = 131
    BASE2 = 137

    def check(length):
        if length == 0:
            return True
        common = None
        for path in paths:
            h1 = 0
            h2 = 0
            p1 = pow(BASE1, length, MOD1)
            p2 = pow(BASE2, length, MOD2)
            seen = set()
            for i in range(length):
                h1 = (h1 * BASE1 + path[i]) % MOD1
                h2 = (h2 * BASE2 + path[i]) % MOD2
            seen.add((h1, h2))
            for i in range(1, len(path) - length + 1):
                h1 = (h1 * BASE1 - path[i - 1] * p1 + path[i + length - 1]) % MOD1
                h2 = (h2 * BASE2 - path[i - 1] * p2 + path[i + length - 1]) % MOD2
                seen.add((h1, h2))
            if common is None:
                common = seen
            else:
                common = common & seen
            if not common:
                return False
        return len(common) > 0

    lo, hi = 0, min(len(p) for p in paths)
    result = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if check(mid):
            result = mid
            lo = mid + 1
        else:
            hi = mid - 1
    return result

The outer binary search tries candidate lengths from 0 up to the length of the shortest path. For each candidate, check computes rolling hashes across all paths and intersects the hash sets.

Inside check, the double hash (h1, h2) is computed for each window position. Using two independent hash functions with different bases and moduli makes collisions astronomically unlikely. The first modulus (1 << 61) - 1 is a Mersenne prime, giving a huge hash space. The second modulus (1 << 31) - 1 provides a second independent check.

For each path, the code builds a set of all (h1, h2) pairs for windows of the candidate length. Then it intersects with the running common set. If at any point the intersection becomes empty, the function returns False early, saving unnecessary work on remaining paths.

Single hashing can produce false positives. Two different subpaths might hash to the same value. Double hashing with two independent (base, mod) pairs reduces the collision probability to roughly 1 / (MOD1 * MOD2), which is negligible for practical input sizes.

Visual walkthrough

Let's trace through the binary search with paths [[0,1,2,3,4], [2,3,4,0], [4,0,1,2,3,4]]. The shortest path has length 4, so we search lengths 0 through 4.

Step 1: Initialize binary search. The shortest path has length 4, so the answer is between 0 and 4. lo=0, hi=4, mid=2.

01234lohimid

The answer length cannot exceed the shortest path. Binary search will narrow this range.

Step 2: Check length 2. Use rolling hash to collect all length-2 subpaths from each friend.

Friend 1:[0,1][1,2][2,3][3,4]Friend 2:[2,3][3,4][4,0]Friend 3:[4,0][0,1][1,2][2,3][3,4]Common hashes found. Length 2 is feasible.

Friend 1: {[0,1],[1,2],[2,3],[3,4]}. Friend 2: {[2,3],[3,4],[4,0]}. Friend 3: {[4,0],[0,1],[1,2],[2,3],[3,4]}. Hashes for [2,3] and [3,4] appear in all. Length 2 works. Set lo = mid + 1 = 3.

Step 3: lo=3, hi=4, mid=3. Check length 3 with rolling hash.

Friend 1:[0,1,2][1,2,3][2,3,4]Friend 2:[2,3,4][3,4,0]Friend 3:[4,0,1][0,1,2][1,2,3][2,3,4]Hash for [2,3,4] found in all paths. Length 3 is feasible.

Friend 1: {[0,1,2],[1,2,3],[2,3,4]}. Friend 2: {[2,3,4],[3,4,0]}. Friend 3: {[4,0,1],[0,1,2],[1,2,3],[2,3,4]}. Hash for [2,3,4] appears in all three. Length 3 works. Set lo = mid + 1 = 4.

Step 4: lo=4, hi=4, mid=4. Check length 4 with rolling hash.

Friend 1:[0,1,2,3][1,2,3,4]Friend 2:[2,3,4,0]Friend 3:[4,0,1,2][0,1,2,3][1,2,3,4]No common hash across all paths. Length 4 fails.

Friend 1: {[0,1,2,3],[1,2,3,4]}. Friend 2: {[2,3,4,0]}. Friend 3: {[4,0,1,2],[0,1,2,3],[1,2,3,4]}. No hash appears in all three. Length 4 fails. Set hi = mid - 1 = 3.

Step 5: lo=4 > hi=3. Binary search ends. The longest common subpath has length 3.

Answer: 3 ([2, 3, 4])

Binary search tested only 3 candidate lengths instead of all 5. For each length, rolling hash collected subpath hashes in O(total cities) time.

The binary search checked lengths 2, 3, and 4. Lengths 2 and 3 were feasible (common subpaths exist), and length 4 was not. The answer is 3, corresponding to the subpath [2,3,4].

Complexity analysis

ApproachTimeSpace
Binary search + rolling hashO(S * log(min_len))O(S)

Where S is the total number of cities across all paths and min_len is the length of the shortest path.

The binary search runs O(log(min_len)) iterations. Each iteration scans all paths once, computing rolling hashes in O(S) total time. The space is O(S) for storing the hash sets.

The double hashing does not change the asymptotic complexity. It just doubles the constant factor for hash computation, which is negligible compared to the overall work.

The building blocks

1. Binary search on the answer

When the answer lives in a numeric range and feasibility is monotonic, binary search finds the boundary in O(log n) time. The template is always the same:

lo, hi = min_possible, max_possible
result = default
while lo <= hi:
    mid = (lo + hi) // 2
    if is_feasible(mid):
        result = mid
        lo = mid + 1
    else:
        hi = mid - 1

In this problem, is_feasible(mid) checks whether all paths share a common subpath of length mid. The monotonicity guarantee (if length L works, so does L - 1) makes binary search valid.

2. Rolling hash across multiple arrays

The rolling hash technique from Rabin-Karp string matching extends naturally to arrays of integers. Instead of hashing characters, you hash city numbers. The update formula is identical:

h = (h * BASE - old_value * power + new_value) % MOD

The twist in this problem is that you run rolling hash on multiple arrays and intersect the resulting hash sets. This is a clean composition: rolling hash produces the candidate set for each path, and set intersection filters for common subpaths.

Edge cases

  • Single path: if there is only one friend, every subpath of that path is trivially "common." The answer is the entire path length.
  • All paths identical: the answer is the length of any path (they are all the same).
  • No common elements: if the paths have no city in common, the answer is 0. The binary search will find that even length 1 is infeasible.
  • A path of length 1: the maximum possible answer is 1. If that single city appears in every other path, the answer is 1. Otherwise, 0.
  • Very long paths with small n: the rolling hash handles this efficiently. Even if paths have millions of cities, each check call is linear in the total number of cities.

From understanding to recall

The structure is clean: binary search picks a length, rolling hash checks feasibility, set intersection filters across paths. But the implementation details matter. How do you pick the two (base, mod) pairs? How do you update the rolling hash without off-by-one errors? How do you intersect sets efficiently by short-circuiting when the intersection empties?

These are the details that fade after a few days. Spaced repetition pins them down. You write the solution today, again in two days, then in a week. After a few rounds, the double-hash setup and the set intersection loop are automatic. You stop reconstructing and start writing.

Related posts

CodeBricks uses spaced repetition to turn these techniques into reflex. You solve the problem once, then the system schedules reviews at increasing intervals. By the time you sit for an interview, the rolling hash formula and the binary search skeleton are muscle memory.