Accounts Merge: Union-Find on Shared Emails

May 22, 20266 min read

leetcodeproblemmediumstringsgraphunion-find

Accounts Merge is LeetCode problem 721. You are given a list of accounts where each account starts with a name followed by one or more email addresses. Two accounts belong to the same person if they share at least one email. Your job is to merge all accounts that belong to the same person, then return each merged account with the name followed by all emails in sorted order.

The problem

Each element accounts[i] is a list of strings. The first string is the name, and the rest are emails belonging to that account. Two accounts may belong to the same person if they share any email. One person can have multiple accounts under the same name. Different people can also share the same name, so you cannot merge by name alone.

Input: accounts = [
    ["John", "john@a.com", "john@b.com"],
    ["John", "john@b.com", "john@c.com"],
    ["Mary", "mary@a.com"]
]
Output: [
    ["John", "john@a.com", "john@b.com", "john@c.com"],
    ["Mary", "mary@a.com"]
]

Accounts 0 and 1 both contain "john@b.com", so they must belong to the same John. All three of John's emails get merged into one account. Mary has no shared emails with anyone, so her account stays as is.

Two accounts share the email "john@b", so they belong to the same person. Mary has no shared emails, so her account stays separate.

The approach: union-find

The key insight is that this is a connected components problem in disguise. Each account is a node. If two accounts share an email, they are connected. You need to find all connected components and merge the emails within each component.

Union-Find (Disjoint Set Union) is a natural fit. You maintain one component per account index. As you scan through the emails, you build a mapping from each email to the first account index that owned it. When you encounter an email you have seen before, you union the current account with the previous owner. After processing all accounts, emails that end up under the same root belong to the same person.

The algorithm:

Initialize a union-find structure with one element per account.
Create a dictionary email_to_id mapping each email to an account index.
For each account, iterate through its emails. If an email is already in email_to_id, union the current account index with the stored index. Then set (or update) email_to_id[email] to the current index.
After all unions, iterate through email_to_id. For each email, find the root of its account index. Group emails by root.
For each group, sort the emails and prepend the account name.

Practice this pattern

Visual walkthrough

Let's trace the union-find process on the example above. Watch how the parent array changes when a shared email triggers a union, and how the email_to_id map grows as we process each email.

Initial state: each account is its own parent

email_to_id: {}

Initialize parent = [0, 1, 2], rank = [0, 0, 0], email_to_id = {}

Each of the 3 accounts starts as its own component in the union-find.

Process Account 0: john@a

email_to_id: {john@a: 0}

john@a not in email_to_id. Set email_to_id["john@a"] = 0.

First time seeing john@a. Map it to account 0. No union needed.

Process Account 0: john@b

email_to_id: {john@a: 0, john@b: 0}

john@b not in email_to_id. Set email_to_id["john@b"] = 0.

First time seeing john@b. Map it to account 0. Still no unions.

Process Account 1: john@b (shared email found!)

email_to_id: {john@a: 0, john@b: 1, john@c: 1}

john@b IS in email_to_id (owner: 0). union(1, 0). Then set email_to_id["john@b"] = 1.

Account 1 shares john@b with account 0. Union merges them. parent[1] = 0.

Process Account 1: john@c

email_to_id: {john@a: 0, john@b: 1, john@c: 1}

john@c not in email_to_id. Set email_to_id["john@c"] = 1.

New email, no union. But account 1 already points to account 0's root.

Process Account 2: mary@a

email_to_id: {john@a: 0, john@b: 1, john@c: 1, mary@a: 2}

mary@a not in email_to_id. Set email_to_id["mary@a"] = 2.

Mary's email is unique. No union. Account 2 stays in its own component.

Group emails by root and sort

email_to_id: {john@a: 0, john@b: 1, john@c: 1, mary@a: 2}

find(0)=0, find(1)=0, find(2)=2. Group: {0: [john@a, john@b, john@c], 2: [mary@a]}

Collect all emails under their root account, sort each group, prepend the name. Done!

The critical moment is Step 4. When we process account 1's email "john@b", we find it already in email_to_id with owner 0. That triggers union(1, 0), which links account 1 to account 0. From that point on, any email belonging to either account will resolve to the same root. In the final grouping step, all three of John's emails land in the same bucket.

The code

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1

def accounts_merge(accounts):
    uf = UnionFind(len(accounts))
    email_to_id = {}

    for i, account in enumerate(accounts):
        for email in account[1:]:
            if email in email_to_id:
                uf.union(i, email_to_id[email])
            email_to_id[email] = i

    groups = {}
    for email, idx in email_to_id.items():
        root = uf.find(idx)
        if root not in groups:
            groups[root] = []
        groups[root].append(email)

    result = []
    for root, emails in groups.items():
        result.append([accounts[root][0]] + sorted(emails))
    return result

Key details to notice:

The UnionFind class uses both path compression (in find) and union by rank. Path compression flattens the tree so future lookups are nearly O(1). Union by rank keeps the tree balanced by always attaching the shorter tree under the taller one.
We always call email_to_id[email] = i after the union check, not before. This means the map always points to a valid account index, and if the email was seen before, the union has already happened.
In the grouping phase, we call uf.find(idx) for each email, not just use idx directly. This is important because idx might not be the root of its component. The find call with path compression gives us the true root.
The final sorted(emails) within each group satisfies the problem's requirement that emails appear in sorted order.

You do not need to sort the final list of accounts. The problem accepts any order for the outer list. Only the emails within each account need to be sorted.

Drill this solution with spaced repetition

Complexity analysis

Metric	Value	Reasoning
Time	O(n * k * alpha(n) + n * k * log(n * k))	Scanning all emails takes O(n * k) where n is accounts and k is average emails per account. Each union/find is O(alpha(n)), effectively constant. Sorting each group costs O(n * k * log(n * k)) in the worst case.
Space	O(n * k)	The `email_to_id` map stores every unique email. The union-find parent and rank arrays use O(n). The groups dictionary also holds all emails.

Here n is the number of accounts and k is the average number of emails per account. The inverse Ackermann function alpha(n) is effectively constant for any input you will encounter in practice, so the union-find operations are nearly free. The sorting step dominates.

The building blocks

Union-Find (Disjoint Set Union)

The core data structure is the same template you see in many graph connectivity problems:

Number of Provinces uses union-find to count connected components in an adjacency matrix (LeetCode 547)
Redundant Connection (LeetCode 684) uses union-find to detect the edge that creates a cycle
Number of Islands can also be solved with union-find as an alternative to DFS flood fill

The find and union functions are identical across all of these. What changes is the input format and how you decide which elements to union.

Sorting merged groups

After the union-find phase identifies which emails belong together, you need to group them, sort each group, and prepend the account name. This is a standard "group by key" operation using a dictionary, followed by sorting each value list.

Edge cases to watch for

Same name, different people: Two accounts can have the same name but no shared emails. They must remain separate. Union-find handles this naturally because it merges by email overlap, not by name.
One account with many emails: A single account with hundreds of emails and no overlaps with others. No unions happen. The account passes through unchanged (just sorted).
Long chain of merges: Account 0 shares an email with account 1, account 1 shares a different email with account 2, account 2 shares yet another email with account 3. All four accounts merge into one component through transitive unions.
All accounts merge into one: Every account shares at least one email with another. The entire input collapses to a single merged account.
Single account: Only one account in the input. Return it with sorted emails.

From understanding to recall

You have seen how union-find turns a messy "merge accounts by shared emails" problem into a clean connected-components algorithm. The email-to-index map is the bridge between the string world and the integer-indexed union-find. The grouping and sorting at the end is mechanical.

But writing this from scratch under time pressure is different from reading it. The details that trip people up: initializing the union-find with len(accounts), mapping emails to account indices (not to each other), calling find during the grouping phase, and remembering to sort each group before prepending the name. Spaced repetition drills these details until they are automatic. You practice writing the union-find template, the email mapping loop, and the grouping logic from memory. After a few rounds, you can produce the full solution without hesitation.

Try CodeBricks free

Clone Graph - Another graph traversal problem using BFS/DFS, LeetCode 133
Number of Islands - Connected components via DFS, LeetCode 200
Number of Provinces - Union-find for connected components, LeetCode 547

Visual Learner? Explore how graph algorithms work with our interactive visualizations.