Subdomain Visit Count: Hash Map Aggregation
Subdomain Visit Count (LeetCode 811) is a clean example of the aggregation hash map pattern. You are given a list of count-paired domains, and your job is to figure out the total visit count for every subdomain that appears.
The core idea: when someone visits discuss.leetcode.com, that visit also counts for leetcode.com and com. You need to split each domain into all of its subdomains, accumulate counts in a hash map, and return the results.
The problem
You receive an array of strings cpdomains where each entry has the format "count domain". For example, "9001 discuss.leetcode.com" means discuss.leetcode.com was visited 9001 times.
A subdomain is any suffix of the domain that starts after a dot. So discuss.leetcode.com has subdomains leetcode.com and com, plus itself as the full domain.
For each subdomain (including the full domain), you need to return the total aggregated visit count across all input entries.
Example:
cpdomains = ["9001 discuss.leetcode.com", "50 yahoo.com",
"1 intel.mail.com", "5 wiki.org"]
Output (any order):
["9001 discuss.leetcode.com", "9051 leetcode.com", "9052 com",
"50 yahoo.com", "1 intel.mail.com", "1 mail.com",
"5 wiki.org", "5 org"]
Notice how com ends up with 9052. It received 9001 from discuss.leetcode.com, 50 from yahoo.com, and 1 from intel.mail.com. That is the aggregation in action.
Why this problem matters
Subdomain Visit Count teaches you two things at once. First, it is a direct application of the count aggregation hash map pattern, where you iterate through items and add values to a running total keyed by some identifier. Second, it requires you to generate all subdomains from a full domain string, which is a useful string manipulation skill.
These two building blocks show up constantly in real-world code. Log analysis, analytics pipelines, and URL routing all involve breaking structured strings apart and aggregating counts. If you can solve this problem cleanly, you already understand the mechanics behind those systems.
The approach
The algorithm is simple:
- For each entry, split the string into a count and a domain
- Generate all subdomains of that domain (including the domain itself)
- For each subdomain, add the count to a hash map
- Format the hash map entries as the output
The key operation is generating subdomains. Given "discuss.leetcode.com", you need to produce ["discuss.leetcode.com", "leetcode.com", "com"]. You can do this by repeatedly finding the next dot and taking the substring after it.
The code
from collections import Counter
def subdomain_visits(cpdomains: list[str]) -> list[str]:
counts = Counter()
for entry in cpdomains:
count_str, domain = entry.split(" ")
count = int(count_str)
parts = domain.split(".")
for i in range(len(parts)):
subdomain = ".".join(parts[i:])
counts[subdomain] += count
return [f"{count} {domain}" for domain, count in counts.items()]
Let's walk through each piece:
entry.split(" ")separates"9001 discuss.leetcode.com"into the count string"9001"and the domain"discuss.leetcode.com".domain.split(".")breaks the domain into parts:["discuss", "leetcode", "com"].- The inner loop iterates with
i = 0, 1, 2, and".".join(parts[i:])produces"discuss.leetcode.com","leetcode.com", and"com"respectively. counts[subdomain] += countadds the visit count to each subdomain's running total in the Counter.- The final list comprehension formats each entry as
"count domain".
Step-by-step walkthrough
Let's trace through the full example and watch the hash map build up entry by entry.
Step 1: Process "9001 discuss.leetcode.com". Split into 3 subdomains, each gets 9001 visits.
discuss.leetcode.com, leetcode.com, and com each receive 9001.
Step 2: Process "50 yahoo.com". Split into 2 subdomains. yahoo.com is new, com already exists.
"com" jumps from 9001 to 9051. "yahoo.com" is a new entry with 50.
Step 3: Process "1 intel.mail.com". Three subdomains. com gets another +1.
intel.mail.com and mail.com are new. com increases from 9051 to 9052.
Step 4: Process "5 wiki.org". Two subdomains, both new. Final map is complete.
wiki.org and org are added. The map now has 8 entries. Format each as count + domain for the output.
After processing all four input entries, the hash map has eight keys. Each key is a unique subdomain, and each value is the total visits across all input entries that contributed to it. The final step is just formatting.
Complexity analysis
| Metric | Value | Explanation |
|---|---|---|
| Time | O(n * m) | n is the number of input entries, m is the max number of domain levels (typically small, at most 3-4) |
| Space | O(n * m) | The hash map can hold up to n * m unique subdomains in the worst case |
In practice, the number of domain levels m is bounded by a small constant (most domains have 2 to 4 levels). So you can think of this as effectively O(n) time and space. The split and join operations on each domain are proportional to the domain length, but that too is bounded in practice.
Building blocks
Count aggregation
The core pattern here is iterating through a collection, computing one or more keys from each item, and adding a numeric value to the running total for each key:
from collections import Counter
counts = Counter()
for item in collection:
keys = generate_keys(item)
value = extract_value(item)
for key in keys:
counts[key] += value
This is a generalization of simple frequency counting. Instead of adding 1 for each occurrence, you add an arbitrary value. The pattern shows up in:
- Subdomain Visit Count: each domain produces multiple keys (subdomains), each receiving the same count
- Frequency counting: each item produces one key, and the value is always 1
- Revenue aggregation: each transaction contributes to totals for its category, region, and time period
Subdomain generation
The second building block is generating all suffixes of a dot-separated string. Given "a.b.c", produce ["a.b.c", "b.c", "c"]. The trick is splitting on dots and re-joining progressively shorter slices:
parts = domain.split(".")
for i in range(len(parts)):
subdomain = ".".join(parts[i:])
This is a pattern you will see in DNS resolution, file path processing, and namespace lookup systems.
Edge cases to watch for
A few things to consider:
- Single-level domains. An input like
"100 com"has no dots, soparts = ["com"]and the only subdomain is"com"itself. The loop handles this correctly with one iteration. - Multiple inputs sharing the same domain. If two entries both mention
"google.com", their counts add up naturally through the Counter. - Large counts. The problem guarantees counts fit in standard integers, but it is worth noting that counts can be up to 10000 and there can be up to 10000 entries. The total for a popular TLD like
"com"could reach 10^8, which is fine for a 32-bit or 64-bit integer. - Leading zeros in counts. The problem does not include these, but
int("0050")in Python correctly returns 50, so your code would handle it anyway.
From understanding to recall
This problem feels almost too simple once you see the solution. Split, loop, aggregate. But the building blocks here are the same ones you need for harder problems. The count aggregation pattern is the backbone of Top K Frequent Elements. The "generate all subkeys from one input" idea shows up whenever you need to index items under multiple categories.
The risk is not that you will forget the solution to this specific problem. The risk is that you will not recognize the aggregation pattern when it appears in a different form. Spaced repetition helps here. By revisiting the pattern at increasing intervals, you train yourself to spot "iterate, generate keys, aggregate values" as soon as a problem describes counting across overlapping categories.
Related posts
- Group Anagrams - Another hash map grouping problem where the key function determines which bucket each item falls into
- Hash Maps: The Data Structure - A deep dive into how hash maps work under the hood, and why O(1) lookups make aggregation patterns so efficient
- Top K Frequent Elements - Uses the same frequency counting pattern as a first step, then extracts the top k entries