Skip to content
← All posts

Subdomain Visit Count: Hash Map Aggregation

5 min read
leetcodeproblemmediumstringshash-map

Subdomain Visit Count (LeetCode 811) is a clean example of the aggregation hash map pattern. You are given a list of count-paired domains, and your job is to figure out the total visit count for every subdomain that appears.

The core idea: when someone visits discuss.leetcode.com, that visit also counts for leetcode.com and com. You need to split each domain into all of its subdomains, accumulate counts in a hash map, and return the results.

The problem

You receive an array of strings cpdomains where each entry has the format "count domain". For example, "9001 discuss.leetcode.com" means discuss.leetcode.com was visited 9001 times.

A subdomain is any suffix of the domain that starts after a dot. So discuss.leetcode.com has subdomains leetcode.com and com, plus itself as the full domain.

For each subdomain (including the full domain), you need to return the total aggregated visit count across all input entries.

Example:

cpdomains = ["9001 discuss.leetcode.com", "50 yahoo.com",
             "1 intel.mail.com", "5 wiki.org"]

Output (any order):

["9001 discuss.leetcode.com", "9051 leetcode.com", "9052 com",
 "50 yahoo.com", "1 intel.mail.com", "1 mail.com",
 "5 wiki.org", "5 org"]
input cpdomains:"9001 discuss.leetcode.com""50 yahoo.com""1 intel.mail.com""5 wiki.org"split and aggregateoutput (domain : total visits):discuss.leetcode.com9001leetcode.com9051com9052yahoo.com50intel.mail.com1mail.com951wiki.org5org5
Each input domain is split into all its subdomains. Visit counts accumulate in a hash map. "9001 discuss.leetcode.com" adds 9001 to three keys: discuss.leetcode.com, leetcode.com, and com.

Notice how com ends up with 9052. It received 9001 from discuss.leetcode.com, 50 from yahoo.com, and 1 from intel.mail.com. That is the aggregation in action.

Why this problem matters

Subdomain Visit Count teaches you two things at once. First, it is a direct application of the count aggregation hash map pattern, where you iterate through items and add values to a running total keyed by some identifier. Second, it requires you to generate all subdomains from a full domain string, which is a useful string manipulation skill.

These two building blocks show up constantly in real-world code. Log analysis, analytics pipelines, and URL routing all involve breaking structured strings apart and aggregating counts. If you can solve this problem cleanly, you already understand the mechanics behind those systems.

The approach

The algorithm is simple:

  1. For each entry, split the string into a count and a domain
  2. Generate all subdomains of that domain (including the domain itself)
  3. For each subdomain, add the count to a hash map
  4. Format the hash map entries as the output

The key operation is generating subdomains. Given "discuss.leetcode.com", you need to produce ["discuss.leetcode.com", "leetcode.com", "com"]. You can do this by repeatedly finding the next dot and taking the substring after it.

The code

from collections import Counter

def subdomain_visits(cpdomains: list[str]) -> list[str]:
    counts = Counter()
    for entry in cpdomains:
        count_str, domain = entry.split(" ")
        count = int(count_str)
        parts = domain.split(".")
        for i in range(len(parts)):
            subdomain = ".".join(parts[i:])
            counts[subdomain] += count
    return [f"{count} {domain}" for domain, count in counts.items()]

Let's walk through each piece:

  • entry.split(" ") separates "9001 discuss.leetcode.com" into the count string "9001" and the domain "discuss.leetcode.com".
  • domain.split(".") breaks the domain into parts: ["discuss", "leetcode", "com"].
  • The inner loop iterates with i = 0, 1, 2, and ".".join(parts[i:]) produces "discuss.leetcode.com", "leetcode.com", and "com" respectively.
  • counts[subdomain] += count adds the visit count to each subdomain's running total in the Counter.
  • The final list comprehension formats each entry as "count domain".

Step-by-step walkthrough

Let's trace through the full example and watch the hash map build up entry by entry.

Step 1: Process "9001 discuss.leetcode.com". Split into 3 subdomains, each gets 9001 visits.

count map:discuss.leetcode.com9001leetcode.com9001com9001

discuss.leetcode.com, leetcode.com, and com each receive 9001.

Step 2: Process "50 yahoo.com". Split into 2 subdomains. yahoo.com is new, com already exists.

count map:discuss.leetcode.com9001leetcode.com9001com9051yahoo.com50

"com" jumps from 9001 to 9051. "yahoo.com" is a new entry with 50.

Step 3: Process "1 intel.mail.com". Three subdomains. com gets another +1.

count map:discuss.leetcode.com9001leetcode.com9001com9052yahoo.com50intel.mail.com1mail.com1

intel.mail.com and mail.com are new. com increases from 9051 to 9052.

Step 4: Process "5 wiki.org". Two subdomains, both new. Final map is complete.

count map:discuss.leetcode.com9001leetcode.com9001com9052yahoo.com50intel.mail.com1mail.com1wiki.org5org5

wiki.org and org are added. The map now has 8 entries. Format each as count + domain for the output.

After processing all four input entries, the hash map has eight keys. Each key is a unique subdomain, and each value is the total visits across all input entries that contributed to it. The final step is just formatting.

Complexity analysis

MetricValueExplanation
TimeO(n * m)n is the number of input entries, m is the max number of domain levels (typically small, at most 3-4)
SpaceO(n * m)The hash map can hold up to n * m unique subdomains in the worst case

In practice, the number of domain levels m is bounded by a small constant (most domains have 2 to 4 levels). So you can think of this as effectively O(n) time and space. The split and join operations on each domain are proportional to the domain length, but that too is bounded in practice.

Building blocks

Count aggregation

The core pattern here is iterating through a collection, computing one or more keys from each item, and adding a numeric value to the running total for each key:

from collections import Counter

counts = Counter()
for item in collection:
    keys = generate_keys(item)
    value = extract_value(item)
    for key in keys:
        counts[key] += value

This is a generalization of simple frequency counting. Instead of adding 1 for each occurrence, you add an arbitrary value. The pattern shows up in:

  • Subdomain Visit Count: each domain produces multiple keys (subdomains), each receiving the same count
  • Frequency counting: each item produces one key, and the value is always 1
  • Revenue aggregation: each transaction contributes to totals for its category, region, and time period

Subdomain generation

The second building block is generating all suffixes of a dot-separated string. Given "a.b.c", produce ["a.b.c", "b.c", "c"]. The trick is splitting on dots and re-joining progressively shorter slices:

parts = domain.split(".")
for i in range(len(parts)):
    subdomain = ".".join(parts[i:])

This is a pattern you will see in DNS resolution, file path processing, and namespace lookup systems.

Edge cases to watch for

A few things to consider:

  • Single-level domains. An input like "100 com" has no dots, so parts = ["com"] and the only subdomain is "com" itself. The loop handles this correctly with one iteration.
  • Multiple inputs sharing the same domain. If two entries both mention "google.com", their counts add up naturally through the Counter.
  • Large counts. The problem guarantees counts fit in standard integers, but it is worth noting that counts can be up to 10000 and there can be up to 10000 entries. The total for a popular TLD like "com" could reach 10^8, which is fine for a 32-bit or 64-bit integer.
  • Leading zeros in counts. The problem does not include these, but int("0050") in Python correctly returns 50, so your code would handle it anyway.

From understanding to recall

This problem feels almost too simple once you see the solution. Split, loop, aggregate. But the building blocks here are the same ones you need for harder problems. The count aggregation pattern is the backbone of Top K Frequent Elements. The "generate all subkeys from one input" idea shows up whenever you need to index items under multiple categories.

The risk is not that you will forget the solution to this specific problem. The risk is that you will not recognize the aggregation pattern when it appears in a different form. Spaced repetition helps here. By revisiting the pattern at increasing intervals, you train yourself to spot "iterate, generate keys, aggregate values" as soon as a problem describes counting across overlapping categories.

Related posts

  • Group Anagrams - Another hash map grouping problem where the key function determines which bucket each item falls into
  • Hash Maps: The Data Structure - A deep dive into how hash maps work under the hood, and why O(1) lookups make aggregation patterns so efficient
  • Top K Frequent Elements - Uses the same frequency counting pattern as a first step, then extracts the top k entries