Number of Nodes in the Sub-Tree With the Same Label
LeetCode 1519, Number of Nodes in the Sub-Tree With the Same Label, asks you to count how many nodes in each node's subtree share the same label as that node.
The problem
You are given a tree with n nodes labeled 0 to n - 1. Each node i has a label labels[i], which is a lowercase English letter. The tree is rooted at node 0, and the edges are given as an undirected edge list.
Return an array answer where answer[i] is the number of nodes in the subtree of node i (including i itself) that have the same label as node i.
The brute force approach
The most direct approach: for each node, run a DFS or BFS from that node through its subtree, counting how many nodes share its label. This works, but it visits every node's subtree independently. In the worst case (a star graph or a skewed tree), this is O(n^2) because you might traverse most of the tree for every node.
The key insight
Instead of counting from scratch for each node, you can run a single DFS and share a global count array of 26 letters across the entire traversal.
The trick is a before/after snapshot. Before you process a node, record the current count for that node's label. Then increment the count, recurse into all children, and when you return, the difference between the current count and the saved "before" value tells you exactly how many nodes in this subtree have the same label.
Think of the count array as a running tally that only grows. Each node snapshots the tally before it recurses, then checks the tally after. The difference is exactly the subtree contribution, because only descendants (processed between the snapshot and the check) could have added to it.
Step-by-step walkthrough
Step 1: DFS reaches leaf node 3 (label 'a')
Node 3 is a leaf. Increment count['a'] to 1. Before DFS it was 0, now it is 1. ans[3] = 1 - 0 = 1.
Step 2: Return to node 1 (label 'a'). Subtree includes node 3.
Node 1 recorded before=0 for count['a'], then incremented it to 1 before recursing. After child returns, count['a']=2. ans[1] = 2 - 0 = 2.
Step 3: DFS visits leaf node 2 (label 'b')
Node 2 is a leaf. Increment count['b'] to 1. Before was 0, now is 1. ans[2] = 1 - 0 = 1.
Step 4: Return to root node 0 (label 'a'). Entire tree is its subtree.
Node 0 recorded before=0, incremented count['a'] to 1, then recursed into children. Now count['a']=3. ans[0] = 3 - 0 = 3.
Step 5: DFS complete. Final answer: [3, 2, 1, 1].
Three nodes labeled 'a' in node 0's subtree, two in node 1's subtree, one each for nodes 2 and 3.
The code
def countSubTrees(n, edges, labels):
from collections import defaultdict
adj = defaultdict(list)
for u, v in edges:
adj[u].append(v)
adj[v].append(u)
ans = [0] * n
count = [0] * 26
def dfs(node, parent):
idx = ord(labels[node]) - ord('a')
before = count[idx]
count[idx] += 1
for child in adj[node]:
if child != parent:
dfs(child, node)
ans[node] = count[idx] - before
dfs(0, -1)
return ans
The adjacency list is built from the undirected edge list. Since the tree is undirected, each edge is added in both directions.
The count array has 26 slots, one per lowercase letter. It tracks how many times each letter has been seen so far during the DFS.
For each node, before captures the count of the node's label before processing the subtree. After incrementing the count and recursing through all children, count[idx] - before gives the total number of nodes in this subtree with the same label. That difference is stored in ans[node].
The parent parameter prevents the DFS from walking back up the edge it came from. Since the tree is given as undirected edges, this is how you enforce the parent-child direction.
The before/after difference trick avoids allocating a separate count array for each subtree. A single shared array and one integer snapshot per node is all you need.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Brute force (DFS per node) | O(n^2) | O(n) |
| Single DFS with count snapshots | O(n) | O(n) |
Time is O(n) for the optimized approach. You visit each node exactly once during the DFS. At each node you do O(1) work: one lookup, one increment, one subtraction.
Space is O(n). The adjacency list takes O(n) for a tree with n - 1 edges. The recursion stack is O(n) in the worst case (skewed tree). The count array is O(26), which is O(1). The answer array is O(n).
The building blocks
DFS with parent tracking on undirected trees
When a tree is given as undirected edges, you need to avoid revisiting the parent during DFS. The standard pattern is to pass parent as a parameter and skip any neighbor that equals the parent:
def dfs(node, parent):
for child in adj[node]:
if child != parent:
dfs(child, node)
This converts the undirected edge list into a rooted tree traversal without explicitly building a directed adjacency list. You will see this pattern in any tree problem where the input is an edge list rather than a TreeNode structure.
Before/after snapshot for subtree aggregation
The pattern of recording a value before recursion and computing the difference after recursion is a clean way to isolate subtree contributions when using a shared accumulator. It avoids the overhead of merging separate count dictionaries from each child:
before = global_count[key]
global_count[key] += 1
for child in children:
recurse(child)
after = global_count[key]
subtree_result = after - before
This works because DFS processes an entire subtree contiguously. Everything that happens between the "before" and the final read comes from descendants of the current node. The same idea applies to Euler tour techniques and prefix-sum tricks on trees.
Edge cases
Single node: n = 1, no edges. The answer is [1]. The only node's subtree contains itself.
All same labels: every node has the same letter. Each node's answer equals the size of its subtree. The root's answer is n.
All unique labels: every node has a different letter (possible only when n is at most 26). Every node's answer is 1.
Skewed tree (linked list): the tree degenerates into a chain. The recursion depth is n, so you need O(n) stack space. The algorithm still runs in O(n) time.
Star graph: one root connected to n - 1 leaves. Each leaf's answer is 1 (or more, if labels repeat among leaves). The root's answer counts itself plus all leaves with the same label.
From understanding to recall
The solution hinges on two details: tracking a parent to avoid revisiting nodes in an undirected tree, and using a before/after snapshot on a shared count array to isolate subtree contributions. Both details are easy to understand when you read them, but easy to forget when you need to write them from scratch.
The parent-tracking pattern is muscle memory you build by solving several tree problems with edge-list inputs. The snapshot trick is more subtle. You need to remember to capture before before incrementing, and to compute the difference after all children have returned. Getting the order wrong breaks the count.
Spaced repetition locks in the sequencing. You practice the DFS skeleton, the snapshot placement, and the difference calculation as separate building blocks. After a few rounds, you stop second-guessing the order. You see "count nodes with same label in subtree" and the template writes itself.
Related posts
- Diameter of Binary Tree - DFS on trees where you aggregate information from subtrees and return a result to the parent
- Count Complete Tree Nodes - Tree traversal with counting, another problem where you gather subtree statistics recursively
- Subtree of Another Tree - Tree comparison problem involving subtree analysis