Most Frequent Subtree Sum: DFS with Frequency Counting
Given the root of a binary tree, return all the most frequent subtree sums. The subtree sum of a node is the sum of all values in the subtree rooted at that node (including the node itself). If there is a tie, return all values with the highest frequency.
This is LeetCode 508: Most Frequent Subtree Sum, and it combines two core patterns: post-order tree traversal and frequency counting with a hash map. If you have solved tree depth or path sum problems before, you already know how to propagate values up from children to parents. This problem adds a counting layer on top of that propagation.
The problem
You are given a binary tree. For each node, compute the subtree sum: the sum of that node's value plus all values in its left and right subtrees. Then find which subtree sum(s) appear most frequently across all nodes.
Example 1:
5
/ \
2 -3
- Left leaf subtree sum = 2
- Right leaf subtree sum = -3
- Root subtree sum = 5 + 2 + (-3) = 4
All three sums (2, -3, 4) appear exactly once. Since they all tie for the highest frequency, return [2, -3, 4].
Example 2:
5
/ \
2 -5
- Left leaf subtree sum = 2
- Right leaf subtree sum = -5
- Root subtree sum = 5 + 2 + (-5) = 2
Sum 2 appears twice, sum -5 appears once. The maximum frequency is 2. Return [2].
The approach
The key observation is that you need to compute subtree sums bottom-up. A node's subtree sum depends on the subtree sums of its children. That means post-order DFS is the natural fit: process children first, then combine their results with the current node.
The algorithm has two phases:
- Collect all subtree sums using post-order DFS. At each node, compute
node.val + left_sum + right_sumand record it. - Count frequencies with a hash map. After (or during) collection, find the maximum frequency and return all sums that match it.
You can combine both phases into a single DFS pass. As you compute each subtree sum, immediately increment its count in a hash map. When the DFS finishes, scan the map for the maximum frequency and collect the winners.
The subtree sum of a node always equals node.val + subtree_sum(left) + subtree_sum(right). This recursive definition means you must compute children first before computing a parent. That is exactly what post-order traversal gives you: left, right, then current node.
Step-by-step walkthrough
Let's trace through the second example tree (root=5, left=2, right=-5) using post-order DFS. Watch how each node's subtree sum gets computed and recorded in the frequency map.
Step 1: Post-order DFS visits left leaf (value 2)
Node 2 is a leaf. Its subtree sum is just its own value: 2. Record sum=2 in the frequency map.
Step 2: Post-order DFS visits right leaf (value -5)
Node -5 is a leaf. Its subtree sum is -5. Record sum=-5 in the frequency map.
Step 3: Visit root (value 5). Compute root subtree sum.
Root subtree sum = node value + left sum + right sum = 5 + 2 + (-5) = 2. Record sum=2 in frequency map.
Step 4: Find maximum frequency in the map
The max count in our frequency map is 2 (sum=2 appears twice). Collect all sums with that frequency.
Step 5: Return all sums with the maximum frequency
Sum 2 has frequency 2 (the max). Return [2]. Both the left leaf and the entire tree have subtree sum 2.
Observations:
- Post-order traversal guarantees that by the time you visit a node, both its children have already been processed and their subtree sums are available.
- The frequency map builds up incrementally. You do not need a separate pass to count frequencies.
- The final scan of the map is O(n) in the worst case (every node could have a unique subtree sum).
- For this example, sum 2 appears at both the left leaf and the root, giving it a frequency of 2.
The code
def find_frequent_tree_sum(root):
if root is None:
return []
count = {}
def dfs(node):
if node is None:
return 0
left_sum = dfs(node.left)
right_sum = dfs(node.right)
s = node.val + left_sum + right_sum
count[s] = count.get(s, 0) + 1
return s
dfs(root)
max_freq = max(count.values())
return [s for s, freq in count.items() if freq == max_freq]
Line-by-line breakdown:
if root is None: return []handles the empty tree edge case. No nodes means no subtree sums.count = {}is the frequency hash map. Keys are subtree sums, values are how many times each sum appears.def dfs(node)is the post-order DFS helper. It returns the subtree sum of the given node.if node is None: return 0is the base case. A null child contributes 0 to its parent's subtree sum.left_sum = dfs(node.left)andright_sum = dfs(node.right)recurse into children first (post-order).s = node.val + left_sum + right_sumcomputes the subtree sum for the current node using results from its children.count[s] = count.get(s, 0) + 1increments the frequency count for this subtree sum in the hash map.return spasses the subtree sum back up to the parent.max_freq = max(count.values())finds the highest frequency after the DFS completes.- The list comprehension collects all sums that appear with the maximum frequency.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Post-order DFS + hash map | O(n) | O(n) |
Time is O(n). You visit each node exactly once during the DFS. At each node, you do O(1) work: add three numbers, update a hash map entry, and return. The final scan of the hash map is at most O(n) since there are at most n distinct subtree sums.
Space is O(n). The hash map can hold up to n entries (one per node if every subtree sum is unique). The recursion stack uses O(h) space where h is the height of the tree. In the worst case (skewed tree), h = n, so overall space is O(n).
Building blocks
This problem is built on two fundamental building blocks:
Post-order DFS with value propagation. You compute a value for each node that depends on its children's values. The skeleton looks like this:
def dfs(node):
if node is None:
return BASE_VALUE
left = dfs(node.left)
right = dfs(node.right)
result = COMBINE(node.val, left, right)
return result
For Most Frequent Subtree Sum, BASE_VALUE is 0 and COMBINE is addition. This same pattern drives Maximum Depth (max + 1), Diameter of Binary Tree (track max through each node), and Binary Tree Maximum Path Sum (max single-branch contribution).
Frequency counting with a hash map. You accumulate counts as you encounter values, then extract the winners at the end:
count = {}
for value in stream:
count[value] = count.get(value, 0) + 1
max_freq = max(count.values())
winners = [v for v, f in count.items() if f == max_freq]
This same pattern appears in problems like Top K Frequent Elements, Group Anagrams, and Find Mode in BST. The hash map turns a sorting problem into a linear-time problem.
Combining these two blocks gives you the complete solution: DFS propagates sums upward, and the hash map counts them as they arrive.
Edge cases
- Single node: the only subtree sum is the node's value. Return
[node.val]. - All nodes have the same value: if every node is
k, the subtree sums depend on subtree sizes. Each sum is unique (k, 3k, 7k for a balanced tree of 3 levels), so all appear once and you return all of them. - Negative values: subtree sums can be negative, zero, or positive. The hash map handles all values as keys without issues.
- Empty tree: return an empty list. Guard against calling
max()on an empty sequence by checking for null root upfront.
Multiple subtree sums can tie for the highest frequency. Do not assume there is a single winner. The problem asks you to return all sums with the maximum count, not just one of them.
From understanding to recall
You just traced through a clean post-order DFS that computes subtree sums and counts their frequencies. The bottom-up propagation makes sense, and the hash map counting is familiar territory. But will you remember to return the subtree sum from each recursive call so the parent can use it? Will you remember that the final step requires scanning the map for the max frequency, not just returning the most recently seen sum?
Spaced repetition builds recall for these details. You write the solution from scratch after a day, then three days, then a week. Each repetition reinforces the post-order return pattern and the two-phase structure (collect sums, then find winners). After a few rounds, post-order DFS with frequency counting becomes second nature.
Post-order value propagation and hash map frequency counting are two of roughly 60 reusable building blocks that cover hundreds of LeetCode problems. Drilling them individually with spaced repetition is more effective than grinding random problems.
Related posts
- Path Sum III - Another tree problem that combines DFS with a hash map, using prefix sums to count paths with a target sum
- Subtree of Another Tree - Practices the recursive subtree concept, checking if one tree appears as a subtree of another
- Binary Tree Maximum Path Sum - Uses post-order DFS to propagate values upward, choosing the best path contribution at each node