Count of Smaller Numbers After Self: Binary Indexed Tree Explained
315. Count of Smaller Numbers After Self gives you an integer array nums. Your job is to return a counts array where counts[i] is the number of elements to the right of nums[i] that are strictly smaller than nums[i].
For example, with nums = [5, 2, 6, 1], the answer is [2, 1, 1, 0]. The 5 has two smaller elements to its right (2 and 1). The 2 has one (just 1). The 6 also has one (just 1). The 1 has none.
The approach: Binary Indexed Tree with coordinate compression
The brute force is O(n^2): for each element, scan everything to its right. That works for small inputs but times out on large ones.
The efficient approach uses a Binary Indexed Tree (also called a Fenwick Tree). A BIT lets you do two operations in O(log n) each: increment a position, and query a prefix sum up to some position. That is exactly what you need here.
The plan has three parts.
Coordinate compression. The values in nums can be large, but you only care about their relative order. Map them to dense ranks starting at 1. For [5, 2, 6, 1], the sorted unique values are [1, 2, 5, 6], so the ranks are {1: 1, 2: 2, 5: 3, 6: 4}. Your BIT only needs to be as large as the number of distinct values.
Traverse right to left. When you process nums[i], you want to know how many elements already in the BIT have rank less than rank[nums[i]]. Those are exactly the elements to the right of i that are smaller. Query the BIT for the prefix sum up to rank[nums[i]] - 1. Then call update at rank[nums[i]] to record that this element exists for future queries.
Read the result. Because you process right to left, you append counts in reverse order. Reverse at the end.
def countSmaller(nums):
sorted_unique = sorted(set(nums))
rank = {v: i + 1 for i, v in enumerate(sorted_unique)}
n = len(nums)
bit = [0] * (len(rank) + 2)
def update(i):
while i < len(bit):
bit[i] += 1
i += i & (-i)
def query(i):
s = 0
while i > 0:
s += bit[i]
i -= i & (-i)
return s
result = []
for num in reversed(nums):
r = rank[num]
result.append(query(r - 1))
update(r)
return result[::-1]
Coordinate compression: 1→rank 1, 2→rank 2, 5→rank 3, 6→rank 4
i=3, nums[3]=1, rank=1: query(0)=0, then update BIT at rank 1
BIT (after update)
result (built right to left)
query(rank-1) = query(0) = 0. No elements processed yet, so nothing is smaller. After updating rank 1, BIT[1]=1.
i=2, nums[2]=6, rank=4: query(3)=1, then update BIT at rank 4
BIT (after update)
result (built right to left)
query(rank-1) = query(3) = 1. The BIT holds one element (val=1) with rank 1, which is smaller than 6. After updating rank 4, BIT reflects two elements inserted.
i=1, nums[1]=2, rank=2: query(1)=1, then update BIT at rank 2
BIT (after update)
result (built right to left)
query(rank-1) = query(1) = 1. Only val=1 (rank 1) is to the right and smaller than 2. After updating rank 2, the prefix sum at index 2 covers both rank 1 and rank 2.
i=0, nums[0]=5, rank=3: query(2)=2, then update BIT at rank 3
BIT (after update)
result (built right to left)
query(rank-1) = query(2) = 2. Two elements to the right have rank <= 2, meaning values 1 and 2 are both smaller than 5. Result array is now complete: [2, 1, 1, 0].
Complexity
| Time | Space | |
|---|---|---|
| BIT approach | O(n log n) | O(n) |
The loop runs n times. Each iteration calls query and update, both of which traverse at most O(log n) nodes in the BIT. The BIT and rank map each take O(n) space where n is the number of distinct values.
The building blocks
Binary Indexed Tree structure. A BIT is an array where each index stores a partial sum over a range determined by the lowest set bit of that index. The update operation propagates a change upward by repeatedly adding the lowest set bit (i += i & (-i)). The query operation accumulates a prefix sum downward by repeatedly subtracting the lowest set bit (i -= i & (-i)). You do not need to understand the internal structure to use it correctly -- just the two operations and their O(log n) cost.
Coordinate compression. You use this whenever values are large but only their relative order matters. Sort and deduplicate, then assign ranks. This lets your BIT stay small regardless of the magnitude of the input values.
Merge sort approach. There is an alternative solution using a modified merge sort that counts inversions during the merge step. Both approaches are O(n log n). The BIT version is often easier to reason about once you understand the two operations, while the merge sort version avoids any extra data structure but requires careful index tracking.
Edge cases
- All same numbers. Every element gets the same rank.
query(rank - 1)always returns 0 because no smaller rank has been inserted. Result is all zeros. Coordinate compression handles ties cleanly. - Already sorted ascending (e.g.,
[1, 2, 3, 4]). Each element has no smaller elements to its right, so every count is 0. Processing right to left, each query returns 0 before you update. - Already sorted descending (e.g.,
[4, 3, 2, 1]). Each element has all remaining elements to its right smaller than it. Counts are[3, 2, 1, 0]. The BIT accumulates perfectly. - Single element. The loop runs once,
query(r - 1)returns 0 (BIT is empty), result is[0].
From understanding to recall
After walking through the steps, the logic makes sense. But in an interview the two things that tend to slip are: why query(r - 1) and not query(r), and why you traverse right to left instead of left to right.
query(r - 1) counts elements with rank strictly less than r. If you queried r, you would include the current element itself (if it had been inserted), which would give the wrong answer. The "minus one" encodes "strictly smaller."
Right to left is what makes the BIT query meaningful. When you are at index i, you want to count elements that come after i in the array. If you processed left to right, the BIT would contain elements to the left, not the right. The reversal at the end puts the counts back in the correct order.
Spaced repetition makes both of those details stick. You write the solution from scratch at increasing intervals, specifically checking that you got query(r - 1) and the traversal direction right. After a few sessions, they become muscle memory.
Related posts
- Merge K Sorted Lists - The heap-based n-way merge, another O(n log n) approach built around a supporting data structure
- Kth Largest Element - Uses a heap or quickselect to find order statistics efficiently
- Largest Rectangle in Histogram - Another hard array problem that rewards thinking carefully about what information to maintain as you scan