H-Index: Sorting to Find Your Citation Threshold
LeetCode H-Index (problem 274) asks you to compute a researcher's h-index: the largest value h such that the researcher has at least h papers with h or more citations. Given citations = [3, 0, 6, 1, 5], the answer is 3 because three papers have at least 3 citations each.
The h-index is a real metric used in academia to measure research impact. It balances quantity (number of papers) against quality (citation count per paper). And it turns out to be a neat little algorithm problem too.
Why this problem matters
The h-index problem teaches you a general pattern: using sorting as a preprocessing step to turn a search problem into a linear scan. Once the array is sorted, the answer reveals itself through a simple comparison at each index. You will see this "sort first, then scan" technique in many array problems.
It also introduces the idea of threshold finding, where you are looking for the exact point where a condition flips from true to false. That concept shows up repeatedly in binary search variants.
The approach: sort and scan
The key insight is this: if you sort citations in descending order, then at each position i, you know that i + 1 papers have citations of at least citations[i]. You just need to find the largest i + 1 where citations[i] >= i + 1.
Walk through [3, 0, 6, 1, 5]:
- Sort descending:
[6, 5, 3, 1, 0] - At index 0: 1 paper with at least 6 citations. Valid for h = 1.
- At index 1: 2 papers with at least 5 citations. Valid for h = 2.
- At index 2: 3 papers with at least 3 citations. Valid for h = 3.
- At index 3: 4 papers with at least 1 citation. But 1 is less than 4, so this fails.
The answer is 3.
def h_index(citations: list[int]) -> int:
citations.sort(reverse=True)
h = 0
for i, c in enumerate(citations):
if c >= i + 1:
h = i + 1
else:
break
return h
Step-by-step walkthrough
Step 0: Sort citations descending: [6, 5, 3, 1, 0]
Start scanning from the left. citations[0] = 6. Is 6 >= 1? Yes. h can be at least 1.
Step 1: Check i = 1
citations[1] = 5. Is 5 >= 2? Yes. h can be at least 2.
Step 2: Check i = 2
citations[2] = 3. Is 3 >= 3? Yes. h can be at least 3.
Step 3: Check i = 3
citations[3] = 1. Is 1 >= 4? No. Stop here. The h-index is 3.
Result: h-index = 3
3 papers have at least 3 citations each. The remaining 2 papers have no more than 3 citations each.
The moment citations[i] < i + 1, you know you have found the boundary. Every paper from that index onward has too few citations to push h any higher. The answer is whatever h was right before the condition failed.
Alternative: counting sort for O(n)
You can avoid the O(n log n) sort by using a counting approach. Create a bucket array where buckets[i] counts papers with exactly i citations (capping at n since h can never exceed the total number of papers). Then scan from the top down, accumulating the count of papers with at least that many citations.
def h_index(citations: list[int]) -> int:
n = len(citations)
buckets = [0] * (n + 1)
for c in citations:
buckets[min(c, n)] += 1
total = 0
for i in range(n, -1, -1):
total += buckets[i]
if total >= i:
return i
return 0
This runs in O(n) time and O(n) space. It is a nice optimization to mention in an interview, though the sorting approach is usually clearer to explain.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Sort + scan | O(n log n) | O(1) |
| Counting sort | O(n) | O(n) |
Both approaches are perfectly valid for interviews. The sort + scan solution is easier to implement correctly under pressure, while the counting sort solution shows deeper algorithmic thinking.
The building blocks
This problem combines two reusable patterns:
Sorting as preprocessing. Sorting transforms a messy search problem into a clean linear scan. Once the array is in order, you gain the ability to make conclusions about "all elements to the left" or "all elements to the right" of any position. This same trick powers problems like Two Sum (sorted variant), 3Sum, and Merge Intervals.
Threshold finding. You are looking for the precise point where a condition transitions from true to false. In the sorted h-index array, the condition citations[i] >= i + 1 holds for some prefix and then fails. Finding that boundary is the core task. The same pattern appears in binary search problems like Find First and Last Position and Search Insert Position.
Edge cases to watch for
- All zeros:
[0, 0, 0]returns 0 since no paper has any citations - All same values:
[4, 4, 4, 4]returns 4 since all four papers have at least 4 citations - Single paper:
[100]returns 1 (h can never exceed the number of papers) - Very high citations:
[100, 200, 300]returns 3, not 100
Remember that h can never exceed n, the total number of papers. Even if every paper has 1000 citations, the h-index is at most n. That is why the counting sort approach caps values at n.
From understanding to recall
You probably understand how h-index works after reading this. But could you code it cleanly under a 20-minute time limit next week? Understanding and recall are different skills. Spaced repetition bridges that gap by resurfacing problems at the exact intervals where your memory starts to fade, turning "I kind of remember" into automatic fluency.
Related posts
- Sort Colors - Another sorting problem
- Find First and Last Position - Binary search threshold
- Majority Element - Linear scan to find a key property