H-Index: Sorting to Find Your Citation Threshold

May 14, 20264 min read

leetcodeproblemmediumarrayssorting

LeetCode H-Index (problem 274) asks you to compute a researcher's h-index: the largest value h such that the researcher has at least h papers with h or more citations. Given citations = [3, 0, 6, 1, 5], the answer is 3 because three papers have at least 3 citations each.

The h-index is a real metric used in academia to measure research impact. It balances quantity (number of papers) against quality (citation count per paper). And it turns out to be a neat little algorithm problem too.

Sorted citations [6, 5, 3, 1, 0]. The first 3 papers have at least 3 citations each, so h = 3. Highlighted bars meet the threshold citations[i] ≥ i+1.

Why this problem matters

The h-index problem teaches you a general pattern: using sorting as a preprocessing step to turn a search problem into a linear scan. Once the array is sorted, the answer reveals itself through a simple comparison at each index. You will see this "sort first, then scan" technique in many array problems.

It also introduces the idea of threshold finding, where you are looking for the exact point where a condition flips from true to false. That concept shows up repeatedly in binary search variants.

Practice this problem with spaced repetition

The approach: sort and scan

The key insight is this: if you sort citations in descending order, then at each position i, you know that i + 1 papers have citations of at least citations[i]. You just need to find the largest i + 1 where citations[i] >= i + 1.

Walk through [3, 0, 6, 1, 5]:

Sort descending: [6, 5, 3, 1, 0]
At index 0: 1 paper with at least 6 citations. Valid for h = 1.
At index 1: 2 papers with at least 5 citations. Valid for h = 2.
At index 2: 3 papers with at least 3 citations. Valid for h = 3.
At index 3: 4 papers with at least 1 citation. But 1 is less than 4, so this fails.

The answer is 3.

def h_index(citations: list[int]) -> int:
    citations.sort(reverse=True)
    h = 0
    for i, c in enumerate(citations):
        if c >= i + 1:
            h = i + 1
        else:
            break
    return h

Practice this problem with spaced repetition

Step-by-step walkthrough

Step 0: Sort citations descending: [6, 5, 3, 1, 0]

Start scanning from the left. citations[0] = 6. Is 6 >= 1? Yes. h can be at least 1.

Step 1: Check i = 1

citations[1] = 5. Is 5 >= 2? Yes. h can be at least 2.

Step 2: Check i = 2

citations[2] = 3. Is 3 >= 3? Yes. h can be at least 3.

Step 3: Check i = 3

citations[3] = 1. Is 1 >= 4? No. Stop here. The h-index is 3.

Result: h-index = 3

3 papers have at least 3 citations each. The remaining 2 papers have no more than 3 citations each.

The moment citations[i] < i + 1, you know you have found the boundary. Every paper from that index onward has too few citations to push h any higher. The answer is whatever h was right before the condition failed.

Alternative: counting sort for O(n)

You can avoid the O(n log n) sort by using a counting approach. Create a bucket array where buckets[i] counts papers with exactly i citations (capping at n since h can never exceed the total number of papers). Then scan from the top down, accumulating the count of papers with at least that many citations.

def h_index(citations: list[int]) -> int:
    n = len(citations)
    buckets = [0] * (n + 1)
    for c in citations:
        buckets[min(c, n)] += 1
    total = 0
    for i in range(n, -1, -1):
        total += buckets[i]
        if total >= i:
            return i
    return 0

This runs in O(n) time and O(n) space. It is a nice optimization to mention in an interview, though the sorting approach is usually clearer to explain.

Complexity analysis

Approach	Time	Space
Sort + scan	O(n log n)	O(1)
Counting sort	O(n)	O(n)

Both approaches are perfectly valid for interviews. The sort + scan solution is easier to implement correctly under pressure, while the counting sort solution shows deeper algorithmic thinking.

The building blocks

This problem combines two reusable patterns:

Sorting as preprocessing. Sorting transforms a messy search problem into a clean linear scan. Once the array is in order, you gain the ability to make conclusions about "all elements to the left" or "all elements to the right" of any position. This same trick powers problems like Two Sum (sorted variant), 3Sum, and Merge Intervals.

Threshold finding. You are looking for the precise point where a condition transitions from true to false. In the sorted h-index array, the condition citations[i] >= i + 1 holds for some prefix and then fails. Finding that boundary is the core task. The same pattern appears in binary search problems like Find First and Last Position and Search Insert Position.

Edge cases to watch for

All zeros: [0, 0, 0] returns 0 since no paper has any citations
All same values: [4, 4, 4, 4] returns 4 since all four papers have at least 4 citations
Single paper: [100] returns 1 (h can never exceed the number of papers)
Very high citations: [100, 200, 300] returns 3, not 100

Remember that h can never exceed n, the total number of papers. Even if every paper has 1000 citations, the h-index is at most n. That is why the counting sort approach caps values at n.

From understanding to recall

You probably understand how h-index works after reading this. But could you code it cleanly under a 20-minute time limit next week? Understanding and recall are different skills. Spaced repetition bridges that gap by resurfacing problems at the exact intervals where your memory starts to fade, turning "I kind of remember" into automatic fluency.

Sort Colors - Another sorting problem
Find First and Last Position - Binary search threshold
Majority Element - Linear scan to find a key property

Try CodeBricks free