Skip to content
← All posts

H-Index II: Binary Search on a Sorted Array

4 min read
leetcodeproblemmediumarraysbinary-search

H-Index II is LeetCode problem 275. You are given a sorted array of citations in ascending order, and you need to find the researcher's h-index. The h-index is defined as the maximum value h such that the researcher has at least h papers with h or more citations. The catch: you must do it in O(log n) time.

If you have seen the original H-Index problem (LeetCode 274), that version gives you an unsorted array. The sort alone takes O(n log n). Here the array arrives pre-sorted, and the problem is asking you to exploit that structure with binary search.

0i=01i=13i=25i=36i=4leftmidright
citations = [0, 1, 3, 5, 6], n = 5. Binary search for the leftmost index where citations[mid] >= n - mid.

Why this problem matters

This is a classic "sorted input signals binary search" problem. Interviewers use it to test whether you can recognize that a sorted array lets you eliminate half the search space at each step, and whether you can formulate the correct binary search condition. It also tests your ability to reason about what the h-index definition means in terms of array indices.

The key connection: in a sorted array of length n, the element at index i tells you that there are n - i papers with at least citations[i] citations. So you are searching for the leftmost index where citations[i] >= n - i.

The approach: binary search for the threshold

Think about what the h-index means geometrically. In a sorted citations array of length n, if citations[mid] >= n - mid, then there are at least n - mid papers that each have at least citations[mid] citations. That satisfies the h-index condition for h = n - mid.

But you want the largest possible h, which corresponds to the smallest valid index. So when you find a valid mid, you search left to see if there is an even earlier index that also satisfies the condition. When it does not satisfy the condition, you search right because you need more citations.

def h_index(citations):
    n = len(citations)
    left, right = 0, n - 1
    while left <= right:
        mid = (left + right) // 2
        if citations[mid] >= n - mid:
            right = mid - 1
        else:
            left = mid + 1
    return n - left

Step-by-step walkthrough

Let's trace through citations = [0, 1, 3, 5, 6] with n = 5.

Step 1: left=0, right=4, mid=2. citations[2]=3, n-mid=3. 3 >= 3, search left.

01356leftmidright

citations[2] = 3 and n - mid = 5 - 2 = 3. Since 3 >= 3, there are at least 3 papers with 3+ citations. Try to find a tighter bound by searching left.

Step 2: left=0, right=1, mid=0. citations[0]=0, n-mid=5. 0 < 5, search right.

01356leftmidright

citations[0] = 0 and n - mid = 5 - 0 = 5. Since 0 < 5, we do not have 5 papers with 0+ citations meeting the h-index definition. Move left up.

Step 3: left=1, right=1, mid=1. citations[1]=1, n-mid=4. 1 < 4, search right.

01356leftmidright

citations[1] = 1 and n - mid = 5 - 1 = 4. Since 1 < 4, this position does not satisfy the threshold. Move left up.

Result: left=2. h-index = n - left = 5 - 2 = 3.

01356leftmidright

left has passed right, so the loop ends. The h-index is n - left = 5 - 2 = 3. Three papers (indices 2, 3, 4) each have at least 3 citations.

The key insight: when the loop ends, left points to the first index where the h-index condition is satisfied. The answer is n - left because there are that many papers from index left to the end, and each has at least n - left citations.

Complexity analysis

ApproachTimeSpace
Binary searchO(log n)O(1)
Linear scanO(n)O(1)

The linear scan approach would check each index from right to left until you find the threshold. It works, but it does not exploit the sorted structure fully. Binary search cuts the work from n comparisons to log n.

The building blocks

This problem is a direct application of the binary search for leftmost boundary pattern. You are not searching for a specific value in the array. Instead, you are searching for the first index where a condition flips from false to true. Once you recognize this framing, the code writes itself.

The same pattern appears whenever you have a sorted input and a monotonic predicate: the condition is false for all indices before some threshold and true for all indices after. Binary search finds that threshold in O(log n).

Edge cases to watch for

  • Empty array: n = 0, the loop never executes, and n - left = 0. Correct.
  • All zeros: [0, 0, 0]. No paper has any citations. left ends at n, so n - left = 0.
  • Single element: [100]. citations[0] = 100 >= 1 = n - 0. The h-index is 1.
  • All values equal n: [3, 3, 3]. Every paper has at least 3 citations, so h = 3. The search finds left = 0, returning n - 0 = 3.

From understanding to recall

You understand the logic now: sorted array, binary search for the leftmost index where citations[mid] >= n - mid, return n - left. But understanding is not the same as being able to write it under pressure. The condition citations[mid] >= n - mid is easy to second-guess when you are nervous. Did you compare to n - mid or n - mid - 1? Is it >= or >?

Spaced repetition locks in these details. You write the solution from scratch today, again tomorrow, again in a few days. Each time you rebuild the logic from first principles until the mapping from "sorted h-index" to "leftmost boundary binary search" is automatic.

Related posts