Skip to content
← All posts

Compare Strings by Frequency of the Smallest Character: Binary Search on Sorted Counts

5 min read
leetcodeproblemmediumarraysstringsbinary-search

Compare Strings by Frequency of the Smallest Character (LeetCode 1170) introduces a custom comparison function f(s) that returns the frequency of the lexicographically smallest character in a string. You then use that function to answer a batch of queries: for each query string, how many words have a strictly greater f value?

The brute-force approach of comparing every query against every word is too slow. The trick is to sort the word frequencies once, then answer each query with a binary search.

The problem

Define f(s) as the frequency of the smallest character in string s. For example, f("dcce") = 2 because the smallest character is c and it appears twice.

Given two string arrays queries and words, return an array answer where answer[i] is the number of words such that f(queries[i]) is strictly less than f(words[j]).

Example: queries = ["a","aaa"], words = ["bbb","cc"].

  • f("a") = 1, f("aaa") = 3, f("bbb") = 3, f("cc") = 2
  • For query "a": how many words have f(word) > 1? Both 3 and 2 are greater than 1, so the answer is 2.
  • For query "aaa": how many words have f(word) > 3? Neither 3 nor 2 is greater than 3, so the answer is 0.
  • Result: [2, 0].
wordsqueries"bbb"f = 3 (smallest: b)"cc"f = 2 (smallest: c)"a"f = 1 (smallest: a)"aaa"f = 3 (smallest: a)sort f(words)2[0]3[1]f = 1count f(w) > 1f = 3count f(w) > 3answer = [2, 0]For each query, count how many words have a strictly greater f value
Compute f(s) = frequency of the smallest character for each word and query, then count words with f(word) greater than f(query).

Why this problem matters

This problem combines three patterns you will see repeatedly: computing a derived value from strings, sorting an array to enable fast lookups, and using binary search to count elements that satisfy a threshold. These same building blocks appear in problems involving ranking, thresholding, and batch queries. If you can decompose this problem cleanly, you can handle any problem that asks "how many items exceed a threshold?"

The key insight

Computing f(s) for a single string takes O(k) where k is the length of the string. If you have m queries and n words, the naive approach checks every query against every word: O(m * n) comparisons after the precomputation.

The insight is that once you have computed f(word) for all words, you only care about the values, not the words themselves. Sort those values. Now for each query, finding "how many word frequencies are strictly greater than f(query)" is just a binary search: find the insertion point for f(query) using bisect_right, and subtract from the array length.

Sorting costs O(n log n) once. Each query then costs O(log n). Total: O(n log n + m log n), which is much better than O(m * n).

Python solution

from bisect import bisect_right

def num_smaller_by_frequency(queries, words):
    def f(s):
        smallest = min(s)
        return s.count(smallest)

    word_freqs = sorted(f(w) for w in words)
    n = len(word_freqs)

    return [n - bisect_right(word_freqs, f(q)) for q in queries]

Here is what each part does:

  • f(s) finds the lexicographically smallest character with min(s), then counts how many times it appears. This runs in O(k) for a string of length k.
  • word_freqs = sorted(f(w) for w in words) computes all word frequencies and sorts them in ascending order. This is the preprocessing step that makes queries fast.
  • bisect_right(word_freqs, f(q)) returns the index where f(q) would be inserted to keep the array sorted, placing it after any existing equal values. Everything to the right of this index is strictly greater.
  • n - bisect_right(...) gives the count of word frequencies that are strictly greater than f(q).

Using bisect_right (not bisect_left) is important here. The problem asks for strictly greater, not greater than or equal. bisect_right places the insertion point after all equal values, so n - bisect_right counts only values that are strictly larger.

Visual walkthrough

Step 1: Compute f(word) for each word. f(s) = frequency of the lexicographically smallest character.

"bbb"smallest = b, count = 3, f = 3"cc"smallest = c, count = 2, f = 2

For "bbb", the smallest character is b and it appears 3 times, so f = 3. For "cc", the smallest is c, appearing 2 times, so f = 2.

Step 2: Collect all f(word) values into an array and sort it.

f(words) unsorted:32sort2[0]3[1]sorted_f = [2, 3]

Sorting lets us use binary search later. The sorted array is [2, 3].

Step 3: Process query "a". Compute f("a") = 1. Binary search for first value > 1 in sorted_f.

"a"f = 12[0]3[1]bisect = 02 - 0 = 2answer[0] = 2

bisect_right(sorted_f, 1) returns 0. Elements from index 0 onward are all > 1. Count = len(sorted_f) - 0 = 2.

Step 4: Process query "aaa". Compute f("aaa") = 3. Binary search for first value > 3 in sorted_f.

"aaa"f = 32[0]3[1]bisect = 2no values > 3answer[1] = 0

bisect_right(sorted_f, 3) returns 2. Elements from index 2 onward: none. Count = len(sorted_f) - 2 = 0.

Step 5: All queries processed. Return the result array.

answer = [2, 0]Query "a" (f=1): 2 words have f > 1 | Query "aaa" (f=3): 0 words have f > 3

Each query took O(log n) via binary search on the sorted word frequencies. Total: O(m log n) for all queries.

Complexity analysis

MetricValueReason
TimeO((n + m) * k + n log n + m log n)Computing f for all strings takes O((n+m)*k). Sorting word frequencies takes O(n log n). Each of the m queries uses O(log n) binary search.
SpaceO(n)Storing the sorted word frequency array. The result array is O(m) but typically not counted.

Where n is the number of words, m is the number of queries, and k is the maximum string length. In practice, k is at most 10 per the constraints, so the dominant terms are O(n log n + m log n).

Building blocks

Derived value computation

The function f(s) transforms a string into a single number. This is a common technique: reduce a complex object to a comparable scalar. You see the same idea in:

def f(s):
    smallest = min(s)
    return s.count(smallest)

Binary search for threshold counting

Once you have a sorted array, counting "how many elements exceed a value" is a single binary search. This pattern appears whenever you need to answer batch range queries against a fixed dataset.

from bisect import bisect_right

count_greater = len(arr) - bisect_right(arr, target)

Edge cases

  • All words have the same f value. If every word produces the same frequency, binary search still works. Queries with a lower f get the full count, and queries with equal or higher f get 0.
  • Single-character strings. f("z") = 1 for any single character. The function correctly returns 1.
  • Query f equals the maximum word f. Since we need strictly greater (not greater than or equal), bisect_right correctly excludes equal values from the count.
  • All characters in a word are the same. f("aaaa") = 4. The smallest character is the only character, and its count is the string length.
  • Words and queries have length 1. Every f value is 1, so every query returns 0 (no word has f strictly greater than 1).

From understanding to recall

The part of this solution that tends to slip away is the bisect_right vs bisect_left distinction and which arithmetic gives you "strictly greater than." Under time pressure, you might second-guess whether to subtract from the left or the right end, or whether you need bisect_left instead.

The mental anchor is: bisect_right puts the needle after all equal values. So n - bisect_right(arr, x) counts values strictly greater than x. If you wanted "greater than or equal," you would use n - bisect_left(arr, x) instead.

Spaced repetition locks in these small details. You write the solution from scratch today, again in two days, then in a week. After a few rounds, reaching for bisect_right when you need "strictly greater" becomes automatic, and you spend your interview energy on the harder parts of the problem.

Related posts