Top K Frequent Words: Heap with Lexicographic Tie-Breaking

May 21, 20265 min read

leetcodeproblemmediumstringshash-mapheap

Top K Frequent Words (LeetCode 692) looks similar to Top K Frequent Elements, but the twist is the tiebreaker: when two words have the same frequency, you must return the one that comes first in alphabetical order. That one detail changes the optimal approach from bucket sort to a heap with a custom comparator.

Let's build it up step by step.

The problem

Given an array of strings words and an integer k, return the k most frequent strings. Return the answer sorted by frequency from highest to lowest. If two words have the same frequency, the word with the lower alphabetical order comes first.

Example: words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 2

Output: ["the", "is"] because "the" appears 4 times and "is" appears 3 times.

Count word frequencies. "the" appears 4 times, "is" 3 times, "sunny" 2 times, "day" 1 time.

The first step is the same as Top K Frequent Elements: count the frequencies. The difference shows up in the second step, where you need to handle the alphabetical tiebreaker while extracting the top k words.

The approach

You could sort the frequency map entries by (-count, word) and take the first k. That works in O(n log n) time. But a min-heap of size k gets you to O(n log k), which is better when k is much smaller than n.

The trick is the comparator. In a min-heap, the "smallest" element sits at the root and gets popped first. You want to keep the top k most frequent words, so the root should hold the "weakest" candidate: the one with the lowest frequency, or if frequencies tie, the one that comes last alphabetically (since it would be the first to lose in a tiebreaker).

In Python, you can encode this by pushing (-freq, word) onto a max-heap, or by pushing (freq, neg_word_trick) onto a min-heap. The cleanest approach uses heapq.nsmallest with a negated frequency, or you can build the heap manually.

Step 1: Build the frequency map

Walk through the list once. For each word, increment its count in the hash map.

Step 2: Use a min-heap of size k = 2 with a custom comparator

Push each (frequency, word) pair. If the heap exceeds size k, pop the root. The comparator ensures lexicographic tiebreaking.

Step 3: Process each word through the heap

After processing all words, the heap holds the top k. Pop them and reverse for highest-frequency-first order.

Step 4: How the lexicographic tiebreaker works

When two words share the same frequency, the one that comes first alphabetically gets priority.

Practice heap problems

The code

import heapq
from collections import Counter

def top_k_frequent(words: list[str], k: int) -> list[str]:
    counts = Counter(words)
    return heapq.nsmallest(k, counts.keys(), key=lambda w: (-counts[w], w))

The heapq.nsmallest call sorts by the key (-counts[w], w). Negating the count means higher frequencies come first. When two words share a frequency, the raw string comparison picks the alphabetically earlier word.

Here is the manual min-heap version that shows what happens under the hood:

import heapq
from collections import Counter

def top_k_frequent(words: list[str], k: int) -> list[str]:
    counts = Counter(words)
    heap = []
    for word, freq in counts.items():
        heapq.heappush(heap, (freq, ReverseStr(word)))
        if len(heap) > k:
            heapq.heappop(heap)
    result = []
    while heap:
        freq, rev_word = heapq.heappop(heap)
        result.append(rev_word.val)
    return result[::-1]

class ReverseStr:
    def __init__(self, val: str):
        self.val = val
    def __lt__(self, other: "ReverseStr") -> bool:
        return self.val > other.val

The ReverseStr wrapper inverts the string comparison so that in the min-heap, words later in the alphabet are treated as "smaller" and sit at the root. That way, when you pop to maintain size k, you discard the word that would lose the tiebreak. After draining the heap, reverse the result to get highest-frequency-first order.

The one-liner with nsmallest is what you would use in practice. The manual version is what you would walk through in an interview to show you understand how the heap manages the tiebreaker.

Complexity analysis

Approach	Time	Space	Notes
Sort all entries	O(n log n)	O(n)	Sort by (-freq, word), slice first k
Heap (nsmallest)	O(n log k)	O(n)	Min-heap of size k with custom key
Bucket sort + sort ties	O(n + m log m)	O(n)	m = distinct words in same-frequency bucket

The heap approach at O(n log k) is the sweet spot. Bucket sort can work here, but you still need to sort within each bucket to handle the alphabetical tiebreaker, which complicates things without a clear performance win.

Drill this pattern from memory

Building blocks

Frequency counter

Count how many times each word appears. This is the same building block from Top K Frequent Elements, Group Anagrams, and Valid Anagram. The template:

from collections import Counter

counts = Counter(words)

One pass through the input, O(n) time and space. The frequency counter is the foundation for almost every "most frequent" or "least frequent" problem.

Heap with custom comparator

Maintain a min-heap of size k to track the top k items. The key difference from a plain min-heap is the comparison function. For this problem, you compare on two axes: frequency (lower is weaker) and alphabetical order (later is weaker for tiebreaking).

The general template for a min-heap of size k:

import heapq

heap = []
for item in items:
    heapq.heappush(heap, comparison_key(item))
    if len(heap) > k:
        heapq.heappop(heap)

The comparison_key function encodes your custom ordering. For top k frequent words, it is (freq, ReverseStr(word)) so that the "weakest" candidate (lowest frequency, or same frequency but alphabetically last) sits at the root and gets popped first.

This same building block appears in Kth Largest Element (plain min-heap) and any problem where you need to maintain a running top-k over a stream of data.

Edge cases

All words are the same. words = ["hi", "hi", "hi"], k = 1. One entry in the frequency map, one element in the heap. Returns ["hi"].
k equals the number of distinct words. You return everything, ordered by frequency with alphabetical tiebreaking.
Multiple words share the same frequency. This is the whole point of the problem. The comparator handles it. ["a", "b", "c"] with k = 2 returns ["a", "b"] since all have frequency 1 and a comes before b alphabetically.
Single word. words = ["hello"], k = 1. Trivial case. Returns ["hello"].
Words with mixed case. The problem specifies lowercase English letters, so you do not need to worry about case sensitivity in standard LeetCode test cases.

From understanding to recall

The frequency counting step is automatic at this point. You have written Counter(words) dozens of times. The part you will forget is the comparator. Under pressure, you might negate the frequency but forget to handle the alphabetical tiebreaker, or you might reverse the wrong direction and get the ordering backward.

The key insight to lock in: "negate frequency for descending order, use the raw string for ascending alphabetical order." That gives you the tuple (-freq, word) as your sort key. One tuple, two sorting axes, problem solved.

Spaced repetition turns this from something you understand into something you recall under pressure. Practice writing the comparator from scratch. After a few reps at increasing intervals, you will reach for (-counts[w], w) without thinking twice.

Try CodeBricks free

Top K Frequent Elements - Same frequency counting foundation, but uses bucket sort since there is no tiebreaker requirement
Kth Largest Element - Uses the same min-heap-of-size-k building block with a simpler comparator
Sort Characters By Frequency - Another frequency-based ordering problem, solved with bucket sort since order among same-frequency items does not matter