Design Twitter: Merge K Sorted Lists with a Heap

May 18, 20266 min read

leetcodeproblemmediumdesignhash-mapheap

Design Twitter is LeetCode problem 355. It asks you to build a simplified social network with four operations: post a tweet, follow a user, unfollow a user, and retrieve a personalized news feed. The first three are trivial with the right data structures. The fourth one, getNewsFeed, is where the problem gets interesting. It asks you to return the 10 most recent tweets from a user and all of their followees. That is a merge-K-sorted-lists problem in disguise, and once you see it that way, the heap design follows naturally.

The four operations

postTweet(userId, tweetId) -- add a new tweet for a user.
getNewsFeed(userId) -- return up to 10 most recent tweet IDs from the user and everyone they follow.
follow(followerId, followeeId) -- start following another user.
unfollow(followerId, followeeId) -- stop following another user.

The constraints are modest: at most 500 total calls, user IDs and tweet IDs up to 500. What the problem really tests is whether you can identify the hidden algorithmic structure inside getNewsFeed.

The tweets map holds each user's tweet list. The following map tracks who follows whom. getNewsFeed merges the relevant tweet lists using a max-heap.

The data structures

You need two maps:

tweets: dict[int, list[tuple[int, int]]] maps each userId to their list of tweets, where each tweet is stored as (timestamp, tweetId). Tweets are appended in order, so the list grows from oldest to newest. A global counter provides the timestamps. You never sort these lists -- they are already in insertion order, which is chronological order.

Why store the timestamp alongside the tweetId? Because when you merge tweet lists from multiple users, the only way to compare tweets is by time. The tweetId itself carries no ordering information.

following: dict[int, set[int]] maps each userId to the set of users they follow. A set makes follow and unfollow O(1), and lets you iterate over followees without duplicates. You always include the user themselves when building the feed -- the cleanest way to do this is self.following[userId] | {userId}, which creates a temporary set union without modifying the stored data.

Getting the newsfeed

For each user in the feed (the requester plus all their followees), you have a sorted list of tweets ordered by timestamp. You need the 10 most recent across all of those lists. This is exactly the merge-K-sorted-lists pattern.

The approach:

For each user in the relevant set, push their most recent tweet onto a max-heap as (-timestamp, tweetId, userId, index). Negating the timestamp turns Python's min-heap into a max-heap.
Pop the top of the heap. That is the most recent tweet globally. Append its tweetId to the result.
If that user has an older tweet (index minus 1), push it onto the heap.
Repeat until you have 10 tweets or the heap is empty.

At any moment the heap holds at most one entry per user in the feed -- the "frontier" tweet from each user's list. You never load all tweets into memory. You lazily advance through each list only as needed.

The code

import heapq
from collections import defaultdict

class Twitter:
    def __init__(self):
        self.time = 0
        self.tweets = defaultdict(list)
        self.following = defaultdict(set)

    def postTweet(self, userId: int, tweetId: int) -> None:
        self.tweets[userId].append((self.time, tweetId))
        self.time += 1

    def getNewsFeed(self, userId: int) -> list[int]:
        heap = []
        users = self.following[userId] | {userId}
        for u in users:
            if self.tweets[u]:
                idx = len(self.tweets[u]) - 1
                t, tid = self.tweets[u][idx]
                heapq.heappush(heap, (-t, tid, u, idx))
        result = []
        while heap and len(result) < 10:
            t, tid, u, idx = heapq.heappop(heap)
            result.append(tid)
            if idx > 0:
                nt, ntid = self.tweets[u][idx - 1]
                heapq.heappush(heap, (-nt, ntid, u, idx - 1))
        return result

    def follow(self, followerId: int, followeeId: int) -> None:
        self.following[followerId].add(followeeId)

    def unfollow(self, followerId: int, followeeId: int) -> None:
        self.following[followerId].discard(followeeId)

Practice this problem with spaced repetition

Step-by-step walkthrough

Step 1: postTweet(1, 5), postTweet(1, 3)

Two tweets for user 1. Stored in append order with a global timestamp. Newer tweet has higher timestamp.

Step 2: follow(1, 2), postTweet(2, 6)

User 1 now follows user 2. User 2 posts tweet 6 with timestamp 3 (the highest so far).

Step 3: getNewsFeed(1)

The heap starts with the most recent tweet from each user. Pop by descending timestamp: tweet 6 first (t=3), then tweet 3 (t=2), then tweet 5 (t=1).

Step 4: unfollow(1, 2), getNewsFeed(1)

User 1 unfollows user 2. Now getNewsFeed(1) only merges user 1's own tweets → [3, 5]. Tweet 6 is gone from the feed.

Complexity

Operation	Time	Notes
`postTweet`	O(1)	Append to list, increment counter
`follow`	O(1)	Set insert
`unfollow`	O(1)	Set discard
`getNewsFeed`	O(u log u + 10 log u)	u = number of users in feed
Space	O(T + F)	T = total tweets, F = total follow relationships

For getNewsFeed, the initial heap build pushes at most u entries, costing O(u log u). Each of the 10 pops costs O(log u), so the total is O(u log u). In practice u is small (at most a few hundred), so this is fast.

The building blocks

getNewsFeed is the merge-K-sorted-lists pattern applied to tweet streams. Instead of k linked lists, you have k tweet arrays (one per user in the feed). Instead of a min-heap on node values, you use a max-heap on timestamps. The mechanics are identical: push the frontier from each source, pop the best, advance that source.

The heap tracks exactly the "next candidate" from each source. You never compare all tweets against each other. You just compare at most k entries at a time -- the size of the heap -- which keeps each pop at O(log k).

The idx field in the heap tuple is the key implementation detail. Instead of storing pointers (like in the linked list version), you store an array index. When you pop (t, tid, u, idx), the next tweet for that user is at tweets[u][idx - 1] (since the list is oldest-first, the most recent tweet is at the end, and you walk backwards).

Edge cases

User follows themselves. The | {userId} union handles this. Without it, a user who has never followed anyone would get an empty feed even if they have their own tweets.
User has no tweets but follows someone who does. The if self.tweets[u]: guard skips empty lists. The heap only sees users who have at least one tweet.
Fewer than 10 total tweets across all followees. The while heap and len(result) < 10 condition handles this cleanly. The loop ends when the heap runs out.
Unfollow a user not currently being followed. discard silently does nothing if the element is not in the set. Using remove instead would raise a KeyError.

From understanding to recall

The tricky part of this problem is not the implementation. It is the recognition. When you see "get the 10 most recent tweets across multiple users," your brain needs to immediately reach for the merge-K-sorted-lists template. Once you have that connection, the rest writes itself: each user is a sorted list, you need a max-heap on timestamps, and you advance each list lazily.

That recognition gap is exactly what spaced repetition targets. You might read this post today and feel completely confident. But under interview pressure a week from now, the connection between "news feed" and "heap merge" is the first thing to fade. Drilling this problem at spaced intervals builds the pattern into long-term memory, so you reach for the heap immediately instead of spending 10 minutes rediscovering it.

The second thing that trips people up is the | {userId} trick and the discard vs remove distinction. These are small details that are easy to miss and easy to forget. But they show up in every run of the code, and an interviewer will notice if you get them wrong.

Build lasting recall with spaced repetition

Merge K Sorted Lists -- the core heap merge pattern used in getNewsFeed
LRU Cache -- another design problem combining multiple data structures
Find Median from Data Stream -- heap-based design for streaming data

Try CodeBricks free