Skip to content
← All posts

Tweet Counts Per Frequency: Design with Sorted Buckets

7 min read
leetcodeproblemmediumdesignhash-mapsorting

You need to design a TweetCounts class that supports two operations. recordTweet(tweetName, time) stores the name of a tweet at a given time (in seconds). getTweetCountsPerFrequency(freq, tweetName, startTime, endTime) returns a list of integers representing how many tweets with that name fell into each time bucket within the range [startTime, endTime]. The frequency freq is one of "minute", "hour", or "day", corresponding to bucket sizes of 60, 3600, and 86400 seconds respectively.

This is LeetCode 1348: Tweet Counts Per Frequency, and it is a great exercise in combining hash map design with sorted data and bucket counting. The problem tests your ability to organize data for efficient insertion and range queries.

Timeline for "tweet3" with minute-frequency buckets01020304050607001060Bucket 0: count=2B1: 1[0, 59][60, 60]Bucket 0 tweetsBucket 1 tweetsResult: [2, 1]
Three tweets at times 0, 10, and 60. Querying with "minute" frequency over [0, 60] creates two 60-second buckets. Bucket 0 holds times 0 and 10. Bucket 1 holds time 60.

Why this problem matters

Design problems like this show up in system design interviews and coding rounds alike. The core challenge is choosing the right data structure so that both recording and querying are efficient. You need a hash map for fast lookups by tweet name, and sorted storage so you can quickly count how many tweets fall within a given bucket range.

The bucket-splitting logic is also worth internalizing. Dividing a time range into fixed-size chunks and counting items per chunk is a pattern that appears in time-series databases, histogram generation, and rate limiting.

The key insight

Store each tweet name's timestamps in a sorted list. When a query comes in, compute the bucket size from the frequency (60 for minute, 3600 for hour, 86400 for day). Then iterate through the buckets. Each bucket i covers the range from startTime + i * delta to min(startTime + (i + 1) * delta - 1, endTime). For each bucket, count how many stored timestamps fall within that range.

Using bisect (binary search) on the sorted list makes counting efficient. For each bucket, find the leftmost index at the bucket start and the rightmost index at the bucket end. The difference gives you the count.

The algorithm:

  1. Maintain a defaultdict(list) mapping tweet names to sorted lists of timestamps.
  2. On recordTweet, insert the time into the correct list using bisect.insort to maintain sorted order.
  3. On getTweetCountsPerFrequency, compute delta from the frequency string. Walk through each bucket, and for each one use bisect_left and bisect_right to count tweets in O(log n) time.

The solution

from collections import defaultdict
from bisect import insort, bisect_left, bisect_right

class TweetCounts:
    def __init__(self):
        self.tweets = defaultdict(list)

    def recordTweet(self, tweetName: str, time: int) -> None:
        insort(self.tweets[tweetName], time)

    def getTweetCountsPerFrequency(
        self, freq: str, tweetName: str, startTime: int, endTime: int
    ) -> list[int]:
        delta = {"minute": 60, "hour": 3600, "day": 86400}[freq]
        times = self.tweets[tweetName]
        result = []

        t = startTime
        while t <= endTime:
            bucket_end = min(t + delta - 1, endTime)
            lo = bisect_left(times, t)
            hi = bisect_right(times, bucket_end)
            result.append(hi - lo)
            t += delta

        return result

Let's walk through what each piece does.

The constructor creates a defaultdict(list). Each key is a tweet name, and the value is a sorted list of all recorded times for that tweet.

recordTweet uses bisect.insort to insert the new time into the correct position in the sorted list. This keeps the list sorted at all times, which is critical for the query step. The insertion itself is O(log n) for the binary search plus O(n) for the list shift, but for practical input sizes this is fast enough.

getTweetCountsPerFrequency first maps the frequency string to a bucket size in seconds. Then it walks through the time range bucket by bucket. For each bucket, it computes the bucket's start (t) and end (min(t + delta - 1, endTime)). The bisect_left call finds the first timestamp that is at or after t, and bisect_right finds the first timestamp that is after bucket_end. The difference hi - lo is the count of tweets in that bucket.

The last bucket might be shorter than delta seconds. That is why you take min(t + delta - 1, endTime) as the bucket end. This ensures you never count tweets beyond endTime, even if the range does not divide evenly by the bucket size.

Visual walkthrough

Let's trace through the example step by step. We record three tweets for "tweet3" at times 0, 60, and 10, then run two different frequency queries.

Step 1: recordTweet("tweet3", 0)

0102030405060700

Store: {"tweet3": [0]}. One tweet recorded so far.

Step 2: recordTweet("tweet3", 60)

010203040506070060

Store: {"tweet3": [0, 60]}. Inserted in sorted order.

Step 3: recordTweet("tweet3", 10)

01020304050607001060

Store: {"tweet3": [0, 10, 60]}. Use bisect to insert 10 in sorted position.

Step 4: getTweetCountsPerFrequency("minute", "tweet3", 0, 59)

010203040506070010602

Frequency = minute, so delta = 60. Range [0, 59] fits in one bucket [0, 59]. Tweets at 0 and 10 fall inside. Result: [2].

Step 5: getTweetCountsPerFrequency("minute", "tweet3", 0, 60)

0102030405060700106021

Range [0, 60] needs two buckets. Bucket 0 = [0, 59] has 2 tweets. Bucket 1 = [60, 60] has 1 tweet. Result: [2, 1].

Notice how the query result depends on whether endTime falls exactly on a bucket boundary. With endTime = 59, the entire range fits in one bucket. With endTime = 60, a second bucket is needed, and the tweet at time 60 lands in it.

Complexity analysis

OperationTimeSpace
recordTweetO(n)O(1) amortized
getTweetCountsPerFrequencyO(k * log n)O(k)
Overall spaceO(total tweets)

Here n is the number of recorded tweets for a given name, and k is the number of buckets in the query range (roughly (endTime - startTime) / delta + 1).

recordTweet does a sorted insertion. The binary search to find the position is O(log n), but shifting elements in a Python list to make room is O(n) in the worst case. For the constraints in this problem (up to 10,000 calls), this is acceptable.

getTweetCountsPerFrequency iterates through k buckets, doing two binary searches per bucket. Each binary search is O(log n), so the total is O(k * log n). The result list has k entries, so space is O(k).

The building blocks

1. Hash map with sorted value lists

The pattern of maintaining a defaultdict(list) where each value list is kept sorted is a reusable building block. You see it in problems involving time-series data, event logs, and any scenario where you need fast lookups by key and range queries by value.

from collections import defaultdict
from bisect import insort, bisect_left, bisect_right

store = defaultdict(list)

def insert(key, value):
    insort(store[key], value)

def count_in_range(key, lo, hi):
    times = store[key]
    return bisect_right(times, hi) - bisect_left(times, lo)

This combination gives you O(log n) range counting after each insertion. The same building block powers problems like Time Based Key-Value Store (LeetCode 981) and Calendar Booking (LeetCode 729).

2. Fixed-size bucket splitting

Dividing a range into fixed-size chunks is a general technique for frequency counting. Given a start, an end, and a bucket size, the loop pattern is:

t = start
while t <= end:
    bucket_end = min(t + delta - 1, end)
    # process items in [t, bucket_end]
    t += delta

The min on the last bucket is the detail that prevents off-by-one errors. This same pattern appears in histogram generation, time-series aggregation, and rate-limiter implementations. Once you have it memorized, you can apply it without re-deriving the boundary math each time.

Edge cases

Before submitting, think through these scenarios:

  • Single tweet in range: if only one tweet is recorded and it falls within the query range, the bucket containing it returns 1 and all other buckets return 0.
  • No tweets for the name: the sorted list is empty. Every bucket returns 0. The result is a list of zeroes.
  • startTime equals endTime: the range is a single second. There is exactly one bucket. Count how many tweets happened at exactly that second.
  • Tweet at exact bucket boundary: a tweet at time 60 with startTime = 0 and freq = "minute" falls into bucket 1 (range [60, 119]), not bucket 0 (range [0, 59]). The boundary math t + delta - 1 handles this correctly.
  • Large time gaps: if startTime = 0 and endTime = 86400 with freq = "minute", you get 1441 buckets. The loop runs 1441 times, each doing O(log n) binary searches. This is efficient as long as n is not enormous.
  • Multiple tweets at the same time: bisect.insort handles duplicates correctly. bisect_right - bisect_left counts all of them.

From understanding to recall

You have seen how to combine a hash map, sorted insertion, and bucket splitting to solve this design problem. The code is clean and each piece has a clear purpose. But reproducing it under interview pressure is a different matter.

The details that trip people up: getting the bucket boundary math right (t + delta - 1 vs t + delta), remembering to use bisect_left for the lower bound and bisect_right for the upper bound, and handling the last bucket with min. These are not conceptual gaps. They are recall gaps, and they cost time when it counts.

Spaced repetition is how you close them. You practice writing the sorted-insertion hash map, the bucket-splitting loop, and the bisect counting from memory at increasing intervals. After a few rounds, the pattern is automatic. You see "count events per time interval" in a problem description, and the template flows out without hesitation.

Related posts

  • Design Twitter - Another design problem involving tweet storage, combining hash maps with sorted data and heap-based feeds
  • Time Based Key Value Store - Uses the same sorted-list-with-binary-search pattern for timestamp-based lookups
  • Design HashMap - The foundational hash map design problem that underlies the storage layer in this solution

CodeBricks breaks Tweet Counts Per Frequency into its sorted-insertion and bucket-splitting building blocks, then drills them independently with spaced repetition. You type each piece from scratch until the pattern is automatic. When a design problem with time-range queries shows up in your interview, you do not hesitate. You just write it.