Finding MK Average: Sorted Window Design Pattern
Finding MK Average is LeetCode 1825, rated Hard. You need to design an MKAverage class that is initialized with two integers, m and k. It supports two operations:
addElement(num)addsnumto a stream of numbers.calculateMKAverage()looks at the lastmelements in the stream, sorts them, removes the smallestkand the largestk, and returns the floor of the mean of the remainingm - 2*kelements. If there are fewer thanmelements in the stream, it returns-1.
For example, with m = 3 and k = 1, after adding [3, 1, 10], the sorted window is [1, 3, 10]. You trim the bottom 1 element (1) and the top 1 element (10), leaving [3]. The MK average is 3 // 1 = 3.
Why this problem matters
Trimmed means are everywhere in real systems. Rating platforms remove outlier scores before computing averages. Sensor networks discard noisy readings from the extremes. Financial systems strip the highest and lowest bids before pricing. This problem captures that exact pattern: maintain a sliding window of recent data, trim the extremes, and compute the average of what remains.
The challenge is doing it efficiently. A brute-force approach that sorts the window from scratch on every query is too slow when the stream is long. You need a data structure that supports fast insertion, deletion, and sum queries over sorted partitions.
The key insight
Instead of re-sorting the entire window every time, maintain the last m elements in a sorted data structure. When a new element arrives, insert it into the sorted structure and remove the element that just fell out of the window. To compute the MK average, you need the sum of the middle m - 2*k elements.
The cleanest approach uses a SortedList (from Python's sortedcontainers library). It supports O(log n) insertion and removal, and you can index into it directly. You track the total sum of all m elements in the window. When you need the MK average, subtract the sum of the bottom k elements and the top k elements from the total, then divide by m - 2*k.
This avoids sorting from scratch on each call. The sorted list maintains order incrementally, and the running total lets you compute the middle sum in O(k) time per query.
The running total is the trick that makes this practical. Instead of summing the middle elements from scratch every time, you maintain the total of all m elements and subtract the bottom k and top k sums. This turns the average calculation into a subtraction problem.
The solution
from sortedcontainers import SortedList
from collections import deque
class MKAverage:
def __init__(self, m: int, k: int):
self.m = m
self.k = k
self.queue = deque()
self.sl = SortedList()
self.total = 0
def addElement(self, num: int) -> None:
self.queue.append(num)
self.sl.add(num)
self.total += num
if len(self.queue) > self.m:
old = self.queue.popleft()
self.sl.remove(old)
self.total -= old
def calculateMKAverage(self) -> int:
if len(self.queue) < self.m:
return -1
bottom_sum = sum(self.sl[i] for i in range(self.k))
top_sum = sum(self.sl[self.m - self.k + i] for i in range(self.k))
middle_sum = self.total - bottom_sum - top_sum
return middle_sum // (self.m - 2 * self.k)
Detailed explanation
The queue (a deque) tracks the order elements arrived. It tells you which element to evict when the window is full. The sl (a SortedList) maintains those same elements in sorted order so you can index into the smallest and largest values efficiently.
When addElement is called, the new number goes into both the deque and the sorted list. If the deque exceeds size m, the oldest element is popped from the front of the deque and removed from the sorted list. The running total is updated in both directions: add the new element, subtract the evicted one.
When calculateMKAverage is called, you first check if there are enough elements. If not, return -1. Otherwise, the sorted list contains exactly m elements in sorted order. The bottom k elements are at indices 0 through k - 1, and the top k are at indices m - k through m - 1. You compute their sums, subtract both from the total, and floor-divide by m - 2*k.
The floor division is important. The problem asks for the integer part of the mean, rounding toward zero. Python's // operator does floor division, which for positive numbers gives the same result as truncation.
Visual walkthrough
Here is a step-by-step trace with m = 3 and k = 1. Watch how the sorted window slides and the trimmed average updates.
Step 1: addElement(3) -- stream = [3]
Only 1 element so far. We need at least m=3 elements. calculateMKAverage returns -1.
Step 2: addElement(1) -- stream = [3, 1]
Only 2 elements. Still fewer than m=3. calculateMKAverage returns -1.
Step 3: addElement(10) -- stream = [3, 1, 10]
Now we have m=3 elements. Sorted: [1, 3, 10]. Remove bottom k=1 (value 1) and top k=1 (value 10). Middle = [3]. Average = 3 // 1 = 3.
Step 4: addElement(5) -- stream = [3, 1, 10, 5]
Window slides to last 3: [1, 10, 5]. Sorted: [1, 5, 10]. Remove bottom 1 and top 10. Middle = [5]. Average = 5 // 1 = 5.
Step 5: addElement(5) -- stream = [3, 1, 10, 5, 5]
Window slides to last 3: [10, 5, 5]. Sorted: [5, 5, 10]. Remove bottom 5 and top 10. Middle = [5]. Average = 5 // 1 = 5.
Step 6: addElement(5) -- stream = [..., 5, 5, 5]
Window is [5, 5, 5]. Sorted: [5, 5, 5]. Remove bottom 5 and top 5. Middle = [5]. Average = 5 // 1 = 5.
Notice how each step only needs to insert one element and possibly remove one from the sorted list. The window never exceeds size m, so operations stay fast regardless of how many total elements have been added to the stream.
Complexity analysis
| Operation | Time | Space |
|---|---|---|
addElement | O(log m) | O(m) |
calculateMKAverage | O(k) | O(1) extra |
| Overall space | - | O(m) |
addElement. Inserting into and removing from the SortedList each take O(log m) time. The deque operations are O(1). Updating the running total is O(1). So each call to addElement costs O(log m).
calculateMKAverage. You iterate over k elements at each end to compute the bottom and top sums. This is O(k) per call. If k is large relative to m, you could further optimize this with a Fenwick tree or segmented sums, but for most practical inputs this is fast enough.
Space. The deque and sorted list each store at most m elements, so total space is O(m).
Building blocks
Sorted containers for sliding windows
The SortedList from Python's sortedcontainers library is the workhorse here. It gives you O(log n) insertion and removal with O(1) indexed access. This combination is rare and powerful: you get the benefits of a balanced BST with the convenience of array indexing. Whenever a problem requires maintaining a sorted collection that changes incrementally, a SortedList is the right tool. You will see this pattern in problems involving running medians, order statistics, and ranked data.
Running sums with incremental updates
Instead of recomputing the sum of the middle elements from scratch, you maintain a running total and adjust it as elements enter and leave the window. This technique of maintaining aggregate statistics incrementally rather than recomputing them is a fundamental optimization. It applies to sliding window sums, moving averages, and any scenario where the data changes by small increments but the aggregate is needed frequently.
Edge cases
- Fewer than m elements. If the stream has not yet received
mvalues,calculateMKAveragereturns-1. The deque length check handles this. - All elements are equal. If every element in the window is the same value, the bottom
kand topkhave the same value as the middle. The trimmed average equals that value. - k = 0. No trimming occurs. The MK average is the floor mean of all
melements in the window. The code handles this naturally sincebottom_sumandtop_sumare both 0. - m - 2*k = 1. Only one element remains after trimming. The average is just that element. No division rounding issues arise.
- Large values. Elements can be up to 10^5 and
mcan be up to 10^5. The running sum can grow large, but Python handles arbitrary precision integers, so overflow is not a concern.
From understanding to recall
You have seen how a SortedList combined with a running total turns a seemingly expensive operation (sort, trim, average) into an efficient incremental update. The data structure design is the core challenge here: choosing the right container so that each operation stays within its time budget.
But knowing the approach once is not the same as being able to reconstruct it under pressure. The details matter: remembering to use a deque for eviction order, a sorted list for rank queries, and a running total to avoid recomputation. These are distinct building blocks that combine to form the solution.
Spaced repetition locks in each piece separately. CodeBricks breaks this problem into its components, sorted containers, sliding window management, and incremental sum tracking, and drills them at intervals tuned to your forgetting curve. After a few review cycles, the design comes together naturally when you encounter similar data structure problems.
Related posts
- Sliding Window Maximum - Maintaining order in a sliding window
- Find Median from Data Stream - Two-heap design for streaming statistics
- Kth Largest Element in a Stream - Heap-based streaming design