越而胜己 / Beyond awesome

Link to problem: https://leetcode.com/problems/find-median-from-data-stream/

After observing the problem description, we can see that we have to store every number we have seen. The problem is how to store these numbers so that we can efficiently compute the median at any point.

We can use a max-heap and a min-heap to always expose the median. We use the max-heap to store the smaller half of the numbers, and the min-heap to store the bigger half of the numbers. When we have stored $2k+1$ numbers, the median is just the top element of the max-heap; when we have stored $2k$ numbers, the median is the average of the top elements of both heaps. Then we just need to figure out how to balance the size of the heaps to preserve the above property.

We need $\O(n)$ space to store all $n$ elements. When each number is added we are adjusting the heaps in $\O(\log k)$ time. So the overall time complexity is $\O(n \log k)$.

观察题目之后,可以发现我们必须用某种方式保存见过的所有值。关键则在于如何存储可以随时快速计算中位数。

(经过答案的提示,)我们可以使用一个最大堆 maxHeap 和一个最小堆 minHeap 来始终暴露中位数的值。maxHeap 存储较小的一半数字,minHeap 存储较大的一半数字:当存储的数字数量为奇数 $2k+1$ 个时,我们保证 maxHeap 中含有 $k+1$ 个数字,而 minHeap 中含有 $k$ 个数字,此时中位数即为 maxHeap 堆顶的数字;当存储的数字数量为偶数 $2k$ 个时,我们保证 maxHeapminHeap 中各有 $k$ 个数字,此时中位数即为 maxHeap 堆顶数字与 minHeap 堆顶数字的平均值。接下来我们只需要考虑如何保持这两个堆的大小满足以上性质。

保存所有 $n$ 个数字需要 $\O(n)$ 空间。每加入一个数字,调整两个堆需要 $\O(\log k)$ 时间,所以总的时间复杂度为 $\O(n \log k)$.

import heapq

class MedianFinder:

    def __init__(self):
        """
        initialize your data structure here.
        """
        self.maxHeap = []
        self.minHeap = []

    def addNum(self, num: int) -> None:
        if not self.maxHeap or num <= -self.maxHeap[0]:
            heapq.heappush(self.maxHeap, -num)
        else:
            heapq.heappush(self.minHeap, num)
        self._balance()

    def findMedian(self) -> float:
        if self._size() % 2 == 1:
            return -self.maxHeap[0]
        else:
            return (-self.maxHeap[0] + self.minHeap[0]) / 2

    def _size(self) -> int:
        return len(self.minHeap) + len(self.maxHeap)
    
    def _balance(self) -> None:
        lsize = len(self.maxHeap)
        rsize = len(self.minHeap)
        if lsize == rsize or lsize == rsize + 1:
            return
        if lsize > rsize:
            heapq.heappush(self.minHeap, -heapq.heappop(self.maxHeap))
        else:
            heapq.heappush(self.maxHeap, -heapq.heappop(self.minHeap))

# Your MedianFinder object will be instantiated and called as such:
# obj = MedianFinder()
# obj.addNum(num)
# param_2 = obj.findMedian()