LRU and LRU-K
- Note: this article is a partial walkthrough of CMU 15-445 (2024 Spring) Project 1, Task 1. To comply with Andy Pavlo's academic integrity policy, none of the project code is published here; only the approach is explained.
Project requirements
This component is responsible for tracking page usage in the buffer pool. You will implement a new class called LRUKReplacer in src/include/buffer/lru_k_replacer.h and its corresponding implementation file in src/buffer/lru_k_replacer.cpp. Note that LRUKReplacer is a stand-alone class and is not related to any of the other Replacer classes. You are expected to implement only the LRU-K replacement policy. You don't have to implement LRU or a clock replacement policy, even if there is a corresponding file for it.

The LRU-K algorithm evicts a frame whose backward k-distance is maximum of all frames in the replacer. Backward k-distance is computed as the difference in time between the current timestamp and the timestamp of the kth previous access. A frame with fewer than k historical accesses is given +inf as its backward k-distance. When multiple frames have +inf backward k-distance, the replacer evicts the frame with the earliest overall timestamp (i.e., the frame whose least-recent recorded access is the overall least recent access).

The maximum size for the LRUKReplacer is the same as the size of the buffer pool since it contains placeholders for all of the frames in the BufferPoolManager. However, at any given moment, not all the frames in the replacer are considered to be evictable. The size of LRUKReplacer is represented by the number of evictable frames. The LRUKReplacer is initialized to have no frames in it. Then, only when a frame is marked as evictable, the replacer's size will increase.

You will need to implement the LRU-K policy discussed in this course. You will need to implement the following methods as defined in the header file (src/include/buffer/lru_k_replacer.h) and in the source file (src/buffer/lru_k_replacer.cpp):

- Evict(frame_id_t* frame_id): Evict the frame with the largest backward k-distance compared to all other evictable frames being tracked by the Replacer. Store the frame id in the output parameter and return True. If there are no evictable frames, return False.
- RecordAccess(frame_id_t frame_id): Record that the given frame id is accessed at the current timestamp. This method should be called after a page is pinned in the BufferPoolManager.
- Remove(frame_id_t frame_id): Clear all access history associated with a frame. This method should be called only when a page is deleted in the BufferPoolManager.
- SetEvictable(frame_id_t frame_id, bool set_evictable): This method controls whether a frame is evictable or not. It also controls the LRUKReplacer's size. You'll know when to call this function when you implement the BufferPoolManager. To be specific, when the pin count of a page reaches 0, its corresponding frame is marked evictable and the replacer's size is incremented.
- Size(): This method returns the number of evictable frames that are currently in the LRUKReplacer.

The implementation details are up to you. You are allowed to use built-in STL containers. You may assume that you will not run out of memory, but you must make sure that your implementation is thread-safe.
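For intuition, here is a small worked example of the backward k-distance rule (the timestamps are made up purely for illustration): suppose $k = 2$, the current timestamp is $10$, frame A was accessed at timestamps $\{1, 7\}$, frame B at $\{5, 9\}$, and frame C only at $\{8\}$. Then

$$
d_A = 10 - 1 = 9, \qquad d_B = 10 - 5 = 5, \qquad d_C = +\infty,
$$

so C is evicted first (fewer than $k$ accesses gives it $+\infty$); if only A and B were candidates, A would be evicted because $9 > 5$.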
- This article does not deal with thread safety; the goal is only to explain the models behind the two algorithms.
Analysis
- The main goal of this project is to implement a buffer pool, which temporarily holds data pages in memory. A machine's memory is always limited, though, so once the pool reaches its capacity we have to evict some pages (remove them from memory and flush them to disk). How do we decide which pages to evict? The algorithm this project prescribes is LRU-K.
LRU
- Anyone who has studied OS or computer organization will be familiar with this algorithm: LRU has long been a standard technique for managing cached data. But how do we implement it? That is exactly LeetCode 146. The first step is to choose a data-structure model.
- The structure that holds the entries has to be some kind of list, but should it be an array or a linked list? Under LRU, data that was used before is often used again before it is evicted from the cache, eviction normally happens at the tail of the list, and a re-accessed entry must be moved to the head. For an array, moving an entry costs $O(n)$ amortized, which is close to a disaster efficiency-wise, so a linked list, where insertion and deletion cost only $O(1)$, is the natural choice.
- After choosing a linked list, however, a new problem appears: an operation like get(key) -> value now costs $O(n)$, which is still unacceptable. So we trade some space for time and add a <key, pointer of node> hash table; the key then leads directly to the node in the linked list, bringing the time complexity back down to $O(1)$.
- Choice of C++ facilities: since 15-445 uses C++, the solution here is also written in C++, which raises a question: use the list provided by the STL, or hand-roll one? I first used std::list and stored list iterators as the hash table's values, but that introduced a subtle bug: the iterators can become invalid, and since this only happens under certain conditions it is hard to predict. So I hand-rolled a doubly linked list instead and simply stored the address of the list node as the hash table's value.
LC-146 implementation
- The node:
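A minimal sketch of such a node, assuming a templated key/value pair; the names are illustrative, not the author's actual code.

```cpp
// Hand-rolled doubly linked list node (illustrative sketch).
template <typename K, typename V>
struct Node {
  K key{};
  V value{};
  Node *prev{nullptr};
  Node *next{nullptr};

  Node() = default;
  Node(const K &k, const V &v) : key(k), value(v) {}
};
```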
- The LRU template:
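A minimal sketch of the LRU template built on the Node above: a hand-rolled doubly linked list with head/tail sentinels plus a <key, Node*> hash map, exactly the model described in the analysis. This is illustrative only (it assumes capacity >= 1 and ignores copy/move semantics and thread safety), not the author's submitted code.

```cpp
#include <cstddef>
#include <unordered_map>

template <typename K, typename V>
class LRU {
 public:
  explicit LRU(std::size_t capacity) : capacity_(capacity) {
    // Sentinel head/tail nodes keep insert/unlink free of null checks.
    head_ = new Node<K, V>();
    tail_ = new Node<K, V>();
    head_->next = tail_;
    tail_->prev = head_;
  }
  ~LRU() {
    for (Node<K, V> *cur = head_; cur != nullptr;) {
      Node<K, V> *next = cur->next;
      delete cur;
      cur = next;
    }
  }

  // On a hit, move the node to the head (most recently used) and copy
  // the value out; return false on a miss.
  bool Get(const K &key, V *value) {
    auto it = map_.find(key);
    if (it == map_.end()) {
      return false;
    }
    MoveToHead(it->second);
    *value = it->second->value;
    return true;
  }

  void Put(const K &key, const V &value) {
    auto it = map_.find(key);
    if (it != map_.end()) {           // update in place, refresh recency
      it->second->value = value;
      MoveToHead(it->second);
      return;
    }
    if (map_.size() == capacity_) {   // evict the least recently used node
      Node<K, V> *victim = tail_->prev;
      Unlink(victim);
      map_.erase(victim->key);
      delete victim;
    }
    auto *node = new Node<K, V>(key, value);
    InsertAfterHead(node);
    map_[key] = node;
  }

 private:
  void Unlink(Node<K, V> *node) {
    node->prev->next = node->next;
    node->next->prev = node->prev;
  }
  void InsertAfterHead(Node<K, V> *node) {
    node->next = head_->next;
    node->prev = head_;
    head_->next->prev = node;
    head_->next = node;
  }
  void MoveToHead(Node<K, V> *node) {
    Unlink(node);
    InsertAfterHead(node);
  }

  std::size_t capacity_;
  Node<K, V> *head_;
  Node<K, V> *tail_;
  std::unordered_map<K, Node<K, V> *> map_;
};
```

The sentinels are what make the hand-rolled list so simple: Unlink and InsertAfterHead never have to special-case an empty list or a node at either end.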
- The LeetCode solution just instantiates the template:
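A thin wrapper that instantiates the template with <int, int> is enough for LC-146; returning -1 on a miss follows the LeetCode contract. Again, this is a sketch, not the author's submission.

```cpp
// LC-146 interface: delegate everything to the LRU<int, int> template.
class LRUCache {
 public:
  explicit LRUCache(int capacity) : cache_(static_cast<std::size_t>(capacity)) {}

  int get(int key) {
    int value = 0;
    return cache_.Get(key, &value) ? value : -1;  // -1 signals a cache miss
  }

  void put(int key, int value) { cache_.Put(key, value); }

 private:
  LRU<int, int> cache_;
};
```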
LRU-K
- From the discussion above, LRU already manages cached data quite well, so why introduce LRU-K at all?
- The problem with LRU is that a burst of one-off accesses flushes out the existing cache contents. The newly cached data may never be accessed again, while the data that got flushed out was the "useful" data retained from before. This is the cache pollution problem.
- The idea behind LRU-K is to always evict pages with fewer than K accesses first. Many explanations online simply maintain two lists: a history list and a buffer list. New data always enters the history list first; once its access count reaches K, it is removed from the history list and moved to the head of the buffer list.
- The buffer list follows the LRU policy, while the history list may follow any replacement policy. The project handout requires evicting the page that entered the history list earliest, i.e., FIFO.
- That concludes the explanation of the LRU and LRU-K algorithms.
Additional notes
- Because this version of the project marks each page with an evictable flag (false means the page must not be evicted), the eviction algorithm has to be adapted from plain LRU-K: scan the history list from head to tail first and evict the first evictable page found; if no page in the history list can be evicted, scan the buffer list from tail to head and evict the first evictable page found (a sketch of this adjusted two-list model follows below).
- This change raises the time complexity of eviction from $O(1)$ to $O(n)$. But the ideal model does not match reality: in practice some pages simply must not be evicted, so matching reality costs a little efficiency (after all, the ratio of main-memory capacity to disk capacity is small).
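To tie the pieces together, here is a sketch of the two-list model with the evictable flag, as described above. This is not the project's LRUKReplacer: the class name, the int frame ids, the std::list-based bookkeeping, and the omission of thread safety and a Remove() method are all simplifying assumptions for illustration.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <optional>
#include <unordered_map>

class TwoListLruK {
 public:
  explicit TwoListLruK(std::size_t k) : k_(k) {}

  // Record one access; after k accesses the frame graduates from the
  // history list (FIFO order) to the head of the buffer list (LRU order).
  void RecordAccess(int frame_id) {
    auto &e = entries_[frame_id];
    if (e.count == 0) {                       // first access: join the history list
      history_.push_back(frame_id);           // oldest entries stay at the front
      e.pos = std::prev(history_.end());
      e.in_buffer = false;
    } else if (e.in_buffer) {                 // classic LRU: move to the head
      buffer_.erase(e.pos);
      buffer_.push_front(frame_id);
      e.pos = buffer_.begin();
    }
    ++e.count;
    if (!e.in_buffer && e.count >= k_) {      // graduate to the buffer list
      history_.erase(e.pos);
      buffer_.push_front(frame_id);
      e.pos = buffer_.begin();
      e.in_buffer = true;
    }
  }

  void SetEvictable(int frame_id, bool evictable) {
    entries_[frame_id].evictable = evictable;
  }

  // Scan the history list head->tail first (FIFO victim), then the buffer
  // list tail->head (LRU victim); skip frames whose evictable flag is false.
  std::optional<int> Evict() {
    for (auto it = history_.begin(); it != history_.end(); ++it) {
      if (entries_[*it].evictable) {
        int victim = *it;
        history_.erase(it);
        entries_.erase(victim);
        return victim;
      }
    }
    for (auto it = buffer_.rbegin(); it != buffer_.rend(); ++it) {
      if (entries_[*it].evictable) {
        int victim = *it;
        buffer_.erase(std::next(it).base());  // erase the element *it refers to
        entries_.erase(victim);
        return victim;
      }
    }
    return std::nullopt;                      // nothing evictable
  }

 private:
  struct Entry {
    std::size_t count{0};
    bool in_buffer{false};
    bool evictable{false};
    std::list<int>::iterator pos{};
  };

  std::size_t k_;
  std::list<int> history_;   // head = oldest first access (FIFO eviction order)
  std::list<int> buffer_;    // head = most recently used (LRU order)
  std::unordered_map<int, Entry> entries_;
};
```

As noted above, the victim scan is $O(n)$ in the worst case because non-evictable frames have to be skipped before a victim is found.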