page reclaim 01: Overview
1. Background / problem statement
This article does not discuss swapping (swap out to disk).
If a seldom-used page is backed by a block device (e.g., memory mappings of files), then the modified pages need not be swapped out, but can be directly synchronized with the block device. The page frame can be reused, and if the data are required again, they can be reconstructed from the source. In other words, such pages can be freed simply by writing back the cached data.
If a page is backed by a file but cannot be modified in memory (e.g., binary executable data), then it can be discarded if it is currently not required.
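The distinction above can be sketched as a small user-space classifier. This is only a toy model: struct sim_page, enum reclaim_action and classify_page are hypothetical names made up for illustration, not kernel types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor; NOT the kernel's struct page. */
struct sim_page {
    bool file_backed;  /* backed by a file / block device */
    bool dirty;        /* modified in memory since it was read in */
};

enum reclaim_action {
    RECLAIM_DISCARD,     /* clean file page: drop it, re-read later if needed */
    RECLAIM_WRITEBACK,   /* dirty file page: sync to the backing store, then drop */
    RECLAIM_NEEDS_SWAP   /* no backing file: would need swap, out of scope here */
};

/* Decide what reclaim would have to do with a page in this toy model. */
enum reclaim_action classify_page(const struct sim_page *p)
{
    assert(p != NULL);
    if (!p->file_backed)
        return RECLAIM_NEEDS_SWAP;
    return p->dirty ? RECLAIM_WRITEBACK : RECLAIM_DISCARD;
}
```

A page of executable text would be file_backed and never dirty, so it always ends up in the RECLAIM_DISCARD branch.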
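The goal is to reclaim memory that is temporarily unused or rarely used, so that others can use it later. So how do we define "temporarily unused" or "rarely used"? And where are all these already-allocated pages scattered?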
2. Where the pages are scattered
How do we find this memory? Is there some kind of list?
Answer: see add_to_page_cache_lru() and page_add_new_anon_rmap().
Method 1: add_to_page_cache_lru adds the page to both the page cache and the LRU cache. Most importantly, it is used by mpage_readpages and do_generic_mapping_read, the standard functions in which the block layer ends up when reading data from a file or mapping. In practice, the page is first added to a per-CPU struct pagevec, and only moved to the global LRU once the pagevec is full.
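The "buffer per CPU first, move to the global LRU when full" behaviour can be sketched in user space as follows. The names sim_pagevec, sim_lru, sim_lru_cache_add and the batch size of 4 are assumptions chosen for the demo; the kernel's struct pagevec uses its own size and handles locking, which is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_PAGEVEC_SIZE 4   /* hypothetical batch size, chosen for the demo */
#define SIM_LRU_MAX      64  /* hypothetical capacity of the toy global LRU */

/* Toy global LRU: just an array of page ids. */
struct sim_lru {
    int    pages[SIM_LRU_MAX];
    size_t nr;
};

/* Toy per-CPU batch, standing in for struct pagevec. */
struct sim_pagevec {
    int    pages[SIM_PAGEVEC_SIZE];
    size_t nr;
};

/* Move every buffered page to the global LRU and empty the batch. */
void sim_pagevec_drain(struct sim_pagevec *pvec, struct sim_lru *lru)
{
    for (size_t i = 0; i < pvec->nr; i++) {
        assert(lru->nr < SIM_LRU_MAX);
        lru->pages[lru->nr++] = pvec->pages[i];
    }
    pvec->nr = 0;
}

/* Buffer one page; touch the global LRU only when the batch is full. */
void sim_lru_cache_add(struct sim_pagevec *pvec, struct sim_lru *lru, int page_id)
{
    pvec->pages[pvec->nr++] = page_id;
    if (pvec->nr == SIM_PAGEVEC_SIZE)
        sim_pagevec_drain(pvec, lru);
}
```

The point of batching is that the shared LRU is touched once per full batch instead of once per page, which reduces contention on whatever lock protects it.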
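Method 2: add_to_page_cache_lru also inserts the page into the page cache tree of the file's mapping (inode->i_mapping), so we can consider walking the pages starting from the inodes of each filesystem: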
drop_caches_sysctl_handler -> iterate_supers(drop_pagecache_sb, NULL)
|--list_for_each_entry(inode, &sb->s_inodes, i_sb_list)
| |--invalidate_mapping_pages(inode->i_mapping, ...)
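The walk above (superblock, then each inode, then each cached page of that inode) can be sketched as a toy model. All sim_* names are hypothetical. The real invalidate_mapping_pages also skips pages that are locked, under writeback, or mapped into page tables; this sketch only models the "dirty pages are skipped" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cached page: only the two properties the sketch needs. */
struct sim_cached_page {
    bool dirty;
    bool present;   /* still in the page cache */
};

/* Toy inode; the pages array stands in for inode->i_mapping. */
struct sim_inode {
    struct sim_cached_page *pages;
    size_t nr_pages;
};

/* Drop every clean page of one inode; returns how many were dropped. */
size_t sim_invalidate_mapping_pages(struct sim_inode *inode)
{
    size_t dropped = 0;

    assert(inode != NULL);
    for (size_t i = 0; i < inode->nr_pages; i++) {
        struct sim_cached_page *p = &inode->pages[i];
        if (p->present && !p->dirty) {
            p->present = false;
            dropped++;
        }
    }
    return dropped;
}

/* Walk all inodes of one toy "superblock". */
size_t sim_drop_pagecache_sb(struct sim_inode *inodes, size_t nr_inodes)
{
    size_t dropped = 0;

    for (size_t i = 0; i < nr_inodes; i++)
        dropped += sim_invalidate_mapping_pages(&inodes[i]);
    return dropped;
}
```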
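3. How to decide that a page is rarely used
As described above, pages are organized on the LRU list. The first access puts a page at the head of the list; later accesses, however, do not move it back to the head.
To distinguish usage frequency, the kernel keeps less-used pages on the inactive list, and moves a page to the active list if it is used relatively often.
This lets us try to reclaim pages from one specific list, since the pages on that list are used relatively rarely.
A page added to the LRU for the first time usually goes to the inactive list; anonymous pages are added to the active list instead, see __do_fault, do_anonymous_page, do_wp_page.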
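4. PG_referenced
A page moving between the active list and the inactive list can be seen as a state transition. With only these two states, a single step is enough to switch between them. The kernel considers this too coarse, and therefore introduced PG_referenced.
Difference between PG_referenced and PG_active
Before PG_referenced was introduced, the problem was this: when the page is accessed, the PG_active flag is set, but when is it going to be removed again? If the kernel does not remove it automatically, the page would remain in the active state forever, even if it is used very little or not at all anymore.
We therefore need to be able to recognize pages that are marked PG_active but are not accessed often. The current approach is to introduce the PG_referenced flag.
A page fault sets the page active. Will the page then stay on one list forever?
Answer: no. page_check_references and mark_page_accessed pull in opposite directions.
An inactive page does not become active directly; it must first go from unreferenced to referenced.
If D0 (bit 0) stands for PG_referenced and D1 (bit 1) stands for PG_active, the state transitions are:
0b00->0b01->0b10->0b11
With PG_referenced in place, a highly active page has both PG_active and PG_referenced set.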
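The promotion sequence 0b00->0b01->0b10->0b11 can be written down as a tiny state machine. This is a user-space sketch of the promotion direction only. SIM_PG_REFERENCED, SIM_PG_ACTIVE and sim_mark_page_accessed are hypothetical names, and the real mark_page_accessed also has to deal with LRU membership and list moves, which are not modelled.

```c
#include <assert.h>

#define SIM_PG_REFERENCED 0x1u  /* D0 in the text */
#define SIM_PG_ACTIVE     0x2u  /* D1 in the text */

/*
 * One access to the page: returns the next 2-bit state.
 *   unreferenced       -> set referenced            (00->01, 10->11)
 *   inactive+referenced -> active, referenced cleared (01->10)
 *   active+referenced   -> unchanged                  (11->11)
 */
unsigned sim_mark_page_accessed(unsigned flags)
{
    assert(flags <= (SIM_PG_ACTIVE | SIM_PG_REFERENCED));

    if (!(flags & SIM_PG_REFERENCED))
        return flags | SIM_PG_REFERENCED;
    if (!(flags & SIM_PG_ACTIVE))
        return SIM_PG_ACTIVE;
    return flags;
}
```

Starting from 0b00, it takes three accesses to reach 0b11, so one stray access is not enough to make a page look highly active.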
5. Where page_check_references and mark_page_accessed are called
Call path to page_check_references:
shrink_inactive_list -> shrink_page_list -> page_check_references
Common call paths to mark_page_accessed:
touch_buffer -> mark_page_accessed(bh->b_page)
do_generic_file_read -> mark_page_accessed(page)
generic_perform_write -> mark_page_accessed(page)
When are PG_referenced and PG_active taken away again?
Answer: most commonly during direct reclaim and by kswapd, and also on truncate and similar paths.
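6. How to reclaim
As described in section 1, a page backed by a block device can be restored after reclaim by reading it in again if it is needed later. So the contents of the page can be written back to the backing device, and the page can then be returned to the buddy system, see shrink_inactive_list().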
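A heavily simplified user-space sketch of one pass over the inactive list is shown below. Everything named sim_* is hypothetical. The assumptions are: a referenced page simply gets one more round (the real page_check_references has more outcomes), and writeback completes immediately (in the kernel it is asynchronous, so a dirty page is usually not freed in the same pass).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page on the inactive list. */
struct sim_lru_page {
    bool referenced;  /* accessed since the last scan */
    bool dirty;       /* must be written back before it can be freed */
    bool freed;       /* already returned to the allocator */
};

struct sim_reclaim_stats {
    size_t freed;
    size_t written_back;
    size_t kept;
};

/* One reclaim pass over a toy inactive list. */
struct sim_reclaim_stats sim_shrink_inactive(struct sim_lru_page *pages, size_t n)
{
    struct sim_reclaim_stats st = { 0, 0, 0 };

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct sim_lru_page *p = &pages[i];

        if (p->freed)
            continue;                /* reclaimed in an earlier pass */
        if (p->referenced) {         /* recently used: give it another round */
            p->referenced = false;
            st.kept++;
            continue;
        }
        if (p->dirty) {              /* sync to the backing device first */
            p->dirty = false;
            st.written_back++;
        }
        p->freed = true;             /* hand the frame back to the allocator */
        st.freed++;
    }
    return st;
}
```

Calling the function twice on the same array shows the second-chance effect: a page that was kept in the first pass is freed in the second one unless it is referenced again in between.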
7. Data structures
per zone: active list, inactive list
per node: kswapd
per CPU: LRU cache, i.e. pagevec
Article URL: https://awakening-fong.github.io/posts/mm/reclaim_01_overview
Please credit the source when reposting: https://awakening-fong.github.io