page reclaim 02: Implementation 2017-02-07
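This article simplifies the description: it ignores compound pages and HUGEPAGE, and assumes CONFIG_MEMCG and CONFIG_SWAP are disabled. Unless otherwise noted, only the ARM architecture is discussed.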
1. Data structures
per zone: active list, inactive list
struct zone {
struct lruvec lruvec;
};
struct lruvec {
struct list_head lists[NR_LRU_LISTS];
...
};
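The per-zone lists can be sketched in userspace. The enum below follows the order of enum lru_list in 3.10 include/linux/mmzone.h. Everything else is a simplified stand-in, not kernel code: struct page and struct zone keep only the LRU linkage, and lru_add_page() and move_page() are hypothetical helpers. The sketch shows that moving a page between the inactive and active lists is just a list_del() plus a list_add() within one lruvec.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert item right after head. */
static void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Unlink item and leave it as an empty, self-linked node. */
static void list_del(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	init_list_head(item);
}

static size_t list_count(const struct list_head *head)
{
	size_t n = 0;
	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Same order as enum lru_list in 3.10 include/linux/mmzone.h. */
enum lru_list {
	LRU_INACTIVE_ANON,
	LRU_ACTIVE_ANON,
	LRU_INACTIVE_FILE,
	LRU_ACTIVE_FILE,
	LRU_UNEVICTABLE,
	NR_LRU_LISTS
};

struct lruvec { struct list_head lists[NR_LRU_LISTS]; };
struct zone { struct lruvec lruvec; };      /* all other zone fields omitted */
struct page { struct list_head lru; };      /* hypothetical: only the LRU linkage */

static void zone_init(struct zone *z)
{
	for (int i = 0; i < NR_LRU_LISTS; i++)
		init_list_head(&z->lruvec.lists[i]);
}

/* Hypothetical helper: put a page that is on no list onto one LRU list. */
static void lru_add_page(struct zone *z, struct page *pg, enum lru_list lru)
{
	assert(lru < NR_LRU_LISTS);
	init_list_head(&pg->lru);
	list_add(&pg->lru, &z->lruvec.lists[lru]);
}

/* Hypothetical helper: move a page to another LRU list of the same zone,
 * e.g. activation moves it from an inactive list to an active list. */
static void move_page(struct zone *z, struct page *pg, enum lru_list to)
{
	assert(to < NR_LRU_LISTS);
	list_del(&pg->lru);
	list_add(&pg->lru, &z->lruvec.lists[to]);
}
```

For example, after zone_init(&z), adding two pages to LRU_INACTIVE_FILE and then calling move_page(&z, &a, LRU_ACTIVE_FILE) leaves one page on each of the two file lists.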
per node: kswapd
kswapd_init -> for_each_node_state(nid, N_MEMORY) kswapd_run(nid)
kswapd_run -> pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
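The per-node startup sequence above can be modeled in userspace, assuming N_MEMORY is represented as a bitmask of node ids. In this sketch the hypothetical kswapd_run() does not call kthread_run(); it only records the "kswapd%d" thread name in a simplified pglist_data. MAX_NUMNODES = 4 is an illustrative value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NUMNODES 4   /* illustrative value */

/* Simplified stand-in for pg_data_t; in the kernel, pgdat->kswapd is a
 * struct task_struct *, here it is only the thread's name. */
struct pglist_data {
	int node_id;
	char kswapd[16];     /* "" means no kswapd was started for this node */
};

static struct pglist_data node_data[MAX_NUMNODES];

/* Stand-in for kswapd_run(): record "kswapd%d" instead of calling kthread_run(). */
static void kswapd_run(int nid)
{
	assert(nid >= 0 && nid < MAX_NUMNODES);
	struct pglist_data *pgdat = &node_data[nid];
	pgdat->node_id = nid;
	snprintf(pgdat->kswapd, sizeof(pgdat->kswapd), "kswapd%d", nid);
}

/* Stand-in for kswapd_init(): N_MEMORY is modeled as a bitmask of node ids,
 * so the loop plays the role of for_each_node_state(nid, N_MEMORY). */
static int kswapd_init(unsigned long n_memory)
{
	int started = 0;
	for (int nid = 0; nid < MAX_NUMNODES; nid++) {
		if (n_memory & (1UL << nid)) {
			kswapd_run(nid);
			started++;
		}
	}
	return started;
}
```

With kswapd_init(0x5UL), two threads are "started", named kswapd0 and kswapd2; node 1 has no memory and gets no kswapd.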
page reclaim 03: Representing page activity and its state transitions 2017-02-07
1. page_referenced
linux-3.10.86/mm/rmap.c
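/*
* Quick test_and_clear_referenced for all mappings to a page,
* returns the number of ptes which referenced the page.
This comment is out of date: the function does not actually return the number of ptes that point to the page.
Q: What is this function for?
A: It reflects how active a page on the inactive list is.
A return value of 1 is treated differently from a return value of 2; see page_check_references: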
{
referenced_ptes = page_referenced(...);
if (... || referenced_ptes > 1)
return PAGEREF_ACTIVATE;
}
*/
int page_referenced(struct page *page,
int is_locked,
struct mem_cgroup *memcg,
unsigned long *vm_flags)
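The difference between returning 1 and returning 2 can be sketched in userspace. Assuming each mapping of a page is reduced to its pte young bit, model_page_referenced() test-and-clears those bits and counts the ones that were set, and model_check_references() is a simplified copy of the decision logic in 3.10 page_check_references(), with the VM_LOCKED early return left out. Both function names and struct mapping are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each mapping of a page is reduced to its pte young bit. */
struct mapping {
	bool young;
};

/* Sketch of page_referenced(): test-and-clear the young bit of every mapping
 * (what page_referenced_one() does per pte) and count how many were set. */
static int model_page_referenced(struct mapping *maps, size_t n)
{
	assert(maps != NULL || n == 0);
	int referenced = 0;
	for (size_t i = 0; i < n; i++) {
		if (maps[i].young) {
			maps[i].young = false;
			referenced++;
		}
	}
	return referenced;
}

enum page_references {
	PAGEREF_RECLAIM,
	PAGEREF_RECLAIM_CLEAN,
	PAGEREF_KEEP,
	PAGEREF_ACTIVATE,
};

/* Simplified decision logic of 3.10 page_check_references() (VM_LOCKED case
 * omitted). *pg_referenced stands for the PG_referenced flag: it is read and
 * cleared like TestClearPageReferenced(), and set like SetPageReferenced(). */
static enum page_references model_check_references(int referenced_ptes,
						    bool *pg_referenced,
						    bool swap_backed, bool vm_exec)
{
	bool referenced_page = *pg_referenced;
	*pg_referenced = false;

	if (referenced_ptes) {
		if (swap_backed)
			return PAGEREF_ACTIVATE;
		*pg_referenced = true;
		/* young in 2+ mappings, or young again after an earlier scan */
		if (referenced_page || referenced_ptes > 1)
			return PAGEREF_ACTIVATE;
		if (vm_exec)
			return PAGEREF_ACTIVATE;
		return PAGEREF_KEEP;
	}
	if (referenced_page && !swap_backed)
		return PAGEREF_RECLAIM_CLEAN;
	return PAGEREF_RECLAIM;
}
```

In this model, a non-executable page-cache page mapped once and found young on one scan gets PAGEREF_KEEP and has PG_referenced set; it is activated only if it is found young again on a later scan. A page found young through two mappings in the same scan (referenced_ptes == 2) is activated at once.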
page reclaim 04: Parameters 2017-02-07
CONFIG_MEMCG and CONFIG_SWAP are disabled.
1. scan_control
linux-3.10.86/mm/vmscan.c
struct scan_control {
/* Incremented by the number of inactive pages that were scanned */
unsigned long nr_scanned;
/* How many pages shrink_list() should reclaim
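Q: How does nr_to_reclaim relate to nr_scanned and nr_reclaimed?
A: nr_to_reclaim is a setting, while nr_reclaimed (like nr_scanned) is runtime data.
Typically the caller first sets this target in struct scan_control sc,
and then starts reclaim.
Reclaim is cut short once the pages reclaimed by shrink_list() add up to nr_to_reclaim; see shrink_lruvec() or do_try_to_free_pages().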
*/
unsigned long nr_to_reclaim;
/*
"may" here is as in "may I ...": it means whether something is permitted (here, whether reclaim may write pages back).
*/
int may_writepage;
/*
[Understanding the Linux Kernel, 3rd Edition], p. 695
Lower priority implies scanning more pages.
*/
int priority;
};
get_scan_count
{
size = get_lru_size(lruvec, lru);
scan = size >> sc->priority;
// the amount scanned is proportional to the list size
}
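A worked example of scan = size >> sc->priority, assuming DEF_PRIORITY is 12 as in 3.10 include/linux/mmzone.h. The sketch keeps only the shift; the anon/file balancing that get_scan_count() applies afterwards is omitted, and scan_target() and cumulative_scan() are hypothetical helpers.

```c
#include <assert.h>

#define DEF_PRIORITY 12   /* as in 3.10 include/linux/mmzone.h */

/* Only the "scan = size >> priority" step of get_scan_count(). */
static unsigned long scan_target(unsigned long lru_size, int priority)
{
	assert(priority >= 0 && priority <= DEF_PRIORITY);
	return lru_size >> priority;
}

/* Pages scanned in total if reclaim goes through every priority from
 * DEF_PRIORITY down to min_priority without reaching its target. */
static unsigned long cumulative_scan(unsigned long lru_size, int min_priority)
{
	unsigned long total = 0;
	for (int prio = DEF_PRIORITY; prio >= min_priority; prio--)
		total += scan_target(lru_size, prio);
	return total;
}
```

For a list of 2^20 pages (4 GB with 4 KB pages), scan_target() gives 256 pages at priority 12 and the whole list at priority 0, while a list shorter than 4096 pages yields 0 at priority 12. Each step down in priority doubles the amount scanned, which is what "lower priority implies scanning more pages" means in practice.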
shrink_lruvec
|--//1. fill in the nr[] array based on priority etc.
|--get_scan_count(lruvec, sc, nr);
page reclaim 05: page count 2017-02-07
1. The question
linux-3.10.86/mm/vmscan.c
static inline int is_page_cache_freeable(struct page *page)
{
/*
* A freeable page cache page is referenced only by the caller
* that isolated the page, the page cache radix tree and
* optional buffer heads at page->private.
*/
return page_count(page) - page_has_private(page) == 2;
}
Why == 2?
2. Answer
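Going only by the comment in is_page_cache_freeable(), the reference accounting can be modeled like this: one reference from the caller that isolated the page, one from the page cache radix tree, and one more if buffer heads are attached at page->private. Subtracting page_has_private() discounts the buffer-head reference, so exactly 2 means nobody else holds the page. struct page_model and its fields are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of who holds a reference on a page-cache page while
 * reclaim examines it, following the comment in is_page_cache_freeable(). */
struct page_model {
	bool isolated;       /* the caller that isolated the page from the LRU */
	bool in_page_cache;  /* the page cache radix tree */
	bool has_buffers;    /* buffer heads at page->private */
	int  extra_users;    /* any other holder, e.g. a get_page() caller */
};

/* Each holder contributes one reference to the page count. */
static int page_count(const struct page_model *p)
{
	return p->isolated + p->in_page_cache + p->has_buffers + p->extra_users;
}

static int page_has_private(const struct page_model *p)
{
	return p->has_buffers ? 1 : 0;
}

/* Same test as the kernel function: after discounting the buffer-head
 * reference, only the isolating caller and the page cache may remain. */
static bool is_page_cache_freeable(const struct page_model *p)
{
	assert(p->extra_users >= 0);
	return page_count(p) - page_has_private(p) == 2;
}
```

An isolated page-cache page with no other users has count 2 (3 with buffers) and passes the check; a single extra reference, for example from a get_page() caller, makes it fail.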
page reclaim 06: ARM and L_PTE_YOUNG 2017-02-07
1. Effect of page_referenced_one on the hardware page table
linux-3.10.86/include/asm-generic/pgtable.h
page_referenced_one -> ptep_clear_flush_young_notify -> ptep_clear_flush_young -> ptep_test_and_clear_young
{
// pte_mkold is implemented as: PTE_BIT_FUNC(mkold, &= ~L_PTE_YOUNG);
set_pte_at(vma->vm_mm, address, ptep, pte_mkold(pte));
}
linux-3.10.86/arch/arm/include/asm/pgtable.h
static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval)
{
unsigned long ext = 0;
if (addr < TASK_SIZE && pte_present_user(pteval)) {
__sync_icache_dcache(pteval);
ext |= PTE_EXT_NG;
}
/*
@pteval is the Linux version of the pte;
the hardware version is generated from it, using ext
*/
set_pte_ext(ptep, pteval, ext);
}