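page reclaim 02: Implementation 2017-02-07

This post simplifies the description: it ignores compound pages and HUGEPAGE, and assumes CONFIG_MEMCG and CONFIG_SWAP are disabled. Unless stated otherwise, only the ARM architecture is discussed.

1. Data structures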

per zone: active list, inactive list

struct zone {
    struct lruvec       lruvec;
};

struct lruvec {
    struct list_head lists[NR_LRU_LISTS];
    ...
};
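To make the indexing of lists[] concrete, here is a minimal user-space sketch. The enum follows my reading of enum lru_list in include/linux/mmzone.h for 3.10, so check the values against your own tree. The struct list_head stand-in and the helpers lruvec_init_sketch and lru_index_sketch are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the kernel's scheme: base + file offset + active offset. */
#define LRU_FILE_SKETCH   2
#define LRU_ACTIVE_SKETCH 1

enum lru_list {
    LRU_INACTIVE_ANON,
    LRU_ACTIVE_ANON,
    LRU_INACTIVE_FILE,
    LRU_ACTIVE_FILE,
    LRU_UNEVICTABLE,
    NR_LRU_LISTS
};

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

struct lruvec {
    struct list_head lists[NR_LRU_LISTS];
};

/* Hypothetical helper: make every LRU list an empty circular list. */
static void lruvec_init_sketch(struct lruvec *lruvec)
{
    assert(lruvec != NULL);
    for (int lru = 0; lru < NR_LRU_LISTS; lru++) {
        lruvec->lists[lru].next = &lruvec->lists[lru];
        lruvec->lists[lru].prev = &lruvec->lists[lru];
    }
}

/* Hypothetical helper: choose a list from "file-backed?" and "active?". */
static enum lru_list lru_index_sketch(int is_file, int is_active)
{
    return (enum lru_list)((is_file ? LRU_FILE_SKETCH : 0) +
                           (is_active ? LRU_ACTIVE_SKETCH : 0));
}
```

With this layout, the active and inactive lists mentioned above are four of the five array slots (anon and file each have a pair), and the fifth holds unevictable pages.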

per node: kswapd

kswapd_init -> for_each_node_state(nid, N_MEMORY) kswapd_run(nid)
kswapd_run -> pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);

page reclaim 03: Representing activity and state transitions 2017-02-07

1. page_referenced

linux-3.10.86/mm/rmap.c

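/*
 * Quick test_and_clear_referenced for all mappings to a page,
 * returns the number of ptes which referenced the page.
 This comment is outdated: the function does not actually return the number of ptes that point to the page.

 Q: What is this function for?
 A: It reflects how active a page on the inactive list is.
 A return value of 1 and a return value of 2 are treated differently, see page_check_references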
 {
     referenced_ptes = page_referenced(...);
     if (... || referenced_ptes > 1)
        return PAGEREF_ACTIVATE;

 }
 */
int page_referenced(struct page *page,
        int is_locked,
        struct mem_cgroup *memcg,
        unsigned long *vm_flags)
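The difference between returning 1 and returning 2 can be sketched in user space as below. This is a simplified model, not the kernel function: it ignores VM_LOCKED, VM_EXEC and swap-backed pages. The parameter flag_was_set is my assumed stand-in for the condition elided as "..." in the excerpt above, and check_references_sketch is a hypothetical name.

```c
#include <assert.h>

/* Same names as the kernel's enum page_references; the order here is arbitrary. */
enum page_references_sketch {
    PAGEREF_RECLAIM,
    PAGEREF_RECLAIM_CLEAN,
    PAGEREF_KEEP,
    PAGEREF_ACTIVATE
};

/*
 * referenced_ptes: what page_referenced() returned for this page.
 * flag_was_set:    assumed stand-in for the elided condition
 *                  (for example, a page flag recording an earlier reference).
 */
static enum page_references_sketch
check_references_sketch(int referenced_ptes, int flag_was_set)
{
    assert(referenced_ptes >= 0);

    /* No pte touched the page recently: it is a reclaim candidate. */
    if (referenced_ptes == 0)
        return PAGEREF_RECLAIM;

    /* Referenced more than once, or referenced again after an earlier
     * reference: promote to the active list. */
    if (flag_was_set || referenced_ptes > 1)
        return PAGEREF_ACTIVATE;

    /* Exactly one reference and no history: keep it on the inactive
     * list for one more round. */
    return PAGEREF_KEEP;
}
```

In this model a single reference only buys the page another round on the inactive list, while two references promote it immediately.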

page reclaim 04: Parameters 2017-02-07

CONFIG_MEMCG and CONFIG_SWAP are disabled.

1. scan_control

linux-3.10.86/mm/vmscan.c

struct scan_control {

    /* Incremented by the number of inactive pages that were scanned */
    unsigned long nr_scanned;

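    /* How many pages shrink_list() should reclaim
    Q: How do nr_to_reclaim and nr_reclaimed relate?
    A: nr_to_reclaim is a setting (the target); nr_reclaimed is runtime data (the running total).
    Typically the target is set in struct scan_control sc first,
    and then reclaim is started.
    Reclaim stops once the pages reclaimed by shrink_list() add up to at least nr_to_reclaim, see shrink_lruvec() or do_try_to_free_pages().
    */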
    unsigned long nr_to_reclaim;

    /*
    "may" here is used as in "may I ...": it says whether the action is allowed.
    */
    int may_writepage;

    /*
    [Understanding the Linux Kernel, 3rd Edition]p695
    Lower priority implies scanning more pages.
    */
    int priority;

};

get_scan_count
{
        size = get_lru_size(lruvec, lru);
        scan = size >> sc->priority;
        // the amount to scan is proportional to the size of the list
}
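The shift in get_scan_count can be worked through with a small user-space sketch. It assumes the starting priority is 12 (the kernel's DEF_PRIORITY, as far as I know); scan_target_sketch is a hypothetical name.

```c
#include <assert.h>

/* Assumed to match the kernel's DEF_PRIORITY. */
#define DEF_PRIORITY_SKETCH 12

/* Pages to scan on one list at a given priority: size >> priority. */
static unsigned long scan_target_sketch(unsigned long lru_size, int priority)
{
    assert(priority >= 0 && priority <= DEF_PRIORITY_SKETCH);
    return lru_size >> priority;
}
```

For a list of 1048576 pages, priority 12 gives 256 pages and priority 0 gives the whole list. A list shorter than 4096 pages yields 0 at priority 12, so in this model small lists are only scanned once the priority has dropped.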

shrink_lruvec
|--//1. fill the nr[] array based on the priority and other inputs
|--get_scan_count(lruvec, sc, nr); 
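How the target nr_to_reclaim and the priority interact can be sketched as a loop. This is a toy model under stated assumptions, not the kernel code: the list size is held constant, one page out of every reclaim_one_in scanned pages is assumed to be freed, and the names scan_control_sketch and reclaim_loop_sketch are hypothetical.

```c
#include <assert.h>

struct scan_control_sketch {
    unsigned long nr_to_reclaim;  /* setting: the target */
    unsigned long nr_reclaimed;   /* runtime: running total of freed pages */
    unsigned long nr_scanned;     /* runtime: pages looked at so far */
    int priority;
};

/*
 * Walk the priority from 12 down to 0, scanning lru_size >> priority
 * pages per pass. Returns the priority at which the target was met,
 * or -1 if it was never met.
 */
static int reclaim_loop_sketch(struct scan_control_sketch *sc,
                               unsigned long lru_size,
                               unsigned int reclaim_one_in)
{
    assert(sc != 0 && reclaim_one_in > 0);

    for (sc->priority = 12; sc->priority >= 0; sc->priority--) {
        unsigned long scan = lru_size >> sc->priority;

        sc->nr_scanned += scan;
        sc->nr_reclaimed += scan / reclaim_one_in;  /* assumed success rate */

        /* The target is a setting; the running total decides when to stop. */
        if (sc->nr_reclaimed >= sc->nr_to_reclaim)
            return sc->priority;
    }
    return -1;
}
```

With a 65536-page list, a target of 32 and one page freed per two scanned, the passes scan 16, 32 and 64 pages and the loop stops at priority 10 with 56 pages reclaimed.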

page reclaim 05: page count 2017-02-07

1. Posing the question

linux-3.10.86/mm/vmscan.c

static inline int is_page_cache_freeable(struct page *page)
{

    /*
     * A freeable page cache page is referenced only by the caller
     * that isolated the page, the page cache radix tree and
     * optional buffer heads at page->private.
     */
    return page_count(page) - page_has_private(page) == 2;

}

Why == 2?

2. Answer
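The kernel comment quoted in section 1 already points at an answer. One reference is held by the caller that isolated the page and one by the page cache radix tree. Buffer heads at page->private may add a third, which the page_has_private() term subtracts again. A user-space sketch of that accounting follows; page_sketch and is_page_cache_freeable_sketch are hypothetical stand-ins.

```c
#include <assert.h>

struct page_sketch {
    int count;        /* stands in for page_count(page) */
    int has_private;  /* stands in for page_has_private(page): 0 or 1 */
};

/* Same arithmetic as the excerpt: extra users make the result != 2. */
static int is_page_cache_freeable_sketch(const struct page_sketch *page)
{
    assert(page != 0);
    return page->count - page->has_private == 2;
}
```

A page with count 2 and no buffer heads is freeable, and so is one with count 3 and buffer heads. A page with count 3 and no buffer heads has one more user than expected, so it is not freeable.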

page reclaim 06: ARM and L_PTE_YOUNG 2017-02-07

1. Effect of page_referenced_one on the hardware page table

linux-3.10.86/include/asm-generic/pgtable.h

page_referenced_one -> ptep_clear_flush_young_notify -> ptep_clear_flush_young -> ptep_test_and_clear_young
{
// pte_mkold is implemented as: PTE_BIT_FUNC(mkold, &= ~L_PTE_YOUNG);
set_pte_at(vma->vm_mm, address, ptep, pte_mkold(pte));
}
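The test-and-clear step at the end of this chain can be modelled in user space as follows. The bit positions are assumptions for illustration only (check pgtable-2level.h or pgtable-3level.h for the real values), and every name ending in _sketch is hypothetical. The real code writes the new value through set_pte_at() rather than storing to the pointer directly.

```c
#include <assert.h>

typedef unsigned long pteval_sketch_t;

/* Assumed bit positions, for illustration only. */
#define L_PTE_PRESENT_SKETCH ((pteval_sketch_t)1 << 0)
#define L_PTE_YOUNG_SKETCH   ((pteval_sketch_t)1 << 1)

static int pte_young_sketch(pteval_sketch_t pte)
{
    return (pte & L_PTE_YOUNG_SKETCH) != 0;
}

/* What PTE_BIT_FUNC(mkold, &= ~L_PTE_YOUNG) expands to, in spirit. */
static pteval_sketch_t pte_mkold_sketch(pteval_sketch_t pte)
{
    return pte & ~L_PTE_YOUNG_SKETCH;
}

/* Returns 1 if the entry was young, and clears the young bit in that case. */
static int test_and_clear_young_sketch(pteval_sketch_t *ptep)
{
    assert(ptep != 0);

    pteval_sketch_t pte = *ptep;
    if (!pte_young_sketch(pte))
        return 0;

    *ptep = pte_mkold_sketch(pte);  /* the kernel calls set_pte_at() here */
    return 1;
}
```

Calling it twice on the same entry returns 1 and then 0, which is why each round of reclaim only sees references made since the previous round.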

linux-3.10.86/arch/arm/include/asm/pgtable.h

static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
                  pte_t *ptep, pte_t pteval)
{
    unsigned long ext = 0;

    if (addr < TASK_SIZE && pte_present_user(pteval)) {
        __sync_icache_dcache(pteval);
        ext |= PTE_EXT_NG;
    }
    /*
    @pteval is the Linux version of the pte.
    The hardware version is generated from it, taking ext into account.
    */
    set_pte_ext(ptep, pteval, ext);
}