2017-01-03
This post covers livelocking in writeback.

1. What is livelock

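Livelock usually means two parties that keep yielding to each other, so that neither task makes progress.
The thread below, however, uses the term more broadly: work does make progress, but new work keeps arriving, so the task never finishes. That also counts as livelock.
Either way, it has nothing to do with locks.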

https://lkml.org/lkml/2010/11/9/593

For example when a single large file is continuously dirtied, we would never finish syncing it ... After this patch, program from http://lkml.org/lkml/2010/10/24/154 is no longer able to stall sync forever.
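To make the "never finishes" case concrete, here is a minimal userspace sketch in C. It is a toy model, not kernel code: `struct file_model`, `writer_append()` and `naive_sync()` are hypothetical names, and the file is capped at `MAX_PAGES` pages only so that the demo terminates. The naive sync keeps scanning for dirty pages while a concurrent writer appends a new dirty page after every page written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical): a file of up to MAX_PAGES pages. The cap exists
 * only so that the demo terminates; a real file could keep growing. */
#define MAX_PAGES 64

struct file_model {
    bool dirty[MAX_PAGES];  /* per-page dirty state */
    size_t npages;          /* current file size in pages */
};

/* A process continuously appending to the file: adds one dirty page.
 * Returns false once the model is full. */
static bool writer_append(struct file_model *f)
{
    if (f->npages == MAX_PAGES)
        return false;
    f->dirty[f->npages++] = true;
    return true;
}

/* Naive sync: write every dirty page found, including pages that become
 * dirty while the scan runs. The writer appends a page after each write,
 * so the scan never runs out of work until the cap is reached.
 * Returns the number of pages written. */
static size_t naive_sync(struct file_model *f)
{
    size_t written = 0;

    assert(f->npages <= MAX_PAGES);
    for (size_t i = 0; i < f->npages; i++) {  /* npages grows during the scan */
        if (!f->dirty[i])
            continue;
        f->dirty[i] = false;                  /* "write back" the page */
        written++;
        writer_append(f);                     /* the writer keeps going */
    }
    return written;
}
```

Starting from 4 dirty pages, `naive_sync()` ends up writing all 64 pages the model can hold: the amount of work is decided by the writer, not by what was dirty when the sync started. Without the cap it would not terminate.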

2. Implementation

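The fix lives in write_cache_pages(): if the writeback control asks for a data-integrity sync (wbc->sync_mode == WB_SYNC_ALL, or wbc->tagged_writepages is set), it first adds the TOWRITE tag to every page currently tagged DIRTY, then writes back only the pages tagged TOWRITE. Pages that become DIRTY after that point are simply left alone.
The pattern looks familiar: take a snapshot of the work first, then process only the snapshot. Compare how pagevec_lookup_tag() in write_cache_pages() picks pages out and collects them into struct pagevec pvec.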
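Below is a minimal userspace sketch of the tag-then-write idea, assuming a toy per-page model (`struct file_model`, `writer_append()` and `tagged_sync()` are hypothetical names, and the two bool arrays merely stand in for the DIRTY and TOWRITE radix-tree tags). Pass 1 plays the role of tag_pages_for_writeback(); pass 2 writes only pages carrying the TOWRITE mark, so pages dirtied during the pass are left for a later run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 64   /* cap on the toy file size (hypothetical) */

struct file_model {
    bool dirty[MAX_PAGES];    /* stands in for the DIRTY tag */
    bool towrite[MAX_PAGES];  /* stands in for the TOWRITE tag */
    size_t npages;
};

/* A concurrent writer appending one freshly dirtied page (no TOWRITE). */
static bool writer_append(struct file_model *f)
{
    if (f->npages == MAX_PAGES)
        return false;
    f->dirty[f->npages] = true;
    f->towrite[f->npages] = false;
    f->npages++;
    return true;
}

/* Tag-then-write sync. Returns the number of pages written. */
static size_t tagged_sync(struct file_model *f)
{
    size_t written = 0;

    assert(f->npages <= MAX_PAGES);

    /* Pass 1, like tag_pages_for_writeback(): mark every DIRTY page TOWRITE. */
    for (size_t i = 0; i < f->npages; i++)
        if (f->dirty[i])
            f->towrite[i] = true;

    /* Pass 2: write only TOWRITE pages. Pages dirtied after pass 1 carry no
     * TOWRITE mark, so they are skipped and the pass is bounded. */
    for (size_t i = 0; i < f->npages; i++) {
        if (!f->towrite[i])
            continue;
        f->towrite[i] = false;   /* cleared when writeback starts */
        f->dirty[i] = false;     /* "write back" the page */
        written++;
        writer_append(f);        /* the writer keeps appending */
    }
    return written;
}
```

With 4 dirty pages and a writer that appends one page per page written, `tagged_sync()` writes exactly 4 pages and returns; the 4 pages appended in the meantime stay dirty for the next writeback.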

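linux-3.10.86/mm/page-writeback.c

int write_cache_pages(struct address_space *mapping,
                      struct writeback_control *wbc, writepage_t writepage,
                      void *data)
{
    ...
    if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
        tag = PAGECACHE_TAG_TOWRITE;
    else
        tag = PAGECACHE_TAG_DIRTY;
retry:
    if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
        tag_pages_for_writeback(mapping, index, end); // add TOWRITE to every DIRTY page in [index, end]

    while (!done && (index <= end)) {
        nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
                  min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
        if (nr_pages == 0) // no pages left with this tag: done
            break;

        for (i = 0; i < nr_pages; i++) {
            struct page *page = pvec.pages[i];
            ...
            if (!clear_page_dirty_for_io(page))
                goto continue_unlock;
            ret = (*writepage)(page, wbc, data);
            ...
        }
        pagevec_release(&pvec);
        cond_resched();
    }
    ...
}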
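Two details of the lookup loop can be checked in userspace. First, pagevec_lookup_tag() advances `index` just past the last page it returns, so the next call resumes there. Second, the batch size is spelled `min(end - index, PAGEVEC_SIZE-1) + 1` rather than `min(end - index + 1, PAGEVEC_SIZE)`. The sketch below assumes that `end` can be `(pgoff_t)-1` (a whole-file range) and that `PAGEVEC_SIZE` is 14; `lookup_tag()`, `batch_size()` and `batch_size_naive()` are hypothetical helpers, not kernel functions. It suggests why the first spelling is the safe one: with `index == 0` and `end == ULONG_MAX`, `end - index + 1` wraps around to 0.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

typedef unsigned long pgoff_t;   /* same underlying type as the kernel's pgoff_t */
#define PAGEVEC_SIZE 14          /* assumed value, as in 3.10-era kernels */

static pgoff_t min_pgoff(pgoff_t a, pgoff_t b)
{
    return a < b ? a : b;
}

/* Batch size as written in write_cache_pages(): between 1 and PAGEVEC_SIZE,
 * with no wrap-around even when end - index == ULONG_MAX. */
static pgoff_t batch_size(pgoff_t index, pgoff_t end)
{
    return min_pgoff(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1;
}

/* The "obvious" spelling: end - index + 1 wraps to 0 for index 0, end ULONG_MAX. */
static pgoff_t batch_size_naive(pgoff_t index, pgoff_t end)
{
    return min_pgoff(end - index + 1, (pgoff_t)PAGEVEC_SIZE);
}

/* Hypothetical stand-in for pagevec_lookup_tag(): collect up to max indices
 * in [*index, npages) whose tag is set, then move *index just past the last
 * one found so the next call resumes there. Returns the number found. */
static unsigned lookup_tag(const bool *tag, pgoff_t npages, pgoff_t *index,
                           pgoff_t *out, pgoff_t max)
{
    unsigned n = 0;

    assert(max <= PAGEVEC_SIZE);
    for (pgoff_t i = *index; i < npages && n < max; i++)
        if (tag[i])
            out[n++] = i;
    if (n > 0)
        *index = out[n - 1] + 1;
    return n;
}
```

Calling `lookup_tag()` repeatedly until it returns 0 mirrors the `while` loop in the excerpt: each call hands back one batch and moves the cursor forward, so the scan visits each tagged page once.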


3. Q&A

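Q: In the implementation above, once the pages are written, the radix tree tags never seem to be cleared. Are they just left there?
A: Searching for mapping->tree_lock leads to test_set_page_writeback():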

linux-3.10.86/mm/page-writeback.c

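int test_set_page_writeback(struct page *page)
{
    ...
        spin_lock_irqsave(&mapping->tree_lock, flags);
        ...
        if (!PageDirty(page))
            radix_tree_tag_clear(&mapping->page_tree,
                        page_index(page),
                        PAGECACHE_TAG_DIRTY);
        radix_tree_tag_clear(&mapping->page_tree,
                     page_index(page),
                     PAGECACHE_TAG_TOWRITE);
        spin_unlock_irqrestore(&mapping->tree_lock, flags);
    ...
}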

linux-3.10.86/include/linux/page-flags.h

static inline void set_page_writeback(struct page *page)
{
    test_set_page_writeback(page);
}

Callers of set_page_writeback include:

__block_write_full_page -> set_page_writeback
__mpage_writepage -> set_page_writeback

The writepage callback is usually __mpage_writepage or __writepage, for example:

__writepage ->  mapping->a_ops->writepage -> ext2_writepage -> block_write_full_page -> block_write_full_page_endio -> __block_write_full_page

So the (*writepage)(...) call in write_cache_pages() is what clears the corresponding tags.
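The flag/tag interplay can be modelled per page in userspace. This is a sketch under simplifying assumptions: `struct page_model` and the `model_*` functions are hypothetical, there is no locking, and PG_dirty and the radix-tree tags are plain booleans. It follows the 3.10 logic as I read it: clear_page_dirty_for_io() clears the PG_dirty flag but leaves the DIRTY tag in place; test_set_page_writeback() then drops TOWRITE unconditionally and drops the DIRTY tag only when `!PageDirty(page)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page model: page flags plus radix-tree tags. */
struct page_model {
    bool pg_dirty, pg_writeback;                 /* page flags */
    bool tag_dirty, tag_towrite, tag_writeback;  /* radix-tree tags */
};

/* Dirtying sets both the flag and the DIRTY tag. */
static void model_set_page_dirty(struct page_model *p)
{
    p->pg_dirty = true;
    p->tag_dirty = true;
}

/* Like clear_page_dirty_for_io(): clears PG_dirty, leaves the DIRTY tag,
 * and returns whether the page was dirty. */
static bool model_clear_dirty_for_io(struct page_model *p)
{
    bool was_dirty = p->pg_dirty;

    p->pg_dirty = false;
    return was_dirty;
}

/* Like test_set_page_writeback(): returns the old PG_writeback value. */
static bool model_test_set_writeback(struct page_model *p)
{
    bool was_writeback = p->pg_writeback;

    p->pg_writeback = true;
    if (!was_writeback)
        p->tag_writeback = true;
    if (!p->pg_dirty)            /* a redirtied page keeps its DIRTY tag */
        p->tag_dirty = false;
    p->tag_towrite = false;      /* TOWRITE is always dropped */
    return was_writeback;
}
```

The point for livelock avoidance is the last line: once writeback starts, the page leaves the TOWRITE set for good, so a tagged pass cannot revisit it even if the page is dirtied again.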

4. Other livelocking

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=aa373cf550994623efb5d49a4d8775bafd10bbc1

writeback: stop background/kupdate works from livelocking other works

Background writeback is easily livelockable in a loop in wb_writeback() by a process continuously re-dirtying pages (or continuously appending to a file). This is in fact intended as the target of background writeback is to write dirty pages it can find as long as we are over dirty_background_threshold.

But the above behavior gets inconvenient at times because no other work queued in the flusher thread's queue gets processed. In particular, since e.g. sync(1) relies on flusher thread to do all the IO for it, sync(1) can hang forever waiting for flusher thread to do the work.

...

Thus we interrupt background writeback if there is some other work to do.

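Wait: we issue a sync, and background writeback just breaks out here and returns without doing anything more. Is that OK?
A: The elided part (...) above contains: Generally, when a flusher thread has some work queued, someone submitted the work to achieve a goal more specific than what background writeback does. ... In other words, background writeback steps aside precisely so that the queued sync work can run, and that work does the writing itself.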

linux-3.10.86/fs/fs-writeback.c

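static long wb_writeback(struct bdi_writeback *wb,
                         struct wb_writeback_work *work)
{
    ...
    for (;;) {
        ...
        // background/kupdate work yields as soon as other work is queued
        if ((work->for_background || work->for_kupdate) &&
            !list_empty(&wb->bdi->work_list))
            break;
        // background work also stops once below the background dirty threshold
        if (work->for_background && !over_bground_thresh(wb->bdi))
            break;
        ...
    }
    ...
}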
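The effect of the commit can be reproduced with a toy model of the flusher loop. All names here (`struct flusher_model`, `background_writeback()`, the `check_work_list` switch) are hypothetical; dirty pages are a single counter, and a redirtier adds `redirty` pages after each chunk written. With `check_work_list` false the loop behaves like the pre-patch code and only the `max_iters` cap stops it; with it true, queued work makes background writeback yield at once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one bdi's flusher state. */
struct flusher_model {
    int queued_works;    /* entries waiting on bdi->work_list, e.g. a sync */
    long dirty_pages;    /* dirty pages in the system */
    long bg_thresh;      /* background dirty threshold */
};

/* Background writeback loop. check_work_list = false mimics the behaviour
 * before the patch, true mimics it after. Each iteration writes `chunk`
 * pages while a redirtier adds `redirty` pages; max_iters only keeps the
 * demo finite. Returns the number of iterations performed. */
static int background_writeback(struct flusher_model *m, long chunk,
                                long redirty, int max_iters,
                                bool check_work_list)
{
    int iters = 0;

    assert(chunk > 0 && redirty >= 0);
    while (iters < max_iters) {
        /* Yield to more specific queued work such as sync. */
        if (check_work_list && m->queued_works > 0)
            break;
        /* Like !over_bground_thresh(): stop once at or below the threshold. */
        if (m->dirty_pages <= m->bg_thresh)
            break;
        m->dirty_pages -= chunk;
        m->dirty_pages += redirty;
        iters++;
    }
    return iters;
}
```

With one work queued, 1000 dirty pages and a redirtier that matches the writeback rate, the pre-patch loop runs until the 50-iteration cap, while the patched loop returns immediately so the flusher can pick up the queued work. With nothing queued and no redirtying, it stops after 9 chunks, when the count reaches the threshold of 100.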

5. More

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=livelocking

Article URL: https://awakening-fong.github.io/posts/io/tag_towrie_livelock

When reposting, please credit the source: https://awakening-fong.github.io