tag TOWRITE to avoid livelocking?
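1. What is livelock
Livelock usually refers to two parties that keep yielding to each other, so that neither task makes progress.
Judging from the thread below, however, livelock is used here in a broader sense: the work does make progress, but new work keeps arriving so it can never finish. That also counts as livelock.
Either way, it has nothing to do with locks.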
https://lkml.org/lkml/2010/11/9/593 For example when a single large file is continuously dirtied, we would never finish syncing it ... After this patch, program from http://lkml.org/lkml/2010/10/24/154 is no longer able to stall sync forever.
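2. Implementation
The fix lives in write_cache_pages(): if the writeback control (wbc) asks for a sync (sync_mode == WB_SYNC_ALL, or tagged_writepages is set), the pages that are currently DIRTY are additionally tagged TOWRITE. The function then only has to finish the TOWRITE pages; pages that become DIRTY afterwards are ignored.
The pattern (collect a batch first, then process only that batch) looks familiar. For example, in write_cache_pages(), pagevec_lookup_tag() picks out the pages and collects them in struct pagevec pvec.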
write_cache_pages
{
    if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
        tag = PAGECACHE_TAG_TOWRITE;
    else
        tag = PAGECACHE_TAG_DIRTY;
retry:
    if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
        tag_pages_for_writeback(mapping, index, end); // also tag the DIRTY pages as TOWRITE
    while(...)
    {
        nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
                min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
        if (nr_pages == 0) // no page with this tag is left: done
            break;
        ...
        /*
        */
        if (!clear_page_dirty_for_io(page))
            goto continue_unlock;
        (*writepage)(page, wbc, data);
    }
}
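To make the effect of the snapshot concrete, here is a small userspace sketch. It is not kernel code: the names toy_mapping, toy_init, toy_writer_appends and toy_write_cache_pages are invented for illustration, the "file" is a fixed array of 64 pages, and a simulated writer dirties one new page after every page that gets written. Under those assumptions, scanning by DIRTY keeps chasing the writer until the toy file is full, while scanning by TOWRITE stops after the pages that were dirty when the sync started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES      64      /* hypothetical capacity of the toy file */
#define TOY_TAG_DIRTY   0x1u
#define TOY_TAG_TOWRITE 0x2u

struct toy_mapping {
    unsigned tags[TOY_NPAGES];  /* stands in for the per-page radix tree tags */
    size_t append_at;           /* next page the simulated writer will dirty */
};

/* Start with pages [0, initially_dirty) tagged DIRTY. */
static void toy_init(struct toy_mapping *m, size_t initially_dirty)
{
    assert(initially_dirty <= TOY_NPAGES);
    for (size_t i = 0; i < TOY_NPAGES; i++)
        m->tags[i] = i < initially_dirty ? TOY_TAG_DIRTY : 0;
    m->append_at = initially_dirty;
}

/* The simulated writer: dirty one more page at the end, if there is room. */
static void toy_writer_appends(struct toy_mapping *m)
{
    if (m->append_at < TOY_NPAGES)
        m->tags[m->append_at++] |= TOY_TAG_DIRTY;
}

/* Snapshot step: every page that is DIRTY right now also gets TOWRITE. */
static void toy_tag_pages_for_writeback(struct toy_mapping *m)
{
    for (size_t i = 0; i < TOY_NPAGES; i++)
        if (m->tags[i] & TOY_TAG_DIRTY)
            m->tags[i] |= TOY_TAG_TOWRITE;
}

/*
 * Walk the file once and "write" every page carrying the chosen tag.
 * After each write the simulated writer appends one more dirty page.
 * Returns the number of pages written.
 */
static size_t toy_write_cache_pages(struct toy_mapping *m, bool sync_all)
{
    unsigned tag = sync_all ? TOY_TAG_TOWRITE : TOY_TAG_DIRTY;
    size_t written = 0;

    if (sync_all)
        toy_tag_pages_for_writeback(m);

    for (size_t index = 0; index < TOY_NPAGES; index++) {
        if (!(m->tags[index] & tag))
            continue;
        /* writing the page drops both tags in this toy model */
        m->tags[index] &= ~(TOY_TAG_DIRTY | TOY_TAG_TOWRITE);
        written++;
        toy_writer_appends(m);
    }
    return written;
}
```

Calling toy_init(&m, 8) and then toy_write_cache_pages(&m, true) writes only the 8 pages that were dirty at the start; with false as the second argument the scan ends only because the toy file cannot grow past TOY_NPAGES.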
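3. Asking and answering my own question
Question: in the implementation above, once a page has been written, the radix tree tag does not seem to be cleared anywhere. Is it just left behind?
Answer: search for mapping->tree_lock and you will find test_set_page_writeback():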
linux-3.10.86/mm/page-writeback.c
test_set_page_writeback
{
    ...
    radix_tree_tag_clear(&mapping->page_tree,
            page_index(page),
            PAGECACHE_TAG_DIRTY);
    radix_tree_tag_clear(&mapping->page_tree,
            page_index(page),
            PAGECACHE_TAG_TOWRITE);
}
linux-3.10.86/include/linux/page-flags.h
static inline void set_page_writeback(struct page *page)
{
    test_set_page_writeback(page);
}
Callers of set_page_writeback include:
__block_write_full_page -> set_page_writeback
__mpage_writepage -> set_page_writeback(page);
writepage is usually __mpage_writepage or __writepage:
__writepage -> mapping->a_ops->writepage -> ext2_writepage -> block_write_full_page -> block_write_full_page_endio -> __block_write_full_page
So the (*writepage)(...) call in write_cache_pages() is what clears the corresponding tags.
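The call chain above can be sketched in userspace as follows. This is a toy model, not the kernel implementation: demo_page and all demo_* functions are hypothetical names, the page flag and the tree tags are plain struct fields, and the tag clearing follows the excerpt above (both tags dropped when the page is marked as under writeback). It is only meant to show that clearing the page's dirty flag and clearing the tree tags happen in two different steps.

```c
#include <stdbool.h>

#define DEMO_TAG_DIRTY   0x1u
#define DEMO_TAG_TOWRITE 0x2u

struct demo_page {
    bool dirty;          /* stands in for the page's dirty flag */
    bool writeback;      /* stands in for the page's writeback flag */
    unsigned tree_tags;  /* stands in for the page's radix tree tags */
};

/* Step 1: only the page flag changes here; the tree tags are untouched. */
static bool demo_clear_page_dirty_for_io(struct demo_page *p)
{
    bool was_dirty = p->dirty;

    p->dirty = false;
    return was_dirty;
}

/* Step 2: marking the page as under writeback drops DIRTY and TOWRITE. */
static void demo_set_page_writeback(struct demo_page *p)
{
    p->writeback = true;
    p->tree_tags &= ~(DEMO_TAG_DIRTY | DEMO_TAG_TOWRITE);
}

/* Stands in for the ->writepage callback; real code would submit I/O. */
static int demo_writepage(struct demo_page *p)
{
    demo_set_page_writeback(p);
    return 0;
}

/* One iteration of the write loop; returns true if the page was written. */
static bool demo_write_one(struct demo_page *p)
{
    if (!demo_clear_page_dirty_for_io(p))
        return false;   /* someone else already cleaned the page */
    demo_writepage(p);
    return true;
}
```

After demo_write_one() returns true, the page carries neither tag, so a later lookup by TOWRITE will not return it again.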
4. Other livelocking
writeback: stop background/kupdate works from livelocking other works
Background writeback is easily livelockable in a loop in wb_writeback() by a process continuously re-dirtying pages (or continuously appending to a file). This is in fact intended as the target of background writeback is to write dirty pages it can find as long as we are over dirty_background_threshold.
But the above behavior gets inconvenient at times because no other work queued in the flusher thread's queue gets processed. In particular, since e.g. sync(1) relies on flusher thread to do all the IO for it, sync(1) can hang forever waiting for flusher thread to do the work.
...
Thus we interrupt background writeback if there is some other work to do.
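Wait: we issue a sync, and then the loop is interrupted here and returns without having done anything. Is that acceptable?
Answer: the elided part (...) of the commit message above says: Generally, when a flusher thread has some work queued, someone submitted the work to achieve a goal more specific than what background writeback does. ...
In other words, what gets interrupted is the background/kupdate work, so that the queued work (for example the one submitted by sync) gets its turn.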
linux-3.10.86/fs/fs-writeback.c
wb_writeback
{
    if ((work->for_background || work->for_kupdate) &&
        !list_empty(&wb->bdi->work_list))
        break;
    if (work->for_background && !over_bground_thresh(wb->bdi))
        break;
}
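The two checks above can be tried out in a small userspace simulation. Everything here is a hypothetical stand-in: sim_work and sim_wb are invented structs, queued_works replaces the test on bdi->work_list, the threshold check is a plain comparison, and sync_arrives_at models another work item being queued after a given number of rounds. The re-dirtying process is modeled by putting back one page after every page written, so the dirty count never drops.

```c
#include <stdbool.h>
#include <stddef.h>

struct sim_work {
    bool for_background;
    bool for_kupdate;
};

struct sim_wb {
    size_t queued_works;     /* stands in for !list_empty(&wb->bdi->work_list) */
    size_t dirty_pages;      /* current number of dirty pages */
    size_t bg_thresh;        /* stands in for the background dirty threshold */
    size_t sync_arrives_at;  /* round after which another work is queued; 0 = never */
};

/*
 * Run the writeback loop for at most max_rounds rounds and return how many
 * rounds were actually executed. Each round writes one page, and a simulated
 * process immediately re-dirties one page.
 */
static size_t sim_wb_writeback(struct sim_wb *wb, const struct sim_work *work,
                               size_t max_rounds)
{
    size_t rounds = 0;

    while (rounds < max_rounds) {
        /* background/kupdate work yields as soon as other work is queued */
        if ((work->for_background || work->for_kupdate) &&
            wb->queued_works > 0)
            break;
        /* background work also stops once we are under the threshold */
        if (work->for_background && wb->dirty_pages <= wb->bg_thresh)
            break;

        if (wb->dirty_pages > 0)
            wb->dirty_pages--;   /* write one page */
        wb->dirty_pages++;       /* the re-dirtying process undoes it */
        rounds++;

        if (rounds == wb->sync_arrives_at)
            wb->queued_works++;  /* e.g. sync queues its own work */
    }
    return rounds;
}
```

With 100 dirty pages, a threshold of 10 and no other work ever queued, a background work runs until max_rounds is exhausted; if another work is queued after round 5, the loop returns after exactly 5 rounds and the flusher can move on to that work.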
5. more ...
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=livelocking
Permalink: https://awakening-fong.github.io/posts/io/tag_towrie_livelock
When reposting, please credit the source: https://awakening-fong.github.io