2018-03-22

1. No lock protection

Consider the following structure:

struct your_obj {
    struct hlist_node obj_node_hlist;
    struct rcu_head rcu_head;   
    atomic_t refcnt;
    int id;
};

Incorrect:

hlist_del_init_rcu(&your_obj->obj_node_hlist);  

Correct:

spin_lock(&obj_hash_lock[hash]);
hlist_del_init_rcu(&your_obj->obj_node_hlist);
spin_unlock(&obj_hash_lock[hash]);

Why it is wrong:

rcu_read_lock();
hash = xxx
hlist_for_each_entry_rcu(tpos, pos, &obj_hlist[hash], obj_node_hlist) {
    xxx
}
rcu_read_unlock();

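Suppose readers traverse the list with hlist_for_each_entry_rcu. RCU only protects readers from updaters; it does not serialize updaters against each other. Without the lock, two hlist_del_init_rcu calls running in parallel on the same bucket can corrupt the list, and the traversal may then go wrong.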

2. Freeing memory, part 1

Incorrect:

void your_obj_unregister(struct your_obj *your_obj)
{
    unsigned int hash = 0;

    hash = your_hash(your_obj->id);
    spin_lock(&obj_hash_lock[hash]);
    hlist_del_init_rcu(&your_obj->obj_node_hlist);
    spin_unlock(&obj_hash_lock[hash]);
    your_obj_put(your_obj);
    return;
}

int your_obj_get_if_registered(struct your_obj *your_obj)
{
    unsigned int hash = 0;
    struct hlist_node *pos;
    struct your_obj *tpos;
    int is_registered = 0;
    rcu_read_lock(); 

    hash = your_hash(your_obj->id);

    hlist_for_each_entry_rcu(tpos, pos, &obj_hlist[hash], obj_node_hlist) {
        if (tpos == your_obj) {
            is_registered = 1;
            your_obj_get(your_obj);
            break;
        }
    }
    rcu_read_unlock();
    return is_registered ? 0 : 1;

}

void your_obj_put(struct your_obj *t)
{
    if (atomic_dec_and_test(&t->refcnt)) {
        kfree(t);
    }
}

Correct:

void your_obj_free_rcu_callback(struct rcu_head *h)
{
    struct your_obj *t = container_of(h, struct your_obj, rcu_head);
    if (!atomic_read(&t->refcnt)) {
        t->obj_node_hlist.next = NULL;
        kfree(t);
    }
}

void your_obj_put(struct your_obj *t)
{
    if (atomic_dec_and_test(&t->refcnt)) {
        call_rcu(&t->rcu_head, your_obj_free_rcu_callback);
    }
}

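Why it is wrong: without call_rcu(), a reader may still be traversing the list when hlist_del_init_rcu unlinks the node and the memory is freed immediately. call_rcu() defers the free until after a grace period, that is, until every reader that was already inside rcu_read_lock() has left its critical section.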

int your_obj_get_if_registered(struct your_obj *your_obj)
{
    ...
    if (tpos == your_obj) {
                                // suppose your_obj_put -> kfree happens at this point
        is_registered = 1;
        your_obj_get(your_obj);  // writes to memory that was already freed, triggering a BUG
        break;
    }
}

3. Freeing memory, part 2

Consider the following interleaving:

int your_obj_get_if_registered()                |          your_obj_unregister
{                                               |         {
    rcu_read_lock();                            |
                                                |
    hlist_for_each_entry_rcu( ... ) {           |
        if (tpos == your_obj) {                 | 
            is_registered = 1;                  |   
                                                |
                                                |           hlist_del_init_rcu
                                                |           your_obj_put
                                                |       }

            your_obj_get(your_obj); // later causes the RCU callback to be registered twice; the second callback frees memory that was already freed, hence a BUG.
            break;
        }
    }
    rcu_read_unlock();
}

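With reference counting, a plain get inside a lookup is most likely wrong: the count may already have dropped to zero. It needs to become kref_get_unless_zero() (or atomic_inc_not_zero() for a bare atomic_t such as refcnt here), and the lookup must treat a failed get as "not found".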
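
The get-unless-zero idea can be sketched in userspace with C11 atomics. This is only a model under stated assumptions: `struct obj_model`, `obj_get_unless_zero` and `obj_put` are hypothetical names invented for illustration. In kernel code you would call the existing helpers mentioned above rather than writing your own loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for an object with a reference count. */
struct obj_model {
    atomic_int refcnt;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true on success, false if the count already reached zero,
 * meaning the object is on its way to being freed.
 */
bool obj_get_unless_zero(struct obj_model *o)
{
    int old = atomic_load(&o->refcnt);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true if this was the last one. */
bool obj_put(struct obj_model *o)
{
    int old = atomic_fetch_sub(&o->refcnt, 1);

    assert(old > 0);    /* dropping below zero would be a caller bug */
    return old == 1;
}
```

In the interleaving above, the reader's get would fail once the unregister path has dropped the last reference, so the count never goes from 0 back to 1 and the free callback is queued only once.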

4. Modules

Suppose the module contains:

if (atomic_dec_and_test(&t->refcnt)) {
        call_rcu(&t->rcu_head, your_obj_free_rcu_callback);
}

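Then the module must call rcu_barrier() before it is unloaded. Otherwise a callback may still be pending when the module goes away: the callback is a function inside the module, the module has already been unloaded, hence a BUG. Note that synchronize_rcu() is not a substitute here, since it waits for a grace period but not for queued callbacks to finish running.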
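
The ordering requirement can be illustrated with a small single-threaded userspace model. Everything here is hypothetical and is not the kernel API: `model_call_rcu` just queues a callback, `model_rcu_barrier` runs everything that is queued, and `module_loaded` stands in for "the module text is still mapped". The real rcu_barrier() blocks until previously queued callbacks have been invoked by the RCU core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of struct rcu_head: a queued, deferred callback. */
struct model_head {
    struct model_head *next;
    void (*func)(struct model_head *);
};

static struct model_head *pending;   /* callbacks queued but not yet run */
static bool module_loaded = true;    /* models "module code still exists" */
static int freed;                    /* how many callbacks have run */

/* Models call_rcu(): only queues the callback, does not run it. */
void model_call_rcu(struct model_head *h, void (*func)(struct model_head *))
{
    h->func = func;
    h->next = pending;
    pending = h;
}

/* Models rcu_barrier(): returns only after every queued callback has run. */
void model_rcu_barrier(void)
{
    while (pending) {
        struct model_head *h = pending;

        pending = h->next;
        h->func(h);
    }
}

/* The module's own callback; it must never run after unload. */
void model_free_cb(struct model_head *h)
{
    (void)h;
    assert(module_loaded);
    freed++;
}

/* Models the module exit path: drain callbacks first, then unload. */
void model_module_exit(void)
{
    model_rcu_barrier();
    module_loaded = false;
}
```

If `model_module_exit` cleared `module_loaded` without draining first, any callback run afterwards would trip the assertion, which is the model's version of jumping into unloaded module code.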

5. References

RCU Usage In the Linux Kernel: One Decade Later https://pdos.csail.mit.edu/6.828/2017/readings/rcu-decade-later.pdf

https://www.kernel.org/doc/Documentation/RCU/checklist.txt

Permalink: https://awakening-fong.github.io/posts/other/rcu

Please credit the source when reposting: https://awakening-fong.github.io

