2017-01-14

1. tid

linux-3.10.86/include/linux/slub_def.h

struct kmem_cache_cpu {
    ...
    unsigned long tid;  /* Globally unique transaction id */
    ...
};

linux-3.10.86/mm/slub.c

#ifdef CONFIG_PREEMPT
/*
 * Calculate the next globally unique transaction for disambiguiation
 * during cmpxchg. The transactions start with the cpu number and are then
 * incremented by CONFIG_NR_CPUS.
 */

Q: Why is the tid advanced by CONFIG_NR_CPUS each time rather than by 1?
A: The design guarantees that, at any moment, every cpu's tid value is different. Note that the comment above is slightly stale: the actual increment is not CONFIG_NR_CPUS but TID_STEP.

#define TID_STEP  roundup_pow_of_two(CONFIG_NR_CPUS)

First, let's see why incrementing by CONFIG_NR_CPUS would also be wrong:
linux-3.10.86/mm/slub.c

static void init_kmem_cache_cpus(struct kmem_cache *s)
{
    int cpu;

    for_each_possible_cpu(cpu)
        per_cpu_ptr(s->cpu_slab, cpu)->tid = init_tid(cpu);
}
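
init_tid() simply returns the cpu number, and each transaction then advances the tid by TID_STEP via next_tid(), so the low bits of a tid always identify its cpu. The helpers in mm/slub.c look roughly like this (paraphrased from memory; minor details may differ between kernel versions):

static inline unsigned int init_tid(int cpu)
{
    return cpu;
}

static inline unsigned long next_tid(unsigned long tid)
{
    return tid + TID_STEP;
}

static inline unsigned int tid_to_cpu(unsigned long tid)
{
    return tid % TID_STEP;
}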


Suppose each cpu advanced its tid by CONFIG_NR_CPUS, with 3 cpus in this example (and, to keep the numbers small, a tid that wraps around modulo 8):

          cpu0  cpu1  cpu2
initial:   0     1     2
           3     4     5
           6     7     0

cpu2's tid and cpu0's tid can end up with the same value, 0 in this example, and the bug appears.
Whenever the number of cpus is not a power of two, incrementing by CONFIG_NR_CPUS leads to exactly this problem once the tid counter wraps around.
Incrementing by 2^x instead keeps each cpu's low x bits (D[x-1]..D0) fixed while only the higher bits change, which guarantees that the tids of different cpus never collide, even across wraparound. See the sketch below.
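
As a quick sanity check of the wraparound argument, here is a minimal user-space sketch (not kernel code; NR_CPUS, the 8-bit tid width and the iteration count are just illustration choices) that models each cpu's tid as a uint8_t so that overflow happens quickly:

#include <stdio.h>
#include <stdint.h>

#define NR_CPUS 3

/* Model each cpu's tid as an 8-bit counter so wraparound happens quickly. */
static void simulate(unsigned int step)
{
    uint8_t tid[NR_CPUS];
    unsigned int broken = 0;
    int cpu, i;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        tid[cpu] = cpu;                 /* init_tid(cpu) */

    for (i = 0; i < 1000; i++)
        for (cpu = 0; cpu < NR_CPUS; cpu++) {
            tid[cpu] += step;           /* next_tid(), wraps mod 256 */
            /* the low bits should always identify the owning cpu */
            if (tid[cpu] % step != (unsigned int)cpu)
                broken++;
        }

    printf("step=%u: tid %% step stopped identifying the cpu %u times\n",
           step, broken);
}

int main(void)
{
    simulate(NR_CPUS);  /* 3, not a power of two: breaks after wraparound */
    simulate(4);        /* roundup_pow_of_two(3): never breaks */
    return 0;
}

With step = 3 the invariant tid % step == cpu breaks as soon as the 8-bit counter wraps, while with step = 4 it holds forever, which is exactly why TID_STEP is rounded up to a power of two.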

2. Application

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9aabf810a67cd97e2d1a48f0bab338b7680f1929

We had to insert a preempt enable/disable in the fastpath a while ago in order to guarantee that tid and kmem_cache_cpu are retrieved on the same cpu. It is the problem only for CONFIG_PREEMPT in which scheduler can move the process to other cpu during retrieving data. Now, I reach the solution to remove preempt enable/disable in the fastpath. If tid is matched with kmem_cache_cpu's tid after tid and kmem_cache_cpu are retrieved by separate this_cpu operation, it means that they are retrieved on the same cpu. If not matched, we just have to retry it. With this guarantee, preemption enable/disable isn't need at all even if CONFIG_PREEMPT, so this patch removes it.

I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free in CONFIG_PREEMPT. (14.821 ns -> 14.049 ns)

In __kmalloc -> slab_alloc -> slab_alloc_node, the fastpath changes from:

preempt_disable();
c = __this_cpu_ptr(s->cpu_slab);
tid = c->tid;
preempt_enable();
object = c->freelist;
page = c->page;

to:

/*
 * We should guarantee that tid and kmem_cache are retrieved on
 * the same cpu. It could be different if CONFIG_PREEMPT so we need
 * to check if it is matched or not.
 */
do {
    tid = this_cpu_read(s->cpu_slab->tid);
    c = raw_cpu_ptr(s->cpu_slab);
} while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid));

/*
 * Irqless object alloc/free algorithm used here depends on sequence
 * of fetching cpu_slab's data. tid should be fetched before anything
 * on c to guarantee that object and page associated with previous tid
 * won't be used with current tid. If we fetch tid first, object and
 * page could be one associated with next tid and our alloc/free
 * request will be failed. In this case, we will retry. So, no problem.
 */
barrier();
object = c->freelist;
page = c->page;
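
For context, the tid fetched here is consumed a little further down the fastpath: the allocation is committed with a this_cpu_cmpxchg_double() on the (freelist, tid) pair, so a stale tid (the task migrated, or an interrupt did an alloc/free on this cpu in between) makes the cmpxchg fail and the whole fastpath retries. Abridged from slab_alloc_node() (quoted from memory, so minor details may differ by version):

next_object = get_freepointer_safe(s, object);
if (unlikely(!this_cpu_cmpxchg_double(
        s->cpu_slab->freelist, s->cpu_slab->tid,
        object, tid,
        next_object, next_tid(tid)))) {
    note_cmpxchg_failure("slab_alloc", s, tid);
    goto redo;
}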

Permalink: https://awakening-fong.github.io/posts/mm/slub_tid

Please credit the source when reposting: https://awakening-fong.github.io

