Pattern 03: Some user-space performance optimization techniques 2020-10-17

Reduce system calls

Buffering
Batching: merge several small operations into one
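A minimal sketch of buffering, assuming a POSIX system: small appends are collected in user space and handed to the kernel with one write(2) per full buffer. With a 100-byte buffer, 1000 one-byte appends cost 10 system calls instead of 1000. The class BufferedWriter and its methods are hypothetical, and error handling is simplified.

```cpp
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>

// Hypothetical helper: collects small writes and issues one write(2) per full buffer.
class BufferedWriter {
 public:
  BufferedWriter(int fd, std::size_t capacity) : fd_(fd), cap_(capacity) {
    assert(capacity > 0);
    buf_.reserve(capacity);
  }
  ~BufferedWriter() { Flush(); }
  BufferedWriter(const BufferedWriter&) = delete;             // avoid double flush
  BufferedWriter& operator=(const BufferedWriter&) = delete;

  void Append(const char* data, std::size_t n) {
    if (buf_.size() + n > cap_) Flush();  // make room first
    if (n >= cap_) {                      // too large to buffer: write through
      WriteAll(data, n);
      return;
    }
    buf_.append(data, n);
  }

  void Flush() {
    if (buf_.empty()) return;
    WriteAll(buf_.data(), buf_.size());
    buf_.clear();
  }

  int syscalls() const { return syscalls_; }

 private:
  void WriteAll(const char* p, std::size_t n) {
    while (n > 0) {
      ssize_t w = ::write(fd_, p, n);
      ++syscalls_;
      if (w < 0) {
        if (errno == EINTR) continue;  // interrupted by a signal: retry
        return;                        // simplified: real code should report the error
      }
      p += w;
      n -= static_cast<std::size_t>(w);
    }
  }

  int fd_;
  std::size_t cap_;
  std::string buf_;
  int syscalls_ = 0;
};
```

The trade-off is that buffered data only reaches the kernel on Flush() or destruction, so latency and crash durability get worse as the buffer grows.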

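Reduce unnecessary process wakeups and context switches

Avoid the thundering herd and spurious wakeups:
- the problem with pthread_cond_signal, and alternatives to it
- edge-triggered epoll_wait
Keep the number of workers (usually meaning processes) <= the number of vCPUs.
Go goroutines (the author is not yet familiar with Go)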
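To make the edge-triggered point concrete, a minimal Linux-only sketch (the function and struct names are hypothetical): an eventfd is made readable once and never read, and epoll_wait is called twice. In level-triggered mode both calls report the fd. With EPOLLET only the first does, because the fd is reported on a readiness change rather than on every call. The flip side is that an edge-triggered reader must drain the fd (read until EAGAIN), or it may not be woken again for the remaining data. This shows only the notification semantics, not a full worker setup.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>

struct Reports {
  int first;
  int second;
};

// Registers an eventfd with epoll (level- or edge-triggered), makes it readable
// once, then calls epoll_wait twice without reading the eventfd.
// Returns how many ready events each call reported ({-1, -1} on error).
Reports PollTwiceWithoutDraining(bool edge_triggered) {
  Reports r{-1, -1};
  int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
  int ep = epoll_create1(EPOLL_CLOEXEC);
  if (efd >= 0 && ep >= 0) {
    epoll_event ev{};
    ev.events = EPOLLIN;
    if (edge_triggered) ev.events |= EPOLLET;
    ev.data.fd = efd;
    std::uint64_t one = 1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev) == 0 &&
        write(efd, &one, sizeof one) == static_cast<ssize_t>(sizeof one)) {
      epoll_event out[4];
      r.first = epoll_wait(ep, out, 4, 0);   // timeout 0: do not block
      r.second = epoll_wait(ep, out, 4, 0);  // the eventfd is still readable
    }
  }
  if (ep >= 0) close(ep);
  if (efd >= 0) close(efd);
  return r;
}
```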

Pattern 04: Some trouble-free patterns (to be updated) 2020-10-17

Multithreading

Have processes (or threads) interact by passing messages, not by sharing state.

Use multithreading sparingly.
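A minimal C++17 sketch of message passing between threads: the only shared object is a queue (the hypothetical Channel class), and the running sum is touched only by the consumer thread until it is joined. Send uses notify_one because each message can be consumed by exactly one receiver; Close uses notify_all because every waiting receiver must see it.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical unbounded channel: a mutex-protected queue plus a condition variable.
template <typename T>
class Channel {
 public:
  void Send(T value) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      assert(!closed_);
      queue_.push_back(std::move(value));
    }
    cv_.notify_one();  // one message, so wake at most one receiver
  }

  void Close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    cv_.notify_all();  // every waiting receiver must observe the close
  }

  // Blocks until a message arrives; returns nullopt once closed and drained.
  std::optional<T> Recv() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !queue_.empty() || closed_; });
    if (queue_.empty()) return std::nullopt;
    T value = std::move(queue_.front());
    queue_.pop_front();
    return value;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<T> queue_;
  bool closed_ = false;
};

// The producer (this thread) only sends messages; the consumer owns `sum`.
long SumViaChannel(int n) {
  Channel<int> ch;
  long sum = 0;  // written only by the consumer thread
  std::thread consumer([&] {
    while (auto v = ch.Recv()) sum += *v;
  });
  for (int i = 1; i <= n; ++i) ch.Send(i);
  ch.Close();
  consumer.join();  // join() makes the consumer's writes visible here
  return sum;
}
```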

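Signals

Approach 1: block the signals, then receive them synchronously with signalfd or sigwaitinfo.
Approach 2: the signal handler only sets a flag (of type volatile sig_atomic_t) recording that the signal arrived; code elsewhere checks that flag.
Approach 3: the signal handler writes to a pipe/eventfd, and the main program uses poll/epoll to detect that an event has arrived.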
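A minimal Linux-only sketch of approach 1, assuming a single-threaded program (with threads, block the signal with pthread_sigmask in every thread, typically before creating them). The blocked signal stays pending instead of running a handler and is then read synchronously from a signalfd; in a real program that fd would sit in the same poll/epoll loop as the other descriptors. The helper name is hypothetical, and raise() stands in for a signal sent from elsewhere.

```cpp
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

#include <cassert>

// Approach 1: block `signo`, then receive it synchronously from a signalfd.
// Returns the signal number read from the fd, or -1 on error.
// `signo` stays blocked afterwards, which is what this approach wants.
int ReceiveViaSignalfd(int signo) {
  sigset_t mask;
  sigemptyset(&mask);
  sigaddset(&mask, signo);
  // Blocked signals stay pending instead of interrupting the program.
  // (Single-threaded assumption; use pthread_sigmask with threads.)
  if (sigprocmask(SIG_BLOCK, &mask, nullptr) != 0) return -1;

  int sfd = signalfd(-1, &mask, SFD_CLOEXEC);
  if (sfd < 0) return -1;

  raise(signo);  // stand-in for a signal sent by another process

  signalfd_siginfo info;
  ssize_t n = read(sfd, &info, sizeof info);  // returns at once: the signal is pending
  close(sfd);
  if (n != static_cast<ssize_t>(sizeof info)) return -1;
  return static_cast<int>(info.ssi_signo);
}
```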

Self Q&A on Longhorn snapshots 2020-10-06

The following has not been checked against the source code. Corrections are welcome.

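For a 1 GB disk, must a region first be set aside for storing snapshots, so that the actual maximum addressable space is not 1 GB but, say, only 800 MB?
A: No. There is no direct addressing at all: every read has to consult a mapping table. In other words, the whole disk is used to store snapshots.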
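To illustrate the answer (reads go through a mapping table rather than a fixed disk location), a purely hypothetical model, not verified against Longhorn's code: each snapshot layer stores only the blocks written while it was live, and the mapping table records, for every block, the newest layer holding it. Layer, Mapping, BuildMapping and ReadBlock are names made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical: one layer = the blocks written while it was the live layer.
using Layer = std::map<std::uint64_t, std::string>;    // block index -> block data
using Mapping = std::map<std::uint64_t, std::size_t>;  // block index -> layer index

// Builds the mapping table from a chain ordered oldest -> newest.
Mapping BuildMapping(const std::vector<Layer>& chain) {
  Mapping mapping;
  for (std::size_t i = 0; i < chain.size(); ++i) {
    for (const auto& entry : chain[i]) mapping[entry.first] = i;  // newer layers win
  }
  return mapping;
}

// Every read goes through the mapping; no block has a fixed physical address.
std::string ReadBlock(const std::vector<Layer>& chain, const Mapping& mapping,
                      std::uint64_t block) {
  auto m = mapping.find(block);
  if (m == mapping.end()) return std::string();  // never written (a real disk reads zeros)
  assert(m->second < chain.size());
  return chain[m->second].at(block);
}
```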

Are snapshots stored on a filesystem, or is a region of the block device set aside for them?
A: They are not stored on a filesystem.

Is live data written to its actual location on disk (without mapping)?
A: Apparently not.

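Is a backup stored on the same disk as the snapshots?
A: No. Snapshots present a linear disk to the layer above, but they are implemented with a mapping table. A backup consolidates the snapshots; putting it on the same disk would interfere with the mapping.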

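A snapshot must itself store addresses, otherwise a backup could not be built from snapshots. Does a backup present a linear space, or does it also need a mapping? In other words, does a backup need to store each block's (starting) address?
A: The documentation says each backup block is 2 MB. From this one can guess that a backup is organized with increasing addresses that need not be contiguous, i.e., gaps are allowed between 2 MB blocks. So the starting address of each block must be stored. Only a disaster recovery (DR) volume does not need starting addresses, because it is a volume.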
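The address reasoning from this answer as a hypothetical sketch: given the byte ranges that hold data, it lists the start offsets of the 2 MB-aligned backup blocks that would need to be stored. The offsets increase but may skip blocks, which is why each block's start address has to be recorded. The 2 MB size comes from the answer; the function name and the alignment to 2 MB boundaries are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Assumption: backup blocks are 2 MB and aligned to 2 MB boundaries.
constexpr std::uint64_t kBackupBlockSize = 2ull * 1024 * 1024;

// Given byte ranges [offset, offset + length) that hold data, returns the
// sorted start offsets of the backup blocks that must be stored.
std::vector<std::uint64_t> BackupBlockStarts(
    const std::vector<std::pair<std::uint64_t, std::uint64_t>>& ranges) {
  std::set<std::uint64_t> starts;  // keeps offsets sorted and unique
  for (const auto& [offset, length] : ranges) {
    if (length == 0) continue;
    std::uint64_t first = offset / kBackupBlockSize;
    std::uint64_t last = (offset + length - 1) / kBackupBlockSize;
    for (std::uint64_t b = first; b <= last; ++b) starts.insert(b * kBackupBlockSize);
  }
  return std::vector<std::uint64_t>(starts.begin(), starts.end());
}
```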

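LevelDB 05: Performance 2019-09-13

1. Avoid excessive latency from sudden bursts of compaction

2. Seeks and compaction

2.1 Quantification

3. BloomFilter

1. Avoid excessive latency from sudden bursts of compaction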

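Status DBImpl::MakeRoomForWrite(bool force) {
  mutex_.AssertHeld();
  bool allow_delay = !force;
  Status s;
  while (true) {
    if (!bg_error_.ok()) {
      s = bg_error_;  // Yield previous error
      break;
    } else if (allow_delay && versions_->NumLevelFiles(0) >=
                                  config::kL0_SlowdownWritesTrigger) {
      // We are getting close to hitting a hard limit on the number of
      // L0 files.  Rather than delaying a single write by several
      // seconds when we hit the hard limit, start delaying each
      // individual write by 1ms to reduce latency variance.  Also,
      // this delay hands over some CPU to the compaction thread in
      // case it is sharing the same core as the writer.
      // (Soft limit: kL0_SlowdownWritesTrigger = 8; hard limit: kL0_StopWritesTrigger = 12.)
      mutex_.Unlock();  // do not hold the DB mutex while sleeping
      env_->SleepForMicroseconds(1000);
      allow_delay = false;  // Do not delay a single write more than once
      mutex_.Lock();
    } else {
      // ... elided: proceed if the memtable has room, otherwise wait for compaction or switch memtables ...
    }
  }
  return s;
}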
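A self-contained sketch of the same soft-limit/hard-limit policy, with LevelDB's default thresholds copied into hypothetical constants. It follows the branch order of the excerpt but leaves out the immutable-memtable wait and the memtable switch, so it only models when a write is delayed once versus when it must wait for compaction.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical constants mirroring LevelDB's defaults.
constexpr int kL0SlowdownWritesTrigger = 8;  // soft limit
constexpr int kL0StopWritesTrigger = 12;     // hard limit

enum class WriteAction { kDelay1ms, kProceed, kWaitForCompaction };

// Simplified decision, in the same order as MakeRoomForWrite's branches.
WriteAction Decide(int l0_files, bool allow_delay, bool memtable_has_room) {
  assert(l0_files >= 0);
  if (allow_delay && l0_files >= kL0SlowdownWritesTrigger) return WriteAction::kDelay1ms;
  if (memtable_has_room) return WriteAction::kProceed;
  if (l0_files >= kL0StopWritesTrigger) return WriteAction::kWaitForCompaction;
  return WriteAction::kProceed;  // LevelDB would switch to a new memtable here
}

// Returns how many 1 ms delays one write incurred, or -1 if it would have to
// block until background compaction reduces the number of L0 files.
int SimulateWrite(int l0_files, bool memtable_has_room) {
  bool allow_delay = true;
  int delays = 0;
  while (true) {
    switch (Decide(l0_files, allow_delay, memtable_has_room)) {
      case WriteAction::kDelay1ms:
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        allow_delay = false;  // at most one delay per write
        ++delays;
        break;
      case WriteAction::kProceed:
        return delays;
      case WriteAction::kWaitForCompaction:
        return -1;
    }
  }
}
```

Spreading a 1 ms delay over many writes between 8 and 12 L0 files keeps tail latency smooth, instead of letting every write run at full speed until a multi-second stall at 12.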

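LevelDB 04: Compact 2019-09-12

1. Why compaction?

2. Compaction basics

3. What is the difference between Universal Style Compaction and Level Style Compaction?

4. Leveled-Compaction

4.1 LevelDB's DBImpl::DoCompactionWork

5. References

1. Why compaction?

Compaction improves lookup efficiency. Without compaction, a lookup may have to search many SST files; after compaction it only has to search a limited number of them, which greatly improves random-read performance. Compaction can also delete obsolete data.

Deleting obsolete data? For example, older values of the same key written at different times?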
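In LevelDB the answer is yes, and it covers two cases: older versions of a key that a newer write has overwritten, and deletion markers (tombstones) once no snapshot needs them and no level below the compaction's output can still hold that key. A simplified sketch of that dropping rule, assuming the output is the bottommost level and no snapshots are live. Real compaction does a k-way merge over sorted iterators instead of concatenating and sorting, and the Entry type here is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical internal entry: ordered by key ascending, then sequence descending.
struct Entry {
  std::string key;
  std::uint64_t seq;  // larger = newer
  bool is_deletion;   // tombstone
  std::string value;
};

// Merges runs and keeps only the newest live version of each key.
// Assumes bottommost output level and no live snapshots, so tombstones drop too.
std::vector<Entry> Compact(const std::vector<std::vector<Entry>>& runs) {
  std::vector<Entry> all;
  for (const auto& run : runs) all.insert(all.end(), run.begin(), run.end());
  std::sort(all.begin(), all.end(), [](const Entry& a, const Entry& b) {
    if (a.key != b.key) return a.key < b.key;
    return a.seq > b.seq;  // newest version of a key comes first
  });
  std::vector<Entry> out;
  for (std::size_t i = 0; i < all.size(); ++i) {
    if (i > 0 && all[i].key == all[i - 1].key) continue;  // shadowed older version
    if (all[i].is_deletion) continue;                      // tombstone no longer needed
    out.push_back(all[i]);
  }
  return out;
}
```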

2. Compaction basics

https://github.com/facebook/rocksdb/wiki/RocksDB-Basics

The three basic constructs of RocksDB are memtable, sstfile and logfile. The memtable is an in-memory data structure - new writes are inserted into the memtable and are optionally written to the logfile. (Note: the logfile is optional.) The logfile is a sequentially-written file on storage. When the memtable fills up, it is flushed to a sstfile on storage and the corresponding logfile can be safely deleted. The data in an sstfile is sorted to facilitate easy lookup of keys.
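A toy in-memory model of the three constructs in the quote: a sorted memtable (std::map), an optional append-only log, and immutable sorted runs standing in for sstfiles. When the memtable reaches its limit it is flushed to a new run and the log is cleared, mirroring "the corresponding logfile can be safely deleted". ToyLsm and its entry-count limit are hypothetical simplifications; RocksDB's memtable is a skiplist by default and its size limit is set in bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy LSM: "files" are plain vectors kept in memory.
class ToyLsm {
 public:
  ToyLsm(std::size_t memtable_limit, bool use_log)
      : limit_(memtable_limit), use_log_(use_log) {
    assert(memtable_limit > 0);
  }

  void Put(const std::string& key, const std::string& value) {
    if (use_log_) log_.emplace_back(key, value);  // append-only, written first
    memtable_[key] = value;                       // sorted in-memory structure
    if (memtable_.size() >= limit_) Flush();
  }

  std::optional<std::string> Get(const std::string& key) const {
    if (auto it = memtable_.find(key); it != memtable_.end()) return it->second;
    // Newer sstfiles shadow older ones, so search newest first.
    for (auto run = sstfiles_.rbegin(); run != sstfiles_.rend(); ++run) {
      // Runs are sorted, so a binary search finds the key.
      auto it = std::lower_bound(
          run->begin(), run->end(), key,
          [](const Entry& e, const std::string& k) { return e.first < k; });
      if (it != run->end() && it->first == key) return it->second;
    }
    return std::nullopt;
  }

  std::size_t num_sstfiles() const { return sstfiles_.size(); }
  std::size_t log_size() const { return log_.size(); }

 private:
  using Entry = std::pair<std::string, std::string>;
  using Run = std::vector<Entry>;

  void Flush() {
    sstfiles_.emplace_back(memtable_.begin(), memtable_.end());  // map order => sorted run
    memtable_.clear();
    log_.clear();  // its contents now live in an sstfile
  }

  std::size_t limit_;
  bool use_log_;
  std::map<std::string, std::string> memtable_;
  std::vector<Entry> log_;
  std::vector<Run> sstfiles_;
};
```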