2019-09-10

目录:

  • 1. LogAndApply, 以DBImpl::CompactMemTable()为例进行说明
  • 1.1 写 MANIFEST
  • 1.2 Builder一个用途
  • 2. Get
  • 2.1 序列号在 SkipList 查找中的作用
  • 2.2 leveldb读写可以同时进行吗?
  • 2.3 对于skiplist, 在哪些地方加锁了?
  • 2.3 没有处理 序列号 溢出/回绕问题
  • 3. manifest
  • 3.1 为何有多个 manifest 文件?
  • 3.2 manifest 存储的内容是啥:
  • 4. VersionEdit (未完成)
  • 4.1 ctor

  • 1. LogAndApply, 以DBImpl::CompactMemTable()为例进行说明

    使用场景:

    DBImpl::CompactMemTable()
    {
      VersionEdit edit;
      Version* base = versions_->current();
      WriteLevel0Table(imm_, &edit, base);
      edit.SetPrevLogNumber(0);
      edit.SetLogNumber(logfile_number_);  // Earlier logs no longer needed
      s = versions_->LogAndApply(&edit, &mutex_);
    }
    

    1.1 写 MANIFEST

    LogAndApply
    |--edit.SetLogNumber = ..
    |--Version* v = new Version(this);
    |--builder.Apply(edit) +  SaveTo
    |--Finalize(v); 计算下次从哪个level进行压缩.
    |--new_manifest_file = DescriptorFileName(...)
    | |--snprintf(buf, sizeof(buf), "/MANIFEST-%06llu",...)
    |--env_->NewWritableFile(new_manifest_file, &descriptor_file_)
    |--descriptor_log_ = new log::Writer(descriptor_file_);
    |--WriteSnapshot(descriptor_log_);
    | |--compact_pointer_, current_->files_ 等信息放入edit
    | |--edit.EncodeTo(&record); 转为string record;
    | |--descriptor_log_->AddRecord(record);
    |--edit->EncodeTo(&record);
    |--descriptor_log_->AddRecord(record) //Write new record to MANIFEST log
    |--SetCurrentFile(env_, dbname_, manifest_file_number_)
    |--AppendVersion  => current_ = xxx
    

    MANIFEST中是 一个个 VersionEdit.

    1.2 Builder一个用途

    恢复元信息的过程是逐个应用VersionEdit的过程,若不使用builder, 则该过程会产生大量的Version,但这些我们并不需要。引入VersionSet::Builder, 将所有的VersoinEdit交由Version::Builder处理,然后一次应用产生最终的Version.

    代码:

    VersionSet::Recover
    {
    
      Builder builder(this, current_);
      while (reader.ReadRecord(&record, &scratch) && s.ok()) {
        VersionEdit edit;
        s = edit.DecodeFrom(record); // 将读出来的转为VersionEdit
        builder.Apply(&edit);
      }
      Version* v = new Version(this);
      builder.SaveTo(v);
    }
    

    builder.Apply(&edit); 将 VersionEdit中关于 compact_pointer_, deleted_filesadded_files 的信息 放入 Builder. deleted_filesadded_files 是差异/变动的信息,

    从磁盘manifest读入 record, DecodeFrom解码后存入 VersionEdit, 经过Apply, 放入 Builder. 这样, 所有的record都 由 Builder 来组织了.

    2. Get

    2.1 序列号在 SkipList 查找中的作用

    template<typename Key, class Comparator>
    inline void SkipList<Key,Comparator>::Iterator::Seek(const Key& target) {
     node_ = list_->FindGreaterOrEqual(target, nullptr);
    }
    

    typedef SkipList Table;

    对于memtable, Key为char * .

    template<typename Key, class Comparator>
    inline void SkipList<Key,Comparator>:
    Iterator::Seek(const Key& target) {
      node_ = list_->FindGreaterOrEqual(target, nullptr);
    }
    
    FindGreaterOrEqual -> KeyIsAfterNode -> compare_(a, b)  
    

    Comparator const compare_; 来自模板 class Comparator

    struct KeyComparator {
      const InternalKeyComparator comparator;
      explicit KeyComparator(const InternalKeyComparator& c) : comparator(c) { }
      int operator()(const char* a, const char* b) const;
    };
    


    leveldb/db/memtable.cc:

    int MemTable::KeyComparator::operator()(const char* aptr, const char* bptr)
        const {
      // Internal keys are encoded as length-prefixed strings.
      Slice a = GetLengthPrefixedSlice(aptr);
      Slice b = GetLengthPrefixedSlice(bptr);
      return comparator.Compare(a, b);
    }
    


    leveldb/db/dbformat.cc:

    int InternalKeyComparator::Compare(const Slice& akey, const Slice& bkey) const {
      // Order by:
      //    increasing user key (according to user-supplied comparator)
      //    decreasing sequence number
      //    decreasing type (though sequence# should be enough to disambiguate)
      int r = user_comparator_->Compare(ExtractUserKey(akey), ExtractUserKey(bkey));
      if (r == 0) {
        // 序列号
        const uint64_t anum = DecodeFixed64(akey.data() + akey.size() - 8);
        const uint64_t bnum = DecodeFixed64(bkey.data() + bkey.size() - 8);
        if (anum > bnum) {
          r = -1;
        } else if (anum < bnum) {
          r = +1;
        }
      }
      return r;
    }
    

    要找到 >= 给定用户key 的项, 如果找到的项, 其key一致, 但seq比 用户给定的大 (也就是, 找到的是 新近添加的项), 那么, 这个相对于查询时的snapshot, 新近的项 会被排除, 进而查找其他项. 所以, 允许get的时候, 执行put吗?


    问题: 用户有没有提供序号啊?
    答:

    DBImpl::Get()
    |--SequenceNumber snapshot;
    |--snapshot = versions_->LastSequence() or from options.snapshot
    |--//|usize + 8| user_key| seq(7B) type(1B)|
    |--LookupKey lkey(key, snapshot);
    |--mem->Get(lkey, value, xx)
    | |--iter.Seek(..);
    

    MemTable::Get(const LookupKey& key, std::string* value, Status* s) 查询时, user key会加上seq.
    LookupKey格式:

    |usize + 8| user_key| seq(7B) type(1B)|
    


    放入跳表中,会添加seq信息, 从MemTable::Add的原型就可知:

    MemTable::Add(SequenceNumber s, ValueType type,
                       const Slice& key,
                       const Slice& value)
    

    2.2 leveldb读写可以同时进行吗?

    答: 看起来并不阻止.

    DBImpl::Write
    {
    
      MutexLock l(&mutex_);
      ...
    
      &w is currently responsible for logging
          // and protects against concurrent loggers and concurrent writes
          // into mem_.
      mutex_.Unlock();
      WriteBatchInternal::InsertInto(updates, mem_);
    
    }
    
    DBImpl::Get
    {
      MutexLock l(&mutex_);
      ...
      // Unlock while reading from files and memtables
      mutex_.Unlock();
      LookupKey lkey(key, snapshot);
      mem->Get(lkey, value, &s) or imm->Get or current->Get
    
    }
    

    2.3 对于skiplist, 在哪些地方加锁了?

    ...
    void SkipList<Key,Comparator>::Insert(const Key& key) {
      Insert() is externally synchronized.
    }
    

    2.3 没有处理 序列号 溢出/回绕问题

    按1s一个提交, uint64_t 差不多可使用584942417355年, 故无需处理回绕问题.

    3. manifest

    A MANIFEST file lists the set of sorted tables that make up each level, the corresponding key ranges, and other important metadata. (https://github.com/google/leveldb/blob/master/doc/impl.md) 列出了每层的sorted tables, 相应的key范围, 以及metadata.

    3.1 为何有多个 manifest 文件?

    情景1: manifest太大了, 换新文件进行写
    情景2:

    3.2 manifest 存储的内容是啥:

    VersionSet::Recover
    {
    std::string dscname = dbname_ + "/" + current; //为 manifest文件
    SequentialFile* file;
    s = env_->NewSequentialFile(dscname, &file)
    log::Reader reader(file, &reporter, ...)
    reader.ReadRecord(&record, &scratch)
    edit.DecodeFrom(record) 可知, 从 manifest 中读出的是 :log_number_, prev_log_number_, next_file_number_, last_sequence_, deleted_files_
    }
    

    4. VersionEdit (未完成)

    leveldb/db/db_impl.cc:

    Status DB::Open(){
      /*edit和 log关联的意义是????*/
      edit.SetLogNumber(new_log_number);  
    }
    
    VersionSet::LogAndApply(VersionEdit* edit, port::Mutex* mu) {
      if (edit->has_log_number_) {
        /*相比于VersionSet, VersionEdit内容更加新*/
        assert(edit->log_number_ >= log_number_);
      }
    }
    

    VersionSet::lognumber < VersionEdit::lognumber 对旧的VersionSet, 打上 新的VersionEdit,

    Version0 + VersionEdit = Version1

    4.1 ctor

    DBImpl::DBImpl(const Options& raw_options, const std::string& dbname):
    internal_comparator_(raw_options.comparator),
    versions_(new VersionSet(dbname_, &options_, table_cache_,
                                   &internal_comparator_)) {...}
    

    Options 默认的comparator是 BytewiseComparator() -> BytewiseComparatorImpl, 所以, DBImpl的 VersionSet 的比较器 默认是 BytewiseComparator().


    VersionSet::VersionSet():
      .略.
      dummy_versions_(this),
      current_(nullptr)
    {
      AppendVersion(new Version(this));
    }
    

    AppendVersion会修改 current_, 指向 new Version(this)

    本文地址: https://awakening-fong.github.io/posts/database/leveldb_02_snapshot

    转载请注明出处: https://awakening-fong.github.io


    若无法评论, 请打开JavaScript, 并通过proxy.


    blog comments powered by Disqus