LevelDB 02: snapshot
目录:
1. LogAndApply, 以DBImpl::CompactMemTable()为例进行说明
使用场景:
DBImpl::CompactMemTable()
{
VersionEdit edit;
Version* base = versions_->current();
WriteLevel0Table(imm_, &edit, base);
edit.SetPrevLogNumber(0);
edit.SetLogNumber(logfile_number_); // Earlier logs no longer needed
s = versions_->LogAndApply(&edit, &mutex_);
}
1.1 写 MANIFEST
LogAndApply
|--edit.SetLogNumber = ..
|--Version* v = new Version(this);
|--builder.Apply(edit) + SaveTo
|--Finalize(v); 计算下次从哪个level进行压缩.
|--new_manifest_file = DescriptorFileName(...)
| |--snprintf(buf, sizeof(buf), "/MANIFEST-%06llu",...)
|--env_->NewWritableFile(new_manifest_file, &descriptor_file_)
|--descriptor_log_ = new log::Writer(descriptor_file_);
|--WriteSnapshot(descriptor_log_);
| |--compact_pointer_, current_->files_ 等信息放入edit
| |--edit.EncodeTo(&record); 转为string record;
| |--descriptor_log_->AddRecord(record);
|--edit->EncodeTo(&record);
|--descriptor_log_->AddRecord(record) //Write new record to MANIFEST log
|--SetCurrentFile(env_, dbname_, manifest_file_number_)
|--AppendVersion => current_ = xxx
MANIFEST中是 一个个 VersionEdit.
1.2 Builder一个用途
恢复元信息的过程是逐个应用VersionEdit的过程,若不使用builder, 则该过程会产生大量的Version,但这些我们并不需要。引入VersionSet::Builder, 将所有的VersoinEdit交由Version::Builder处理,然后一次应用产生最终的Version.
代码:
VersionSet::Recover
{
Builder builder(this, current_);
while (reader.ReadRecord(&record, &scratch) && s.ok()) {
VersionEdit edit;
s = edit.DecodeFrom(record); // 将读出来的转为VersionEdit
builder.Apply(&edit);
}
Version* v = new Version(this);
builder.SaveTo(v);
}
builder.Apply(&edit);
将 VersionEdit中关于 compact_pointer_
, deleted_files
和 added_files
的信息 放入 Builder.
deleted_files
和 added_files
是差异/变动的信息,
从磁盘manifest读入 record, DecodeFrom解码后存入 VersionEdit, 经过Apply, 放入 Builder. 这样, 所有的record都 由 Builder 来组织了.
2. Get
2.1 序列号在 SkipList 查找中的作用
template<typename Key, class Comparator>
inline void SkipList<Key,Comparator>::Iterator::Seek(const Key& target) {
node_ = list_->FindGreaterOrEqual(target, nullptr);
}
typedef SkipList
对于memtable, Key为char * .
template<typename Key, class Comparator>
inline void SkipList<Key,Comparator>:
Iterator::Seek(const Key& target) {
node_ = list_->FindGreaterOrEqual(target, nullptr);
}
FindGreaterOrEqual -> KeyIsAfterNode -> compare_(a, b)
Comparator const compare_; 来自模板 class Comparator
struct KeyComparator {
const InternalKeyComparator comparator;
explicit KeyComparator(const InternalKeyComparator& c) : comparator(c) { }
int operator()(const char* a, const char* b) const;
};
leveldb/db/memtable.cc:
int MemTable::KeyComparator::operator()(const char* aptr, const char* bptr)
const {
// Internal keys are encoded as length-prefixed strings.
Slice a = GetLengthPrefixedSlice(aptr);
Slice b = GetLengthPrefixedSlice(bptr);
return comparator.Compare(a, b);
}
leveldb/db/dbformat.cc:
int InternalKeyComparator::Compare(const Slice& akey, const Slice& bkey) const {
// Order by:
// increasing user key (according to user-supplied comparator)
// decreasing sequence number
// decreasing type (though sequence# should be enough to disambiguate)
int r = user_comparator_->Compare(ExtractUserKey(akey), ExtractUserKey(bkey));
if (r == 0) {
// 序列号
const uint64_t anum = DecodeFixed64(akey.data() + akey.size() - 8);
const uint64_t bnum = DecodeFixed64(bkey.data() + bkey.size() - 8);
if (anum > bnum) {
r = -1;
} else if (anum < bnum) {
r = +1;
}
}
return r;
}
要找到 >= 给定用户key 的项, 如果找到的项, 其key一致, 但seq比 用户给定的大 (也就是, 找到的是 新近添加的项), 那么, 这个相对于查询时的snapshot, 新近的项 会被排除, 进而查找其他项. 所以, 允许get的时候, 执行put吗?
问题: 用户有没有提供序号啊?
答:
DBImpl::Get()
|--SequenceNumber snapshot;
|--snapshot = versions_->LastSequence() or from options.snapshot
|--//|usize + 8| user_key| seq(7B) type(1B)|
|--LookupKey lkey(key, snapshot);
|--mem->Get(lkey, value, xx)
| |--iter.Seek(..);
MemTable::Get(const LookupKey& key, std::string* value, Status* s)
查询时, user key会加上seq.
LookupKey格式:
|usize + 8| user_key| seq(7B) type(1B)|
放入跳表中,会添加seq信息, 从MemTable::Add的原型就可知:
MemTable::Add(SequenceNumber s, ValueType type,
const Slice& key,
const Slice& value)
2.2 leveldb读写可以同时进行吗?
答: 看起来并不阻止.
DBImpl::Write
{
MutexLock l(&mutex_);
...
&w is currently responsible for logging
// and protects against concurrent loggers and concurrent writes
// into mem_.
mutex_.Unlock();
WriteBatchInternal::InsertInto(updates, mem_);
}
DBImpl::Get
{
MutexLock l(&mutex_);
...
// Unlock while reading from files and memtables
mutex_.Unlock();
LookupKey lkey(key, snapshot);
mem->Get(lkey, value, &s) or imm->Get or current->Get
}
2.3 对于skiplist, 在哪些地方加锁了?
...
void SkipList<Key,Comparator>::Insert(const Key& key) {
Insert() is externally synchronized.
}
2.3 没有处理 序列号 溢出/回绕问题
按1s一个提交, uint64_t 差不多可使用584942417355年, 故无需处理回绕问题.
3. manifest
A MANIFEST file lists the set of sorted tables that make up each level, the corresponding key ranges, and other important metadata. (https://github.com/google/leveldb/blob/master/doc/impl.md) 列出了每层的sorted tables, 相应的key范围, 以及metadata.
3.1 为何有多个 manifest 文件?
情景1: manifest太大了, 换新文件进行写
情景2:
3.2 manifest 存储的内容是啥:
VersionSet::Recover
{
std::string dscname = dbname_ + "/" + current; //为 manifest文件
SequentialFile* file;
s = env_->NewSequentialFile(dscname, &file)
log::Reader reader(file, &reporter, ...)
reader.ReadRecord(&record, &scratch)
edit.DecodeFrom(record) 可知, 从 manifest 中读出的是 :log_number_, prev_log_number_, next_file_number_, last_sequence_, deleted_files_
}
4. VersionEdit (未完成)
leveldb/db/db_impl.cc:
Status DB::Open(){
/*edit和 log关联的意义是????*/
edit.SetLogNumber(new_log_number);
}
VersionSet::LogAndApply(VersionEdit* edit, port::Mutex* mu) {
if (edit->has_log_number_) {
/*相比于VersionSet, VersionEdit内容更加新*/
assert(edit->log_number_ >= log_number_);
}
}
VersionSet::lognumber < VersionEdit::lognumber 对旧的VersionSet, 打上 新的VersionEdit,
Version0 + VersionEdit = Version1
4.1 ctor
DBImpl::DBImpl(const Options& raw_options, const std::string& dbname):
internal_comparator_(raw_options.comparator),
versions_(new VersionSet(dbname_, &options_, table_cache_,
&internal_comparator_)) {...}
Options 默认的comparator是 BytewiseComparator() -> BytewiseComparatorImpl, 所以, DBImpl的 VersionSet 的比较器 默认是 BytewiseComparator().
VersionSet::VersionSet():
.略.
dummy_versions_(this),
current_(nullptr)
{
AppendVersion(new Version(this));
}
AppendVersion会修改 current_, 指向 new Version(this)
本文地址: https://awakening-fong.github.io/posts/database/leveldb_02_snapshot
转载请注明出处: https://awakening-fong.github.io
若无法评论, 请打开JavaScript, 并通过proxy.
blog comments powered by Disqus