MANIFEST
A file that persists the metadata describing the storage engine's state
CURRENT: points to the latest MANIFEST file
MANIFEST-<seq-no>
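In RocksDB, the state of the storage engine at any moment is represented as a Version (that is, a set of SST files), and every modification of a Version is a VersionEdit. These VersionEdits are exactly what the manifest log file is made of.
The basic structure of the MANIFEST log file is: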
version-edit = Any RocksDB state change
version = { version-edit* }
manifest-log-file = { version, version-edit* }
= { version-edit* }
VersionSet implementation
VersionSet::LogAndApply
->VersionSet::ProcessManifestWrites
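descriptor_log_ is the write handle of the current manifest log file.
manifest_writers_ is the queue of pending writers whose edits need to be written to the manifest log file.
The ManifestWriter structure is shown below. It holds an array of VersionEdit pointers (edit_list): the version edits about to be written to the manifest file.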
// this is used to batch writes to the manifest file
struct VersionSet::ManifestWriter {
Status status;
bool done;
InstrumentedCondVar cv;
ColumnFamilyData* cfd;
const autovector<VersionEdit*>& edit_list;
explicit ManifestWriter(InstrumentedMutex* mu, ColumnFamilyData* _cfd,
const autovector<VersionEdit*>& e)
: done(false), cv(mu), cfd(_cfd), edit_list(e) {}
};
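Modifying the VersionSet (LogAndApply)
Creating a new MANIFEST
A new manifest is created when no manifest is open for writing yet (descriptor_log_ is null) or when the current manifest has grown beyond max_manifest_file_size: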
...
assert(pending_manifest_file_number_ == 0);
// Create a new manifest if the manifest size exceeds the threshold
if (!descriptor_log_ ||
manifest_file_size_ > db_options_->max_manifest_file_size) {
TEST_SYNC_POINT("VersionSet::ProcessManifestWrites:BeforeNewManifest");
pending_manifest_file_number_ = NewFileNumber();
batch_edits.back()->SetNextFile(next_file_number_.load());
new_descriptor_log = true;
} else {
pending_manifest_file_number_ = manifest_file_number_;
}
// This is fine because everything inside of this block is serialized --
// only one thread can be here at the same time
if (new_descriptor_log) {
// create new manifest file
ROCKS_LOG_INFO(db_options_->info_log, "Creating manifest %" PRIu64 "\n",
pending_manifest_file_number_);
std::string descriptor_fname =
DescriptorFileName(dbname_, pending_manifest_file_number_);
std::unique_ptr<WritableFile> descriptor_file;
s = NewWritableFile(env_, descriptor_fname, &descriptor_file,
opt_env_opts);
if (s.ok()) {
descriptor_file->SetPreallocationBlockSize(
db_options_->manifest_preallocation_size);
std::unique_ptr<WritableFileWriter> file_writer(new WritableFileWriter(
std::move(descriptor_file), descriptor_fname, opt_env_opts, env_,
nullptr, db_options_->listeners));
descriptor_log_.reset(
new log::Writer(std::move(file_writer), 0, false));
// After creating the new file writer, write the current state into the manifest:
// the DB state, column family info, data file info, log numbers, etc., in that order
s = WriteCurrentStateToManifest(descriptor_log_.get());
}
}
If a new manifest was created, write a new CURRENT file that points to it (a rename makes the switch atomic).
// If we just created a new descriptor file, install it by writing a
// new CURRENT file that points to it.
if (s.ok() && new_descriptor_log) {
s = SetCurrentFile(env_, dbname_, pending_manifest_file_number_,
db_directory);
TEST_SYNC_POINT("VersionSet::ProcessManifestWrites:AfterNewManifest");
}
Install new version
Version Edit
Version Edit Layout
A version edit is the delta recorded for each metadata change (adding or deleting files, adding or dropping column families).
Data Types
Simple data types
VarX - variable-length (varint) encoding of intX
FixedX - fixed-length encoding of intX
Complex data types
String - length-prefixed string data
+-----------+--------------------+
| size (n) | content of string |
+-----------+--------------------+
|<- Var32 ->|<-- n -->|
Version Edit Record Types and Layout
Each kind of state change is stored as a record with roughly this format:
+-------------+---------------------------+
| Record ID   | Variable size record data |
+-------------+---------------------------+
|<-- Var32 -->|<---- varies by type ----->|
Comparator edit record
Log number edit record
Previous File Number edit record
Next File Number edit record
Last Sequence Number edit record
Max Column Family edit record
Deleted File edit record
Mark a file as deleted from the database.
+-----------------+-------------+--------------+
| kDeletedFile    | level       | file number  |
+-----------------+-------------+--------------+
|<---- Var32 ---->|<-- Var32 -->|<-- Var64 --->|
New File edit record
Mark a file as newly added to the database and provide RocksDB meta information.
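File edit record with compaction information
+-----------+-----------+-------------+-----------+--------------+--------------+-----------------+-----------------+-----+
| kNewFile  | level     | file number | file size | smallest_key | largest_key  | smallest_seqno  | largest_seqno   | ... |
+-----------+-----------+-------------+-----------+--------------+--------------+-----------------+-----------------+-----+
|<- Var32 ->|<- Var32 ->|<-- Var64 -->|<- Var64 ->|<-- String -->|<-- String -->|<---- Var64 ---->|<---- Var64 ---->| ... |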
- File edit record backward compatible
- File edit record with path information
Column family status edit record
Column family add edit record
Add a column family.
+---------------------+----------------+
| kColumnFamilyAdd    | cf name        |
+---------------------+----------------+
|<------ Var32 ------>|<--- String --->|
Column family drop edit record
Drop a column family (the one this edit applies to).
+---------------------+
| kColumnFamilyDrop   |
+---------------------+
|<------ Var32 ------>|