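RocksDB is agnostic to the underlying file system and storage medium. File system operations are not atomic, so a system failure can easily leave files in an inconsistent state. Even with journaling turned on, a file system does not guarantee consistency across an unclean restart, and POSIX file systems do not support atomic batch operations either. Consequently, when RocksDB restarts it cannot rely on the metadata stored in the RocksDB data store files to rebuild the consistent state it had before the restart.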
RocksDB has a built-in mechanism to overcome these limitations of POSIX file systems: it keeps a transactional log of all RocksDB state changes in the MANIFEST. On restart, the MANIFEST is used to restore the DB to its most recent consistent state.
- MANIFEST: the mechanism that records all RocksDB state changes in a transactional log
- Manifest log: an individual log file that contains a snapshot of RocksDB state plus subsequent edits
- CURRENT: a file that identifies the latest manifest log
How does it work?
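The MANIFEST is the transactional log of RocksDB state changes. It consists of the manifest log files plus a pointer to the latest one. Manifest logs are rolling log files named MANIFEST-<seq-no>, where the sequence number always increases. CURRENT is a special file that identifies the latest manifest log file.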
On system start or restart, the latest manifest log contains a consistent state of RocksDB. Every subsequent change to RocksDB state is appended to this manifest log. When a manifest log exceeds a certain size, a new manifest log is created, starting with a snapshot of the current RocksDB state. The latest-manifest pointer in CURRENT is then updated and synced to the file system. Once CURRENT has been updated successfully, the now-redundant manifest logs are purged.
MANIFEST = { CURRENT, MANIFEST-<seq-no>* }
CURRENT = File pointer to the latest manifest log
MANIFEST-<seq-no> = Contains snapshot of RocksDB state and subsequent modifications
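The pointer swap described above is what makes the scheme crash-safe: CURRENT must never be observed half-written. Below is a minimal Python sketch, assuming CURRENT holds a single line with the manifest file name and that the update goes through a temporary file, an fsync, and an atomic rename. The function names, the six-digit zero padding, and the `.dbtmp` temporary name are illustrative assumptions, not RocksDB's API.

```python
import os
import tempfile


def read_current(db_dir: str) -> str:
    """Return the path of the latest manifest log named in CURRENT."""
    with open(os.path.join(db_dir, "CURRENT"), "r") as f:
        contents = f.read()
    # Assumed format: a single line such as "MANIFEST-000005\n".
    if not contents.endswith("\n") or not contents.startswith("MANIFEST-"):
        raise ValueError("CURRENT is malformed")
    return os.path.join(db_dir, contents[:-1])


def set_current(db_dir: str, manifest_number: int) -> None:
    """Point CURRENT at MANIFEST-<manifest_number> via write-temp, fsync, rename."""
    name = "MANIFEST-%06d" % manifest_number
    tmp_path = os.path.join(db_dir, "%06d.dbtmp" % manifest_number)  # hypothetical temp name
    with open(tmp_path, "w") as f:
        f.write(name + "\n")
        f.flush()
        os.fsync(f.fileno())  # new contents are durable before they become visible
    # os.replace is an atomic rename on POSIX: readers see the old or the new CURRENT, never a mix.
    os.replace(tmp_path, os.path.join(db_dir, "CURRENT"))
    # Sync the directory so the rename itself survives a crash (POSIX-only).
    dir_fd = os.open(db_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        set_current(d, 5)
        print(os.path.basename(read_current(d)))  # MANIFEST-000005
```

Only after the rename succeeds is it safe to delete the older manifest logs, which matches the purge step in the text.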
Version Edit
A version is associated with the state of RocksDB at a particular point in time. Any change to a version is treated as a version edit. A version (that is, a snapshot of RocksDB state) is built by joining a sequence of version edits. In short, a manifest log file is a sequence of version edits.
version-edit = Any RocksDB state change
version = { version-edit* }
manifest-log-file = { version, version-edit* }
= { version-edit* }
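The grammar above says a version is simply the result of applying edits in order, and that the snapshot at the head of a manifest log is itself just one edit. Here is a minimal Python sketch under simplifying assumptions: the `VersionEdit` and `Version` classes are hypothetical and track only a few fields (log number, next file number, last sequence, per-level file sets), whereas a real RocksDB version carries much more.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class VersionEdit:
    # An edit records only what changed; None / empty means "unchanged".
    log_number: Optional[int] = None
    next_file_number: Optional[int] = None
    last_sequence: Optional[int] = None
    new_files: List[Tuple[int, int]] = field(default_factory=list)      # (level, file number)
    deleted_files: List[Tuple[int, int]] = field(default_factory=list)  # (level, file number)


@dataclass
class Version:
    log_number: int = 0
    next_file_number: int = 0
    last_sequence: int = 0
    files: Dict[int, Set[int]] = field(default_factory=dict)  # level -> live file numbers


def apply(version: Version, edit: VersionEdit) -> Version:
    """Fold one edit into a version (mutates and returns it)."""
    if edit.log_number is not None:
        version.log_number = edit.log_number
    if edit.next_file_number is not None:
        version.next_file_number = edit.next_file_number
    if edit.last_sequence is not None:
        version.last_sequence = edit.last_sequence
    for level, number in edit.deleted_files:
        live = version.files.get(level)
        if live is not None:
            live.discard(number)
            if not live:
                del version.files[level]  # drop empty levels so equal states compare equal
    for level, number in edit.new_files:
        version.files.setdefault(level, set()).add(number)
    return version


def replay(edits: List[VersionEdit]) -> Version:
    """version = { version-edit* }: start empty and apply every edit in order."""
    version = Version()
    for edit in edits:
        apply(version, edit)
    return version


def snapshot_edit(version: Version) -> VersionEdit:
    """Express a whole version as one edit, like the snapshot that opens a new manifest log."""
    return VersionEdit(
        log_number=version.log_number,
        next_file_number=version.next_file_number,
        last_sequence=version.last_sequence,
        new_files=[(lvl, n) for lvl in sorted(version.files) for n in sorted(version.files[lvl])],
    )
```

Replaying a single `snapshot_edit(v)` yields a state equal to `v`, which is the sense in which `{ version, version-edit* }` collapses to `{ version-edit* }`.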
Version Edit Layout
A manifest log is a sequence of version edit records. The type of each version edit record is identified by its edit number.
1. Data Types
Simple data types
VarX - Variable-length encoding of intX
FixedX - Fixed-length encoding of intX
Complex data types
String - Length prefixed string data
+-----------+--------------------+
| size (n)  | content of string  |
+-----------+--------------------+
|<- Var32 ->|<-------- n ------->|
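The VarX, FixedX, and String encodings can be sketched in Python as follows. The sketch assumes VarX is the base-128 varint used by LevelDB/RocksDB (7 payload bits per byte, low-order group first, high bit meaning "more bytes follow") and that FixedX is little-endian; the helper names are hypothetical.

```python
import struct
from typing import Tuple


def put_varint(value: int) -> bytes:
    """VarX: 7 payload bits per byte, low-order group first; high bit set means 'more bytes follow'."""
    if value < 0:
        raise ValueError("varints encode unsigned integers")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def get_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, position just past it)."""
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def put_fixed32(value: int) -> bytes:
    return struct.pack("<I", value)  # FixedX: little-endian, always 4 bytes


def put_fixed64(value: int) -> bytes:
    return struct.pack("<Q", value)  # always 8 bytes


def put_string(data: bytes) -> bytes:
    """String: Var32 length prefix followed by the raw bytes."""
    return put_varint(len(data)) + data


def get_string(buf: bytes, pos: int = 0) -> Tuple[bytes, int]:
    n, pos = get_varint(buf, pos)
    if pos + n > len(buf):
        raise ValueError("truncated string")
    return buf[pos:pos + n], pos + n


if __name__ == "__main__":
    print(put_varint(300).hex())        # ac02
    print(put_string(b"abc"))           # b'\x03abc'
    print(get_string(b"\x03abcXYZ"))    # (b'abc', 4)
```

Small values cost a single byte, which is why file numbers, levels, and record IDs are stored as varints rather than fixed-width integers.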
2. Version Edit Record Format
A version edit record uses the following format. The decoder identifies the record type from the Record ID.
+-------------+------ ......... ----------+
| Record ID   | Variable size record data |
+-------------+------ .......... ---------+
|<-- Var32 -->|<----- varies by type ---->|
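Because the record data carries no length prefix of its own, a decoder can only find the end of a record by knowing its type. A minimal Python sketch of that dispatch loop follows, handling just the comparator and log number records. The numeric Record IDs are assumptions (believed to match RocksDB's `version_edit.h`), and all names are hypothetical.

```python
from typing import Callable, Dict, Tuple


# --- minimal VarX / String helpers (same encoding as the Data Types section) ---
def put_varint(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def get_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def put_string(data: bytes) -> bytes:
    return put_varint(len(data)) + data


def get_string(buf: bytes, pos: int) -> Tuple[bytes, int]:
    n, pos = get_varint(buf, pos)
    if pos + n > len(buf):
        raise ValueError("truncated string")
    return buf[pos:pos + n], pos + n


# Assumed Record IDs; verify against your RocksDB version.
K_COMPARATOR = 1
K_LOG_NUMBER = 2

# Record ID -> (field name, parser for the variable-size record data)
PARSERS: Dict[int, Tuple[str, Callable[[bytes, int], Tuple[object, int]]]] = {
    K_COMPARATOR: ("comparator", get_string),
    K_LOG_NUMBER: ("log_number", get_varint),
}


def decode_edit(buf: bytes) -> Dict[str, object]:
    """Walk an encoded version edit: read a Var32 Record ID, then let that ID pick the parser."""
    fields: Dict[str, object] = {}
    pos = 0
    while pos < len(buf):
        record_id, pos = get_varint(buf, pos)
        if record_id not in PARSERS:
            raise ValueError(f"unknown Record ID {record_id}")
        name, parse = PARSERS[record_id]
        fields[name], pos = parse(buf, pos)
    return fields
```

An unknown Record ID has to be treated as an error here: without knowing the type, the decoder cannot tell where the next record starts.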
3. Version Edit Record Types and Layout
There are different types of edit records, one for each kind of RocksDB state change.
- Comparator edit record
Captures the comparator name
+-------------+----------------+
| kComparator | data           |
+-------------+----------------+
|<-- Var32 -->|<--- String --->|
- Log number edit record
Latest WAL log file number
+-------------+----------------+
| kLogNumber  | log number     |
+-------------+----------------+
|<-- Var32 -->|<--- Var64 ---->|
- Previous File Number edit record
Previous manifest file number
+------------------+----------------+
| kPrevFileNumber  | file number    |
+------------------+----------------+
|<---- Var32 ----->|<--- Var64 ---->|
- Next File Number edit record
Next manifest file number
+------------------+----------------+
| kNextFileNumber  | file number    |
+------------------+----------------+
|<---- Var32 ----->|<--- Var64 ---->|
- Last Sequence Number edit record
Last sequence number of RocksDB
+------------------+----------------+
| kLastSequence    | seq number     |
+------------------+----------------+
|<---- Var32 ----->|<--- Var64 ---->|
- Max Column Family edit record
Adjust the maximum number of column families allowed.
+---------------------+----------------+
| kMaxColumnFamily    | max cf         |
+---------------------+----------------+
|<------ Var32 ------>|<--- Var32 ---->|
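The five records above share one shape: a Var32 Record ID followed by a single VarX number. The table-driven Python sketch below encodes and decodes them. Its numeric Record IDs are assumptions, believed to match RocksDB's `version_edit.h` (where the record this page calls kPrevFileNumber is named `kPrevLogNumber`); verify them against your RocksDB version. The function names are hypothetical.

```python
from typing import Dict, Tuple


def put_varint(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def get_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


# record name -> (assumed Record ID, payload width in bits: Var64 or Var32)
NUMERIC_RECORDS: Dict[str, Tuple[int, int]] = {
    "kLogNumber": (2, 64),
    "kNextFileNumber": (3, 64),
    "kLastSequence": (4, 64),
    "kPrevFileNumber": (9, 64),
    "kMaxColumnFamily": (203, 32),
}
NAME_BY_ID = {record_id: name for name, (record_id, _) in NUMERIC_RECORDS.items()}


def encode_numeric(name: str, value: int) -> bytes:
    record_id, bits = NUMERIC_RECORDS[name]
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} value does not fit in Var{bits}")
    return put_varint(record_id) + put_varint(value)


def decode_numeric(buf: bytes, pos: int = 0) -> Tuple[str, int, int]:
    """Return (record name, value, position after the record)."""
    record_id, pos = get_varint(buf, pos)
    if record_id not in NAME_BY_ID:
        raise ValueError(f"not a numeric record: {record_id}")
    value, pos = get_varint(buf, pos)
    return NAME_BY_ID[record_id], value, pos
```

Note that a Record ID of 128 or more (such as the assumed 203) already takes two bytes as a Var32.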
- Deleted File edit record
Mark a file as deleted from the database.
+-----------------+-------------+--------------+
| kDeletedFile    | level       | file number  |
+-----------------+-------------+--------------+
|<---- Var32 ---->|<-- Var32 -->|<-- Var64 --->|
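A kDeletedFile record identifies a file by its (level, file number) pair. The Python sketch below encodes the record and applies a run of such records to an in-memory map of live files. The Record ID value 6, the function names, and the `live` map are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def put_varint(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def get_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


K_DELETED_FILE = 6  # assumed Record ID; verify against your RocksDB version


def encode_deleted_file(level: int, file_number: int) -> bytes:
    return put_varint(K_DELETED_FILE) + put_varint(level) + put_varint(file_number)


def decode_deleted_file(buf: bytes, pos: int = 0) -> Tuple[Tuple[int, int], int]:
    """Return ((level, file number), position after the record)."""
    record_id, pos = get_varint(buf, pos)
    if record_id != K_DELETED_FILE:
        raise ValueError("not a kDeletedFile record")
    level, pos = get_varint(buf, pos)
    file_number, pos = get_varint(buf, pos)
    return (level, file_number), pos


def apply_deleted_files(live: Dict[int, Set[int]], buf: bytes) -> None:
    """Remove every file named by a sequence of kDeletedFile records from the live map."""
    pos = 0
    while pos < len(buf):
        (level, number), pos = decode_deleted_file(buf, pos)
        live.get(level, set()).discard(number)
```

The level is stored alongside the file number because the same file is tracked per level; removing it from the wrong level would leave a stale entry.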
- New file edit record
Marks a file as newly added to the database and records its RocksDB metadata.
a. File edit record with compaction information
+--------------+-------------+--------------+------------+----------------+--------------+----------------+----------------+
| kNewFile4    | level       | file number  | file size  | smallest_key   | largest_key  | smallest_seqno | largest_seqno  |
+--------------+-------------+--------------+------------+----------------+--------------+----------------+----------------+
|<--- Var32 -->|<-- Var32 -->|<--- Var64 -->|<- Var64 -->|<--- String --->|<-- String -->|<--- Var64 ---->|<--- Var64 ---->|
+-----------+---------------+-------+------------------+-------+--------------+
| kPathID   | path size (n) | path  | kNeedCompaction  | 1     | value (0/1)  |
+-----------+---------------+-------+------------------+-------+--------------+
|<- Var32 ->|<--- Var32 --->|<- n ->|<---- Var32 ----->|<- 1 ->|<----- 1 ---->|
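The second row of the kNewFile4 layout reads as optional tagged fields that follow the fixed part: each has a Var32 tag and a length-prefixed value. The Python sketch below round-trips such a record. It assumes a terminating tag ends the optional fields (not drawn in the layout above) and assumes the tag values 103, 1, 2, and 65, believed to match RocksDB's `version_edit.h`; the `NewFile` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


def put_varint(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def get_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def put_string(data: bytes) -> bytes:
    return put_varint(len(data)) + data


def get_string(buf: bytes, pos: int) -> Tuple[bytes, int]:
    n, pos = get_varint(buf, pos)
    if pos + n > len(buf):
        raise ValueError("truncated string")
    return buf[pos:pos + n], pos + n


# Assumed tag values; verify against your RocksDB version.
K_NEW_FILE4 = 103
K_TERMINATE = 1          # assumed terminator for the optional tagged fields
K_NEED_COMPACTION = 2
K_PATH_ID = 65


@dataclass
class NewFile:
    level: int
    number: int
    size: int
    smallest_key: bytes
    largest_key: bytes
    smallest_seqno: int
    largest_seqno: int
    path_id: int = 0              # must fit in one byte in this sketch
    need_compaction: bool = False


def encode_new_file4(f: NewFile) -> bytes:
    out = bytearray(put_varint(K_NEW_FILE4))
    out += put_varint(f.level) + put_varint(f.number) + put_varint(f.size)
    out += put_string(f.smallest_key) + put_string(f.largest_key)
    out += put_varint(f.smallest_seqno) + put_varint(f.largest_seqno)
    # Optional fields: Var32 tag, then a length-prefixed value.
    if f.path_id != 0:
        out += put_varint(K_PATH_ID) + put_string(bytes([f.path_id]))
    if f.need_compaction:
        out += put_varint(K_NEED_COMPACTION) + put_string(b"\x01")
    out += put_varint(K_TERMINATE)
    return bytes(out)


def decode_new_file4(buf: bytes, pos: int = 0) -> Tuple[NewFile, int]:
    record_id, pos = get_varint(buf, pos)
    if record_id != K_NEW_FILE4:
        raise ValueError("not a kNewFile4 record")
    fixed = []
    for _ in range(3):                      # level, file number, file size
        value, pos = get_varint(buf, pos)
        fixed.append(value)
    smallest, pos = get_string(buf, pos)
    largest, pos = get_string(buf, pos)
    smallest_seqno, pos = get_varint(buf, pos)
    largest_seqno, pos = get_varint(buf, pos)
    f = NewFile(*fixed, smallest, largest, smallest_seqno, largest_seqno)
    while True:
        tag, pos = get_varint(buf, pos)
        if tag == K_TERMINATE:
            return f, pos
        value, pos = get_string(buf, pos)   # every optional value is length-prefixed
        if tag == K_PATH_ID:
            f.path_id = value[0]
        elif tag == K_NEED_COMPACTION:
            f.need_compaction = value == b"\x01"
        # Any other tag is skipped here; its value has already been consumed.
```

Length-prefixing each optional value is what lets a reader step over a tag it does not recognize.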
b. File edit record (backward compatible)
+--------------+-------------+--------------+------------+----------------+--------------+----------------+----------------+
| kNewFile2    | level       | file number  | file size  | smallest_key   | largest_key  | smallest_seqno | largest_seqno  |
+--------------+-------------+--------------+------------+----------------+--------------+----------------+----------------+
|<--- Var32 -->|<-- Var32 -->|<--- Var64 -->|<- Var64 -->|<--- String --->|<-- String -->|<--- Var64 ---->|<--- Var64 ---->|
c. File edit record with path information
+--------------+-------------+--------------+-------------+-------------+----------------+--------------+
| kNewFile3    | level       | file number  | path ID     | file size   | smallest_key   | largest_key  |
+--------------+-------------+--------------+-------------+-------------+----------------+--------------+
|<--- Var32 -->|<-- Var32 -->|<--- Var64 -->|<-- Var32 -->|<-- Var64 -->|<--- String --->|<-- String -->|
+----------------+----------------+
| smallest_seqno | largest_seqno  |
+----------------+----------------+
|<--- Var64 ---->|<--- Var64 ---->|
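kNewFile2 and kNewFile3 differ only in the path ID that kNewFile3 inserts between the file number and the file size. The Python sketch below handles both with one decoder. The Record IDs 100 and 102 are assumptions (believed to match RocksDB's `version_edit.h`), as is treating a kNewFile2 file as living on path 0; the function names are hypothetical.

```python
from typing import Dict, Tuple


def put_varint(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def get_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def put_string(data: bytes) -> bytes:
    return put_varint(len(data)) + data


def get_string(buf: bytes, pos: int) -> Tuple[bytes, int]:
    n, pos = get_varint(buf, pos)
    if pos + n > len(buf):
        raise ValueError("truncated string")
    return buf[pos:pos + n], pos + n


K_NEW_FILE2 = 100  # assumed Record IDs; verify against your RocksDB version
K_NEW_FILE3 = 102


def encode_new_file_legacy(record_id: int, level: int, number: int, size: int,
                           smallest: bytes, largest: bytes,
                           smallest_seqno: int, largest_seqno: int, path_id: int = 0) -> bytes:
    out = bytearray(put_varint(record_id))
    out += put_varint(level) + put_varint(number)
    if record_id == K_NEW_FILE3:
        out += put_varint(path_id)          # path ID sits between file number and file size
    elif path_id != 0:
        raise ValueError("kNewFile2 cannot record a path ID")
    out += put_varint(size) + put_string(smallest) + put_string(largest)
    out += put_varint(smallest_seqno) + put_varint(largest_seqno)
    return bytes(out)


def decode_new_file_legacy(buf: bytes, pos: int = 0) -> Tuple[Dict[str, object], int]:
    record_id, pos = get_varint(buf, pos)
    if record_id not in (K_NEW_FILE2, K_NEW_FILE3):
        raise ValueError("not a kNewFile2/kNewFile3 record")
    level, pos = get_varint(buf, pos)
    number, pos = get_varint(buf, pos)
    path_id = 0                             # assumption: kNewFile2 files use the default path
    if record_id == K_NEW_FILE3:
        path_id, pos = get_varint(buf, pos)
    size, pos = get_varint(buf, pos)
    smallest, pos = get_string(buf, pos)
    largest, pos = get_string(buf, pos)
    smallest_seqno, pos = get_varint(buf, pos)
    largest_seqno, pos = get_varint(buf, pos)
    return {"level": level, "number": number, "path_id": path_id, "size": size,
            "smallest_key": smallest, "largest_key": largest,
            "smallest_seqno": smallest_seqno, "largest_seqno": largest_seqno}, pos
```

Because the path ID sits in the middle of the record, a reader must branch on the Record ID before parsing the file size; it cannot simply treat kNewFile3 as kNewFile2 plus a trailing field.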
- Column family status edit record
Identifies the column family that the other records in this edit apply to
+------------------+----------------+
| kColumnFamily    | cf ID          |
+------------------+----------------+
|<---- Var32 ----->|<--- Var32 ---->|
- Column family add edit record
Add a column family
+---------------------+----------------+
| kColumnFamilyAdd    | cf name        |
+---------------------+----------------+
|<------ Var32 ------>|<--- String --->|
- Column family drop edit record
Drop a column family
+---------------------+
| kColumnFamilyDrop   |
+---------------------+
|<------ Var32 ------>|
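The three column family records use three different payload shapes: a Var32 number, a String, and no payload at all. The Python sketch below encodes each one exactly as laid out above and decodes a run of them. The Record IDs 200 to 202 are assumptions (believed to match RocksDB's `version_edit.h`), and the function names are hypothetical.

```python
from typing import List, Tuple


def put_varint(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def get_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def put_string(data: bytes) -> bytes:
    return put_varint(len(data)) + data


def get_string(buf: bytes, pos: int) -> Tuple[bytes, int]:
    n, pos = get_varint(buf, pos)
    if pos + n > len(buf):
        raise ValueError("truncated string")
    return buf[pos:pos + n], pos + n


K_COLUMN_FAMILY = 200       # assumed Record IDs; verify against your RocksDB version
K_COLUMN_FAMILY_ADD = 201
K_COLUMN_FAMILY_DROP = 202


def encode_column_family(value: int) -> bytes:
    return put_varint(K_COLUMN_FAMILY) + put_varint(value)


def encode_column_family_add(name: bytes) -> bytes:
    return put_varint(K_COLUMN_FAMILY_ADD) + put_string(name)


def encode_column_family_drop() -> bytes:
    return put_varint(K_COLUMN_FAMILY_DROP)  # Record ID only, no payload


def decode_cf_records(buf: bytes) -> List[Tuple[str, object]]:
    records: List[Tuple[str, object]] = []
    pos = 0
    while pos < len(buf):
        record_id, pos = get_varint(buf, pos)
        if record_id == K_COLUMN_FAMILY:
            value, pos = get_varint(buf, pos)
            records.append(("kColumnFamily", value))
        elif record_id == K_COLUMN_FAMILY_ADD:
            name, pos = get_string(buf, pos)
            records.append(("kColumnFamilyAdd", name))
        elif record_id == K_COLUMN_FAMILY_DROP:
            records.append(("kColumnFamilyDrop", None))  # nothing further to read
        else:
            raise ValueError(f"not a column family record: {record_id}")
    return records
```

The drop record shows why the decoder must know every type: reading a payload after kColumnFamilyDrop would swallow the next record's ID.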