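qcow2 is one of the disk image formats supported by the QEMU emulator. Like a raw image, a qcow2 image is a single file that represents a fixed-size block device. Compared with a plain raw image, it has the following features:
- Smaller space usage, even on filesystems that don't support holes (sparse files);
- Copy-on-write (COW) support: the image only records the changes made on top of an underlying disk;
- Snapshot support: one image can contain the history of multiple snapshots;
- Optional zlib-based compression;
- Optional AES encryption.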
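Searching Baidu turns up a few Chinese write-ups on the qcow2 format, but most of them are vague, so I decided to read QEMU's official documentation myself and write down my understanding here along the way.
I'm new to virtualization, so some of my understanding may be wrong; please bear with me.
What follows walks through the official qcow2 documentation: the original English text, followed by my own notes and understanding.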
Overview
A qcow2 image file is organized in units of constant size, which are called
(host) clusters.
A cluster is the unit in which all allocations are done,
both for actual guest data and for image metadata.
Likewise, the virtual disk as seen by the guest is divided into (guest)
clusters of the same size.
All numbers in qcow2 are stored in Big Endian byte order.
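To make the byte order concrete, here is a minimal C sketch of reading big-endian fields out of a raw header buffer. The helper names qcow2_be32 and qcow2_be64 are my own, not QEMU's; assembling the value byte by byte gives the same result regardless of the host's endianness.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value: the most significant byte comes first. */
static uint32_t qcow2_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a 64-bit big-endian value as two 32-bit halves, high half first. */
static uint64_t qcow2_be64(const uint8_t *p)
{
    return ((uint64_t)qcow2_be32(p) << 32) | qcow2_be32(p + 4);
}
```

For example, the magic bytes 'Q', 'F', 'I', 0xfb decode to 0x514649fb.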
File header
The first cluster of a qcow2 image contains the file header. In the QEMU source code, the header is defined as follows:
typedef struct QCowHeader {
uint32_t magic;
uint32_t version;
uint64_t backing_file_offset;
uint32_t backing_file_size;
uint32_t cluster_bits;
uint64_t size; /* in bytes */
uint32_t crypt_method;
uint32_t l1_size; /* XXX: save number of clusters instead ? */
uint64_t l1_table_offset;
uint64_t refcount_table_offset;
uint32_t refcount_table_clusters;
uint32_t nb_snapshots;
uint64_t snapshots_offset;
/* The following fields are only valid for version >= 3 */
uint64_t incompatible_features;
uint64_t compatible_features;
uint64_t autoclear_features;
uint32_t refcount_order;
uint32_t header_length;
} QEMU_PACKED QCowHeader;
The fields of the header struct mean the following:
Byte 0 - 3: magic
QCOW magic string ("QFI\xfb")
A fixed 4-byte identifier; read as a big-endian 32-bit integer it is 0x514649fb.
4 - 7: version
Version number (valid values are 2 and 3)
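A small sketch of the first sanity check a parser would do, assuming the header bytes are already in memory: compare the magic and accept only versions 2 and 3. qcow2_probe is a hypothetical name; it returns the version, or 0 if the data isn't something it can handle.

```c
#include <stddef.h>
#include <stdint.h>

#define QCOW2_MAGIC 0x514649fbu /* "QFI\xfb" read as a big-endian uint32 */

static uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return the qcow2 version (2 or 3) if the first 8 bytes look valid, else 0. */
static uint32_t qcow2_probe(const uint8_t *hdr, size_t len)
{
    if (len < 8 || rd_be32(hdr) != QCOW2_MAGIC)
        return 0;
    uint32_t version = rd_be32(hdr + 4);
    return (version == 2 || version == 3) ? version : 0;
}
```

The version check matters because the older qcow (version 1) format starts with the same magic string.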
8 - 15: backing_file_offset
Offset into the image file at which the backing file name is stored (NB: The string is not null terminated). 0 if the image doesn't have a backing file.
I won't explain what a backing file is; anyone who knows qcow2 already knows it.
16 - 19: backing_file_size
Length of the backing file name in bytes. Must not be longer than 1023 bytes. Undefined if the image doesn't have a backing file.
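Because the backing file name is not NUL-terminated, a reader has to copy exactly backing_file_size bytes and terminate the string itself. Below is a sketch over an image loaded into memory; qcow2_backing_name and its error convention are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the backing file name into out (NUL-terminated, max 1023 chars).
   Returns 0 on success, -1 if the offset/size fields are invalid. */
static int qcow2_backing_name(const uint8_t *img, size_t img_len,
                              uint64_t offset, uint32_t size, char out[1024])
{
    if (offset == 0) {            /* no backing file; size is undefined */
        out[0] = '\0';
        return 0;
    }
    if (size > 1023)              /* must not be longer than 1023 bytes */
        return -1;
    if (offset > img_len || size > img_len - offset)
        return -1;                /* name would run past the image data */
    memcpy(out, img + offset, size);
    out[size] = '\0';             /* the on-disk string has no terminator */
    return 0;
}
```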
20 - 23: cluster_bits
Number of bits that are used for addressing an offset within a cluster (1 << cluster_bits is the cluster size). Must not be less than 9 (i.e. 512 byte clusters). Note: qemu as of today has an implementation limit of 2 MB as the maximum cluster size and won't be able to open images with larger cluster sizes.
So in practice cluster_bits ranges from 9 to 21, i.e. clusters from 512 bytes to 2 MB.
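A sketch of turning cluster_bits into a cluster size while enforcing both limits mentioned above. The upper bound of 21 encodes QEMU's 2 MB implementation limit (1 << 21 = 2 MB), not a rule of the format itself; the function name is hypothetical.

```c
#include <stdint.h>

#define QCOW2_MIN_CLUSTER_BITS 9   /* 512-byte clusters, from the spec */
#define QCOW2_MAX_CLUSTER_BITS 21  /* QEMU's 2 MB implementation limit */

/* Return the cluster size in bytes, or 0 if cluster_bits is out of range. */
static uint64_t qcow2_cluster_size(uint32_t cluster_bits)
{
    if (cluster_bits < QCOW2_MIN_CLUSTER_BITS ||
        cluster_bits > QCOW2_MAX_CLUSTER_BITS)
        return 0;
    return (uint64_t)1 << cluster_bits;
}
```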
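24 - 31: size
Virtual disk size in bytes
Note that this is the size of the virtual disk as seen by the guest, not the size of the image file; because clusters are only allocated when needed, the two are generally different.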
32 - 35: crypt_method
0 for no encryption, 1 for AES encryption
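36 - 39: l1_size
Number of entries in the active L1 table
When I first read this I had no idea what the L1 table was. It is the first level of the two-level table that maps guest clusters to host clusters, described in the cluster mapping part of the spec.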
40 - 47: l1_table_offset
Offset into the image file at which the active L1 table starts. Must be aligned to a cluster boundary.
48 - 55: refcount_table_offset
Offset into the image file at which the refcount table starts. Must be aligned to a cluster boundary.
The refcount table is explained later, under Host cluster management.
56 - 59: refcount_table_clusters
Number of clusters that the refcount table occupies
60 - 63: nb_snapshots
Number of snapshots contained in the image
64 - 71: snapshots_offset
Offset into the image file at which the snapshot table starts. Must be aligned to a cluster boundary.
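Several offsets above must be cluster-aligned. Since the cluster size is a power of two, that check is just a mask over the low cluster_bits bits, as in this small sketch (the function name is my own):

```c
#include <stdbool.h>
#include <stdint.h>

/* An offset is cluster-aligned when its low cluster_bits bits are all zero. */
static bool qcow2_is_cluster_aligned(uint64_t offset, uint32_t cluster_bits)
{
    uint64_t mask = ((uint64_t)1 << cluster_bits) - 1;
    return (offset & mask) == 0;
}
```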
If the version is 3 or higher, the header has the following additional fields.
For version 2, the values are assumed to be zero, unless specified otherwise
in the description of a field.
(At the time of writing, 3 is the highest version.)
72 - 79: incompatible_features
Bitmask of incompatible features.
An implementation must fail to open an image if an unknown bit is set.
In other words, if a parser finds any bit it doesn't know set to 1, it has to report an error instead of opening the image.
Bit 0:
Dirty bit. If this bit is set then refcounts may be inconsistent, make sure to scan L1/L2 tables to repair refcounts before accessing the image.
Bit 1:
Corrupt bit. If this bit is set then any data structure may be corrupt and the image must not be written to (unless for regaining consistency).
Well, if I ever read a 1 here, I'm not going to deal with it...
Bits 2-63:
Reserved (set to 0)
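Putting the incompatible_features rules into code, a minimal sketch might look like this. It only knows bits 0 and 1, the ones listed above (later revisions of the spec define more), and the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define QCOW2_INCOMPAT_DIRTY   (UINT64_C(1) << 0)
#define QCOW2_INCOMPAT_CORRUPT (UINT64_C(1) << 1)
/* Only the bits described in this section. */
#define QCOW2_INCOMPAT_KNOWN   (QCOW2_INCOMPAT_DIRTY | QCOW2_INCOMPAT_CORRUPT)

enum qcow2_access { QCOW2_REFUSE, QCOW2_READ_ONLY, QCOW2_READ_WRITE };

/* Decide how an image may be opened from its incompatible_features field.
   *needs_repair is set when refcounts must be rebuilt from the L1/L2 tables. */
static enum qcow2_access qcow2_check_incompat(uint64_t incompat, bool *needs_repair)
{
    *needs_repair = false;
    if (incompat & ~QCOW2_INCOMPAT_KNOWN)
        return QCOW2_REFUSE;          /* unknown bit set: must fail to open */
    if (incompat & QCOW2_INCOMPAT_DIRTY)
        *needs_repair = true;         /* refcounts may be inconsistent */
    if (incompat & QCOW2_INCOMPAT_CORRUPT)
        return QCOW2_READ_ONLY;       /* no writes, except to regain consistency */
    return QCOW2_READ_WRITE;
}
```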
80 - 87: compatible_features
Bitmask of compatible features. An implementation can safely ignore any unknown bits that are set.
Only unknown bits may be ignored; bits the implementation does understand keep their meaning.
Bit 0:
Lazy refcounts bit.
If this bit is set then lazy refcount updates can be used. This means marking the image file dirty and postponing refcount metadata updates.
Bits 1-63: Reserved (set to 0)
88 - 95: autoclear_features
Bitmask of auto-clear features. An implementation may only write to an image with unknown auto-clear features if it clears the respective bits from this field first.
My understanding: for these autoclear features, if the meaning of a bit is unknown, the implementation must first set that bit to 0 before writing to the image.
Bit 0:
Bitmaps extension bit
This bit indicates consistency for the bitmaps extension data. It is an error if this bit is set without the bitmaps extension present. If the bitmaps extension is present but this bit is unset, the bitmaps extension data must be considered inconsistent.
Bits 1-63: Reserved (set to 0)
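A sketch of the two autoclear rules above: clearing unknown bits before writing, and interpreting the bitmaps bit together with whether the extension is present. It assumes an implementation that understands only bit 0; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define QCOW2_AUTOCLEAR_BITMAPS (UINT64_C(1) << 0)
/* Assumption: this implementation understands only bit 0. */
#define QCOW2_AUTOCLEAR_KNOWN   QCOW2_AUTOCLEAR_BITMAPS

/* Value to write back to autoclear_features before modifying the image. */
static uint64_t qcow2_autoclear_for_write(uint64_t autoclear)
{
    return autoclear & QCOW2_AUTOCLEAR_KNOWN;   /* clear every unknown bit */
}

enum bitmaps_state { BITMAPS_NONE, BITMAPS_OK, BITMAPS_INCONSISTENT, BITMAPS_ERROR };

/* Combine the bitmaps autoclear bit with whether the extension is present. */
static enum bitmaps_state qcow2_bitmaps_state(uint64_t autoclear, bool has_extension)
{
    bool bit = (autoclear & QCOW2_AUTOCLEAR_BITMAPS) != 0;
    if (!has_extension)
        return bit ? BITMAPS_ERROR : BITMAPS_NONE;   /* bit without extension: error */
    return bit ? BITMAPS_OK : BITMAPS_INCONSISTENT;  /* extension without bit: untrusted */
}
```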
96 - 99: refcount_order
Describes the width of a reference count block entry (width in bits: refcount_bits = 1 << refcount_order). For version 2 images, the order is always assumed to be 4 (i.e. refcount_bits = 16). This value may not exceed 6 (i.e. refcount_bits = 64).
In other words, this field sets how many bits each refcount takes. When I got here I didn't yet know what a refcount was (it is explained in detail later, under Host cluster management); version 2 images always use 16-bit refcounts anyway, so for them this field changes nothing.
100 - 103: header_length
Length of the header structure in bytes. For version 2 images, the length is always assumed to be 72 bytes.
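For version 2 images none of these fields exist on disk, so a reader has to fill in the defaults itself. The sketch below (hypothetical struct and function names) reads bytes 72-103 for version 3 and otherwise applies the version 2 defaults: all feature bits 0, refcount_order 4 and header_length 72.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of the version 3 fields (names as in QCowHeader). */
struct qcow2_v3_fields {
    uint64_t incompatible_features;
    uint64_t compatible_features;
    uint64_t autoclear_features;
    uint32_t refcount_order;
    uint32_t header_length;
};

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const uint8_t *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Fill f from a raw header; version 2 gets the documented defaults.
   Returns 0 on success, -1 on bad input. */
static int qcow2_read_v3_fields(const uint8_t *hdr, size_t len, uint32_t version,
                                struct qcow2_v3_fields *f)
{
    if (version == 2) {
        f->incompatible_features = 0;
        f->compatible_features = 0;
        f->autoclear_features = 0;
        f->refcount_order = 4;    /* 16-bit refcounts */
        f->header_length = 72;
        return 0;
    }
    if (version != 3 || len < 104)
        return -1;
    f->incompatible_features = be64(hdr + 72);
    f->compatible_features = be64(hdr + 80);
    f->autoclear_features = be64(hdr + 88);
    f->refcount_order = be32(hdr + 96);
    f->header_length = be32(hdr + 100);
    return f->refcount_order <= 6 ? 0 : -1;   /* refcount_bits may not exceed 64 */
}

/* refcount_bits = 1 << refcount_order */
static uint32_t qcow2_refcount_bits(uint32_t refcount_order)
{
    return UINT32_C(1) << refcount_order;
}
```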
Header extensions
Directly after the image header, optional sections called header extensions can
be stored.
In the QEMU source code, an extension that QEMU does not recognize is kept in memory in the struct below; the QLIST_ENTRY field only links such extensions into an in-memory list and is not part of the file format:
typedef struct Qcow2UnknownHeaderExtension {
uint32_t magic;
uint32_t len;
QLIST_ENTRY(Qcow2UnknownHeaderExtension) next;
uint8_t data[];
} Qcow2UnknownHeaderExtension;
On disk, each extension is laid out as follows:
Byte 0 - 3: Header extension type:
0x00000000 - End of the header extension area
0xE2792ACA - Backing file format name
0x6803f857 - Feature name table
0x23852875 - Bitmaps extension
other - Unknown header extension, can be safely ignored
These are just fixed type codes; not much to say about them.
4 - 7: Length of the header extension data
8 - n: Header extension data
n - m: Padding to round up the header extension size to the next multiple of 8.
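Walking the header extension area then comes down to a loop over (type, length, data, padding) records until the end marker. A minimal sketch, assuming buf holds the bytes from header_length up to the end of the first cluster; the callback interface is made up for illustration, and unknown types are simply passed through for the caller to ignore.

```c
#include <stddef.h>
#include <stdint.h>

#define QCOW2_EXT_END            0x00000000u
#define QCOW2_EXT_BACKING_FORMAT 0xE2792ACAu
#define QCOW2_EXT_FEATURE_TABLE  0x6803f857u
#define QCOW2_EXT_BITMAPS        0x23852875u

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical callback: receives each extension's type and unpadded data. */
typedef void (*qcow2_ext_cb)(uint32_t type, const uint8_t *data, uint32_t len, void *opaque);

/* Return the number of extensions before the end marker, or -1 if an
   extension overruns buf or no end marker fits. */
static int qcow2_walk_extensions(const uint8_t *buf, size_t len, qcow2_ext_cb cb, void *opaque)
{
    size_t pos = 0;
    int count = 0;
    while (len - pos >= 8) {
        uint32_t type = be32(buf + pos);
        uint32_t data_len = be32(buf + pos + 4);
        if (type == QCOW2_EXT_END)
            return count;
        uint64_t padded = ((uint64_t)data_len + 7) & ~(uint64_t)7;  /* next multiple of 8 */
        if (padded > len - pos - 8)
            return -1;
        if (cb != NULL)
            cb(type, buf + pos + 8, data_len, opaque);  /* unknown types: caller ignores */
        pos += 8 + (size_t)padded;
        count++;
    }
    return -1;
}
```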
Unless stated otherwise, each header extension type shall appear at most once
in the same image.
If the image has a backing file then the backing file name should be stored in
the remaining space between the end of the header extension area and the end of
the first cluster. It is not allowed to store other data here, so that an
implementation can safely modify the header and add extensions without harming
data of compatible features that it doesn't support. Compatible features that
need space for additional data can use a header extension.
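In short: each extension type appears at most once per image, and the space between the end of the extension area and the end of the first cluster is reserved for the backing file name.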
Below are the layouts of two extension types, the Feature name table and the Bitmaps extension.
Feature name table
The feature name table is an optional header extension that contains the name
for features used by the image. It can be used by applications that don't know
the respective feature (e.g. because the feature was introduced only later) to
display a useful error message.
The number of entries in the feature name table is determined by the length of
the header extension data. Each entry looks like this:
Byte 0: Type of feature (select feature bitmap)
0: Incompatible feature
1: Compatible feature
2: Autoclear feature
1: Bit number within the selected feature bitmap (valid
values: 0-63)
2 - 47: Feature name (padded with zeros, but not necessarily null
terminated if it has full length)
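Each feature name table entry is 48 bytes (1 + 1 + 46), so the number of entries is the extension's data length divided by 48. A sketch of decoding one entry; since a full-length name has no terminating NUL, the copy reserves one extra byte. Names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QCOW2_FEATURE_ENTRY_SIZE 48

struct qcow2_feature {
    uint8_t type;     /* 0 incompatible, 1 compatible, 2 autoclear */
    uint8_t bit;      /* bit number within that bitmap, 0-63 */
    char name[47];    /* 46 name bytes plus a terminator we add */
};

/* Decode entry i of the extension data. Returns 0, or -1 if out of range/invalid. */
static int qcow2_feature_entry(const uint8_t *data, uint32_t data_len, size_t i,
                               struct qcow2_feature *f)
{
    size_t n = data_len / QCOW2_FEATURE_ENTRY_SIZE;  /* count follows from the length */
    if (i >= n)
        return -1;
    const uint8_t *e = data + i * QCOW2_FEATURE_ENTRY_SIZE;
    if (e[0] > 2 || e[1] > 63)
        return -1;
    f->type = e[0];
    f->bit = e[1];
    memcpy(f->name, e + 2, 46);
    f->name[46] = '\0';  /* a full-length name has no NUL of its own */
    return 0;
}
```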
Bitmaps extension
The bitmaps extension is an optional header extension. It provides the ability
to store bitmaps related to a virtual disk. For now, there is only one bitmap
type: the dirty tracking bitmap, which tracks virtual disk changes from some
point in time.
The data of the extension should be considered consistent only if the
corresponding auto-clear feature bit is set, see autoclear_features above.
The fields of the bitmaps extension are:
Byte 0 - 3: nb_bitmaps
The number of bitmaps contained in the image. Must be
greater than or equal to 1.
Note: Qemu currently only supports up to 65535 bitmaps per
image.
4 - 7: Reserved, must be zero.
8 - 15: bitmap_directory_size
Size of the bitmap directory in bytes. It is the cumulative
size of all (nb_bitmaps) bitmap headers.
16 - 23: bitmap_directory_offset
Offset into the image file at which the bitmap directory
starts. Must be aligned to a cluster boundary.
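A sketch of parsing the fixed 24-byte body of the bitmaps extension and checking the constraints listed above (at least one bitmap, reserved field zero, cluster-aligned directory offset). It does not enforce QEMU's 65535-bitmap limit; the struct and function names are my own.

```c
#include <stdint.h>

struct qcow2_bitmaps_ext {
    uint32_t nb_bitmaps;
    uint64_t bitmap_directory_size;
    uint64_t bitmap_directory_offset;
};

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const uint8_t *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Parse and validate the extension data. Returns 0 on success, -1 otherwise. */
static int qcow2_parse_bitmaps_ext(const uint8_t *d, uint32_t len, uint32_t cluster_bits,
                                   struct qcow2_bitmaps_ext *b)
{
    if (len < 24)
        return -1;
    b->nb_bitmaps = be32(d);
    if (b->nb_bitmaps < 1)
        return -1;                 /* must be >= 1 */
    if (be32(d + 4) != 0)
        return -1;                 /* bytes 4-7 reserved, must be zero */
    b->bitmap_directory_size = be64(d + 8);
    b->bitmap_directory_offset = be64(d + 16);
    if (b->bitmap_directory_offset & (((uint64_t)1 << cluster_bits) - 1))
        return -1;                 /* must be cluster-aligned */
    return 0;
}
```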
Host cluster management
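By this point it looks like there is a distinction between host clusters and guest clusters (the overview already hinted at it: host clusters divide the image file, guest clusters divide the virtual disk). Let's keep reading.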
qcow2 manages the allocation of host clusters by maintaining a reference count
for each host cluster. A refcount of 0 means that the cluster is free, 1 means
that it is used, and >= 2 means that it is used and any write access must
perform a COW (copy on write) operation.
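This finally explains the refcount that keeps coming up: qcow2 keeps a reference count for every host cluster. A refcount of 2 or more means the cluster is shared, for example between the current image and an internal snapshot, which is why any write to it must copy it first (COW).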
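The three refcount states map directly onto a tiny helper. This is only a sketch with made-up names; it just restates the rule quoted above in code.

```c
#include <stdbool.h>
#include <stdint.h>

enum cluster_state { CLUSTER_FREE, CLUSTER_IN_USE, CLUSTER_SHARED };

/* Classify a host cluster by its refcount. */
static enum cluster_state qcow2_cluster_state(uint64_t refcount)
{
    if (refcount == 0)
        return CLUSTER_FREE;
    if (refcount == 1)
        return CLUSTER_IN_USE;
    return CLUSTER_SHARED;   /* >= 2: referenced more than once */
}

/* Per the rule above, writes to a shared cluster must go through COW. */
static bool qcow2_write_needs_cow(uint64_t refcount)
{
    return qcow2_cluster_state(refcount) == CLUSTER_SHARED;
}
```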
The refcounts are managed in a two-level table. The first level is called
refcount table and has a variable size (which is stored in the header). The
refcount table can cover multiple clusters, however it needs to be contiguous
in the image file.
The variable size stored in the header is refcount_table_clusters, so the table always occupies a whole number of clusters, laid out contiguously in the image file.
It contains pointers to the second level structures which are called refcount
blocks and are exactly one cluster in size.
In other words, refcount blocks are themselves stored in clusters of the image file, one block per cluster.
Given an offset into the image file, the refcount of its cluster can be obtained
as follows:
refcount_block_entries = (cluster_size * 8 / refcount_bits)
refcount_block_index = (offset / cluster_size) % refcount_block_entries
refcount_table_index = (offset / cluster_size) / refcount_block_entries
refcount_block = load_cluster(refcount_table[refcount_table_index]);
return refcount_block[refcount_block_index];
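To check my understanding, here is the pseudocode above turned into runnable C over an image that is already loaded into memory. Assumptions of this sketch: the whole image is in one buffer, table indexes beyond refcount_table_clusters are treated as refcount 0, and a block pointing outside the buffer is reported as UINT64_MAX. Entry decoding follows the layouts given further down: table entries are 64-bit with the offset in bits 9-63, refcounts of 8 bits or more are big-endian, and narrower ones are packed starting from the least significant bit. Function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Entry `index` of a refcount block with refcount_bits-wide entries. */
static uint64_t refblock_get(const uint8_t *block, uint64_t index, uint32_t refcount_bits)
{
    if (refcount_bits < 8) {          /* sub-byte: bit 0 is the LSB */
        uint64_t bit = index * refcount_bits;
        return (block[bit / 8] >> (bit % 8)) & ((1u << refcount_bits) - 1);
    }
    size_t nbytes = refcount_bits / 8;  /* big-endian, 1 to 8 bytes */
    const uint8_t *p = block + index * nbytes;
    uint64_t v = 0;
    for (size_t k = 0; k < nbytes; k++)
        v = (v << 8) | p[k];
    return v;
}

static uint64_t qcow2_get_refcount(const uint8_t *img, size_t img_len,
                                   uint64_t refcount_table_offset,
                                   uint32_t refcount_table_clusters,
                                   uint32_t cluster_bits, uint32_t refcount_order,
                                   uint64_t offset)
{
    uint64_t cluster_size = UINT64_C(1) << cluster_bits;
    uint32_t refcount_bits = UINT32_C(1) << refcount_order;
    uint64_t refcount_block_entries = cluster_size * 8 / refcount_bits;
    uint64_t refcount_block_index = (offset / cluster_size) % refcount_block_entries;
    uint64_t refcount_table_index = (offset / cluster_size) / refcount_block_entries;

    /* Each table entry is 8 bytes; past the table's end nothing is allocated. */
    if (refcount_table_index >= (uint64_t)refcount_table_clusters * cluster_size / 8)
        return 0;
    uint64_t entry_pos = refcount_table_offset + refcount_table_index * 8;
    if (entry_pos > img_len || img_len - entry_pos < 8)
        return UINT64_MAX;
    uint64_t block_offset = be64(img + entry_pos) & ~UINT64_C(0x1ff);  /* bits 9-63 */
    if (block_offset == 0)
        return 0;   /* block not allocated: all its refcounts are 0 */
    if (block_offset > img_len || img_len - block_offset < cluster_size)
        return UINT64_MAX;
    return refblock_get(img + block_offset, refcount_block_index, refcount_bits);
}
```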
Note: all of these variables were introduced above; as a recap:
cluster_size = 1 << cluster_bits // at least 512 bytes
refcount_bits = 16 // in version 2
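How should this be understood?
Take version 2 as an example:
- cluster_size is the number of bytes in a cluster. All clusters in a qcow2 file have the same size, for example 512 bytes.
- refcount_bits is fixed at 16, so each refcount takes 2 bytes. A refcount block is exactly one cluster, so one block holds refcount_block_entries = cluster_size * 8 / refcount_bits = 512 * 8 / 16 = 256 refcounts.
- Each refcount table entry points to one refcount block, i.e. to one cluster holding 256 refcounts.
- Each of those 2-byte (16-bit) entries records the reference count of one cluster.
So, to get the refcount of the cluster containing a given offset: first compute offset / cluster_size to find which cluster it is; dividing that cluster index by refcount_block_entries tells you which refcount table entry (and thus which refcount block) to load, and the remainder tells you which entry inside that block holds the reference count.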
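The same arithmetic also shows how much of the image one structure covers, which helped me see why the table can stay small. For example, with 512-byte clusters offset 0x30000 (196608) is cluster 384; 384 / 256 = 1 and 384 % 256 = 128, so its refcount is entry 128 of the block that refcount_table[1] points to. A sketch of the coverage calculation (function names are my own):

```c
#include <stdint.h>

/* Bytes of image file covered by one refcount block (one cluster of refcounts). */
static uint64_t refblock_coverage(uint32_t cluster_bits, uint32_t refcount_bits)
{
    uint64_t cluster_size = UINT64_C(1) << cluster_bits;
    uint64_t refcount_block_entries = cluster_size * 8 / refcount_bits;
    return refcount_block_entries * cluster_size;
}

/* Bytes covered by one cluster of the refcount table: each 8-byte entry
   points to one refcount block. */
static uint64_t reftable_cluster_coverage(uint32_t cluster_bits, uint32_t refcount_bits)
{
    uint64_t cluster_size = UINT64_C(1) << cluster_bits;
    return (cluster_size / 8) * refblock_coverage(cluster_bits, refcount_bits);
}
```

With 512-byte clusters and 16-bit refcounts, one refcount block covers 256 * 512 bytes = 128 KiB, and one cluster of refcount table (64 entries) covers 8 MiB; with QEMU's default 64 KiB clusters a single refcount block already covers 2 GiB.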
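Because at first I didn't distinguish between the cluster that contains the offset and the cluster that stores the refcount block, it took me quite a while to understand the refcount table. I guess I'm getting old and my brain isn't what it used to be.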
Below are the layouts of refcount table entries and refcount block entries; once the passage above makes sense, these are quite simple.
Refcount table entry:
Bit 0 - 8: Reserved (set to 0)
9 - 63: Bits 9-63 of the offset into the image file at which the
refcount block starts. Must be aligned to a cluster
boundary.
If this is 0, the corresponding refcount block has not yet
been allocated. All refcounts managed by this refcount block
are 0.
Refcount block entry (x = refcount_bits - 1):
Bit 0 - x: Reference count of the cluster. If refcount_bits implies a
sub-byte width, note that bit 0 means the least significant
bit in this context.
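Decoding a refcount table entry by the layout above: reject set reserved bits, extract bits 9-63 as the block offset, and check cluster alignment, with 0 meaning the block isn't allocated yet. A minimal sketch with a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

#define REFT_RESERVED_MASK UINT64_C(0x1ff)   /* bits 0-8 */

/* Returns false if reserved bits are set or the offset is not cluster-aligned.
   On success *block_offset is the refcount block offset (0 = unallocated). */
static bool reftable_entry_decode(uint64_t entry, uint32_t cluster_bits,
                                  uint64_t *block_offset)
{
    if (entry & REFT_RESERVED_MASK)
        return false;                          /* reserved bits must be 0 */
    uint64_t off = entry & ~REFT_RESERVED_MASK;  /* bits 9-63 of the offset */
    if (off & ((UINT64_C(1) << cluster_bits) - 1))
        return false;                          /* must be cluster-aligned */
    *block_offset = off;
    return true;
}
```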
I'll stop here for now. Still to come are the important parts on cluster mapping and snapshots; I'll write those up tomorrow if I have the energy.