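This is issue 1 of the "File Format Exploration" series: a first look at the ePub file format. The series walks readers through the author's process of exploring various file formats, specifically how a file's content ends up being presented. In principle we assume that we only know what these formats are used for, not how they are implemented (anything the author happens to know in advance is treated as if it did not exist). Along the way we will try various methods to gradually build a rough picture of the format.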
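Introduction to the file format
According to the description in the mainland Simplified Chinese edition of Wikipedia:
EPUB is a free and open standard, a kind of "reflowable" content; that is, the text can be displayed in the way best suited to reading, according to the characteristics of the reading device.
The quote stops there because anything further would be a spoiler. Put simply, ePub is a "document-type" file format similar to PDF, commonly used to distribute e-books and the like.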
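Exploration
Environment
The author has an ePub file on hand for testing, located at ~/Downloads/咖啡館推理事件簿系列(全四本).epub (a shameless plug, but it really is to my taste). All subsequent exploration is based on this file. The author's operating system is Manjaro 21.1.0 on amd64, and the shell is GNU bash 5.1.8(1)-release. For convenience, let's rename the file first (then why give the original name at all?!):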
[littleye233@lymjrolt Downloads]$ cd ~
[littleye233@lymjrolt ~]$ cd Downloads
[littleye233@lymjrolt Downloads]$ mv '咖啡館推理事件簿系列(全四本).epub' test.epub
[littleye233@lymjrolt Downloads]$ ll test.epub
-rw-r--r-- 1 littleye233 littleye233 1253964 Aug 22 23:24 test.epub
Round I. File type
First, let's test the waters with the file command that ships with Linux systems and see what it prints. Type file test.epub and run it:
[littleye233@lymjrolt Downloads]$ file test.epub
test.epub: EPUB document EPUB document
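Oh, what a pity! The file command gave us almost no useful information. Its man page states that the command can determine file types, but it can actually do a good deal more. For example, running file on an image may produce something like this: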
[littleye233@lymjrolt Downloads]$ file ~/.local/share/osu/screenshots/osu_2021-08-21_23-40-03.png
/home/littleye233/.local/share/osu/screenshots/osu_2021-08-21_23-40-03.png: PNG image data, 1920 x 961, 8-bit/color RGBA, non-interlaced
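This way we can follow the clues that file provides and look for traces of them in the file's binary content, then infer what the corresponding "positions" mean (some file formats require certain information to be stored at specific positions). It would be even better if the file carried something like comments.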
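To make the idea of fixed "positions" concrete, here is a minimal Python sketch that recovers the same facts file reported for the PNG above. It relies on the PNG layout, in which an 8-byte signature is followed by an IHDR chunk holding width, height, bit depth, color type and interlace flag. The sample bytes are built in code as a stand-in for the screenshot, and the name read_png_header is made up for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette", 4: "grayscale+alpha", 6: "RGBA"}


def read_png_header(data: bytes) -> dict:
    """Decode the fixed-position fields at the start of a PNG file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Bytes 8..15: length and type of the first chunk, which must be IHDR.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    # Bytes 16..28: the 13-byte IHDR payload, all big-endian.
    width, height, bit_depth, color_type, _comp, _filt, interlace = struct.unpack(
        ">IIBBBBB", data[16:29]
    )
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color": COLOR_TYPES.get(color_type, "unknown"),
        "interlaced": interlace != 0,
    }


# Hypothetical stand-in for the first 29 bytes of the screenshot above.
sample = (
    PNG_SIGNATURE
    + struct.pack(">I4s", 13, b"IHDR")
    + struct.pack(">IIBBBBB", 1920, 961, 8, 6, 0, 0, 0)
)
header = read_png_header(sample)
print(header)
```

Each field that file printed (1920 x 961, 8-bit, RGBA, non-interlaced) maps to a known byte offset, which is exactly the kind of correspondence we hope to find in an unfamiliar format.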
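Round II. File structure
Now back to the ePub file. Let's see whether we can read its content directly, the goal being to guess the file structure from the printable characters near the start of the file. Open it with nano test.epub for a direct look, or use head --bytes=120 test.epub to view the first 120 bytes: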
[littleye233@lymjrolt Downloads]$ head --bytes=120 test.epub
PK!oa?mimetypeapplication/epub+zipPU?N?;???META-INF/container.xml]?A
?0E?=
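Sure enough, we see some interesting strings: "mimetypeapplication/epub+zip". From experience, this is probably the header of the ePub format, and the "zip" in it suggests that an ePub file may essentially be a compressed archive. The leading "PK" points the same way, since ZIP archives begin with those two bytes.
In fact many file formats (for example Word documents, "*.docx") are essentially a compressed archive with various resource and configuration files added; as long as suitable software reads and processes them, the user sees the result.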
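The guess can be checked programmatically. The sketch below parses the 30-byte ZIP local file header at the very start of the data and tests whether the first entry is an uncompressed file named mimetype containing application/epub+zip. The function name looks_like_epub and the in-memory sample archive are assumptions made for illustration; the real test.epub is not read here.

```python
import io
import struct
import zipfile

# Layout of a ZIP local file header: signature, version, flags, method,
# mod time, mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length (30 bytes in total).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
EXPECTED_TYPE = b"application/epub+zip"


def looks_like_epub(raw: bytes) -> bool:
    """Return True if the first ZIP entry is a stored 'mimetype' with the ePub type."""
    if len(raw) < LOCAL_HEADER.size:
        return False
    fields = LOCAL_HEADER.unpack_from(raw, 0)
    signature, method = fields[0], fields[3]
    name_len, extra_len = fields[9], fields[10]
    if signature != b"PK\x03\x04":
        return False
    name_start = LOCAL_HEADER.size
    name = raw[name_start:name_start + name_len]
    data_start = name_start + name_len + extra_len
    payload = raw[data_start:data_start + len(EXPECTED_TYPE)]
    # Method 0 means "stored", i.e. not compressed, so the text is readable as-is.
    return name == b"mimetype" and method == 0 and payload == EXPECTED_TYPE


def make_sample_epub() -> bytes:
    """Build a tiny hypothetical archive that mimics the start of an ePub."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", "<container/>")
    return buf.getvalue()


print(looks_like_epub(make_sample_epub()))
print(looks_like_epub(b"plain text, not an archive"))
```

Because the mimetype entry is stored without compression, its name and content sit next to each other in the raw bytes, which is why head printed them as one run of text.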
Round III. Directory tree
Now we can use an unzip program to get at the contents of the ePub file. Run unzip -l test.epub in the terminal:
[littleye233@lymjrolt Downloads]$ unzip -l test.epub
Archive: test.epub
Length Date Time Name
--------- ---------- ----- ----
20 1980-01-01 00:00 mimetype
251 2019-06-27 10:40 META-INF/container.xml
12307 2019-06-27 10:40 OEBPS/content.opf
112368 2019-06-27 10:40 OEBPS/Images/cover00464.jpeg
128680 2019-06-27 10:40 OEBPS/Images/image00456.jpeg
120936 2019-06-27 10:40 OEBPS/Images/image00457.jpeg
1392 2019-06-27 10:40 OEBPS/Images/image00458.jpeg
101948 2019-06-27 10:40 OEBPS/Images/image00459.jpeg
119124 2019-06-27 10:40 OEBPS/Images/image00460.jpeg
1268 2019-06-27 10:40 OEBPS/Images/image00461.jpeg
42944 2019-06-27 10:40 OEBPS/Images/image00462.jpeg
121284 2019-06-27 10:40 OEBPS/Images/image00463.jpeg
2251 2019-06-27 10:40 OEBPS/Styles/style0001.css
9816 2019-06-27 10:40 OEBPS/Styles/style0002.css
2251 2019-06-27 10:40 OEBPS/Styles/style0003.css
9789 2019-06-27 10:40 OEBPS/Styles/style0004.css
2251 2019-06-27 10:40 OEBPS/Styles/style0005.css
29245 2019-06-27 10:40 OEBPS/Styles/style0006.css
2235 2019-06-27 10:40 OEBPS/Styles/style0007.css
29914 2019-06-27 10:40 OEBPS/Styles/style0008.css
2251 2019-06-27 10:40 OEBPS/Styles/style0009.css
624 2019-06-27 10:40 OEBPS/Text/cover_page.xhtml
851 2019-06-27 10:40 OEBPS/Text/part0000.xhtml
561 2019-06-27 10:40 OEBPS/Text/part0001.xhtml
428 2019-06-27 10:40 OEBPS/Text/part0002.xhtml
1518 2019-06-27 10:40 OEBPS/Text/part0003.xhtml
661 2019-06-27 10:40 OEBPS/Text/part0004.xhtml
2311 2019-06-27 10:40 OEBPS/Text/part0005.xhtml
55157 2019-06-27 10:40 OEBPS/Text/part0006.xhtml
58266 2019-06-27 10:40 OEBPS/Text/part0007.xhtml
59953 2019-06-27 10:40 OEBPS/Text/part0008.xhtml
49789 2019-06-27 10:40 OEBPS/Text/part0009.xhtml
66870 2019-06-27 10:40 OEBPS/Text/part0010.xhtml
57342 2019-06-27 10:40 OEBPS/Text/part0011.xhtml
67449 2019-06-27 10:40 OEBPS/Text/part0012.xhtml
16183 2019-06-27 10:40 OEBPS/Text/part0013.xhtml
561 2019-06-27 10:40 OEBPS/Text/part0014.xhtml
428 2019-06-27 10:40 OEBPS/Text/part0015.xhtml
1575 2019-06-27 10:40 OEBPS/Text/part0016.xhtml
496 2019-06-27 10:40 OEBPS/Text/part0017.xhtml
1446 2019-06-27 10:40 OEBPS/Text/part0018.xhtml
52358 2019-06-27 10:40 OEBPS/Text/part0019.xhtml
75746 2019-06-27 10:40 OEBPS/Text/part0020.xhtml
63420 2019-06-27 10:40 OEBPS/Text/part0021.xhtml
57399 2019-06-27 10:40 OEBPS/Text/part0022.xhtml
58590 2019-06-27 10:40 OEBPS/Text/part0023.xhtml
40263 2019-06-27 10:40 OEBPS/Text/part0024.xhtml
66099 2019-06-27 10:40 OEBPS/Text/part0025.xhtml
15143 2019-06-27 10:40 OEBPS/Text/part0026.xhtml
561 2019-06-27 10:40 OEBPS/Text/part0027.xhtml
612 2019-06-27 10:40 OEBPS/Text/part0028.xhtml
1344 2019-06-27 10:40 OEBPS/Text/part0029.xhtml
640 2019-06-27 10:40 OEBPS/Text/part0030.xhtml
6144 2019-06-27 10:40 OEBPS/Text/part0031.xhtml
25197 2019-06-27 10:40 OEBPS/Text/part0032.xhtml
54594 2019-06-27 10:40 OEBPS/Text/part0033.xhtml
87394 2019-06-27 10:40 OEBPS/Text/part0034.xhtml
97557 2019-06-27 10:40 OEBPS/Text/part0035.xhtml
109901 2019-06-27 10:40 OEBPS/Text/part0036.xhtml
17181 2019-06-27 10:40 OEBPS/Text/part0037.xhtml
5238 2019-06-27 10:40 OEBPS/Text/part0038.xhtml
561 2019-06-27 10:40 OEBPS/Text/part0039.xhtml
644 2019-06-27 10:40 OEBPS/Text/part0040.xhtml
1163 2019-06-27 10:40 OEBPS/Text/part0041.xhtml
1473 2019-06-27 10:40 OEBPS/Text/part0042.xhtml
38427 2019-06-27 10:40 OEBPS/Text/part0043.xhtml
90589 2019-06-27 10:40 OEBPS/Text/part0044.xhtml
51278 2019-06-27 10:40 OEBPS/Text/part0045.xhtml
58321 2019-06-27 10:40 OEBPS/Text/part0046.xhtml
29670 2019-06-27 10:40 OEBPS/Text/part0047.xhtml
12903 2019-06-27 10:40 OEBPS/Text/part0048.xhtml
7364 2019-06-27 10:40 OEBPS/toc.ncx
--------- -------
2422768 72 files
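The same listing can be produced without an external tool. This is a rough Python equivalent of unzip -l using the standard zipfile module; the sample archive is a small hypothetical stand-in built in memory, and list_entries is an illustrative name.

```python
import io
import zipfile


def list_entries(epub_bytes: bytes) -> list:
    """Return (name, uncompressed size) for every entry, in archive order."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


def make_sample_epub() -> bytes:
    """Build a tiny hypothetical archive with ePub-like entry names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", "<container/>")
        zf.writestr("OEBPS/content.opf", "<package/>")
    return buf.getvalue()


entries = list_entries(make_sample_epub())
for name, size in entries:
    print(f"{size:>9}  {name}")
print(f"{sum(size for _, size in entries):>9}  {len(entries)} files")
```

Note that the mimetype entry is 20 bytes here as well, the same length reported in the listing above.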
We can also extract it directly:
[littleye233@lymjrolt Downloads]$ unzip test.epub -d test_epub
Archive: test.epub
extracting: test_epub/mimetype
inflating: test_epub/META-INF/container.xml
inflating: test_epub/OEBPS/content.opf
inflating: test_epub/OEBPS/Images/cover00464.jpeg
inflating: test_epub/OEBPS/Images/image00456.jpeg
inflating: test_epub/OEBPS/Images/image00457.jpeg
inflating: test_epub/OEBPS/Images/image00458.jpeg
inflating: test_epub/OEBPS/Images/image00459.jpeg
inflating: test_epub/OEBPS/Images/image00460.jpeg
inflating: test_epub/OEBPS/Images/image00461.jpeg
inflating: test_epub/OEBPS/Images/image00462.jpeg
inflating: test_epub/OEBPS/Images/image00463.jpeg
inflating: test_epub/OEBPS/Styles/style0001.css
inflating: test_epub/OEBPS/Styles/style0002.css
inflating: test_epub/OEBPS/Styles/style0003.css
inflating: test_epub/OEBPS/Styles/style0004.css
inflating: test_epub/OEBPS/Styles/style0005.css
inflating: test_epub/OEBPS/Styles/style0006.css
inflating: test_epub/OEBPS/Styles/style0007.css
inflating: test_epub/OEBPS/Styles/style0008.css
inflating: test_epub/OEBPS/Styles/style0009.css
inflating: test_epub/OEBPS/Text/cover_page.xhtml
inflating: test_epub/OEBPS/Text/part0000.xhtml
inflating: test_epub/OEBPS/Text/part0001.xhtml
inflating: test_epub/OEBPS/Text/part0002.xhtml
inflating: test_epub/OEBPS/Text/part0003.xhtml
inflating: test_epub/OEBPS/Text/part0004.xhtml
inflating: test_epub/OEBPS/Text/part0005.xhtml
inflating: test_epub/OEBPS/Text/part0006.xhtml
inflating: test_epub/OEBPS/Text/part0007.xhtml
inflating: test_epub/OEBPS/Text/part0008.xhtml
inflating: test_epub/OEBPS/Text/part0009.xhtml
inflating: test_epub/OEBPS/Text/part0010.xhtml
inflating: test_epub/OEBPS/Text/part0011.xhtml
inflating: test_epub/OEBPS/Text/part0012.xhtml
inflating: test_epub/OEBPS/Text/part0013.xhtml
inflating: test_epub/OEBPS/Text/part0014.xhtml
inflating: test_epub/OEBPS/Text/part0015.xhtml
inflating: test_epub/OEBPS/Text/part0016.xhtml
inflating: test_epub/OEBPS/Text/part0017.xhtml
inflating: test_epub/OEBPS/Text/part0018.xhtml
inflating: test_epub/OEBPS/Text/part0019.xhtml
inflating: test_epub/OEBPS/Text/part0020.xhtml
inflating: test_epub/OEBPS/Text/part0021.xhtml
inflating: test_epub/OEBPS/Text/part0022.xhtml
inflating: test_epub/OEBPS/Text/part0023.xhtml
inflating: test_epub/OEBPS/Text/part0024.xhtml
inflating: test_epub/OEBPS/Text/part0025.xhtml
inflating: test_epub/OEBPS/Text/part0026.xhtml
inflating: test_epub/OEBPS/Text/part0027.xhtml
inflating: test_epub/OEBPS/Text/part0028.xhtml
inflating: test_epub/OEBPS/Text/part0029.xhtml
inflating: test_epub/OEBPS/Text/part0030.xhtml
inflating: test_epub/OEBPS/Text/part0031.xhtml
inflating: test_epub/OEBPS/Text/part0032.xhtml
inflating: test_epub/OEBPS/Text/part0033.xhtml
inflating: test_epub/OEBPS/Text/part0034.xhtml
inflating: test_epub/OEBPS/Text/part0035.xhtml
inflating: test_epub/OEBPS/Text/part0036.xhtml
inflating: test_epub/OEBPS/Text/part0037.xhtml
inflating: test_epub/OEBPS/Text/part0038.xhtml
inflating: test_epub/OEBPS/Text/part0039.xhtml
inflating: test_epub/OEBPS/Text/part0040.xhtml
inflating: test_epub/OEBPS/Text/part0041.xhtml
inflating: test_epub/OEBPS/Text/part0042.xhtml
inflating: test_epub/OEBPS/Text/part0043.xhtml
inflating: test_epub/OEBPS/Text/part0044.xhtml
inflating: test_epub/OEBPS/Text/part0045.xhtml
inflating: test_epub/OEBPS/Text/part0046.xhtml
inflating: test_epub/OEBPS/Text/part0047.xhtml
inflating: test_epub/OEBPS/Text/part0048.xhtml
inflating: test_epub/OEBPS/toc.ncx
To show the file tree more clearly, we can also use the tree command (built into Windows; on Linux you need to install the tree package, either through the package manager or by building it from source):
[littleye233@lymjrolt test_epub]$ tree
.
├── META-INF
│ └── container.xml
├── mimetype
└── OEBPS
├── content.opf
├── Images
│ ├── cover00464.jpeg
│ ├── image00456.jpeg
│ ├── image00457.jpeg
│ ├── image00458.jpeg
│ ├── image00459.jpeg
│ ├── image00460.jpeg
│ ├── image00461.jpeg
│ ├── image00462.jpeg
│ └── image00463.jpeg
├── Styles
│ ├── style0001.css
│ ├── style0002.css
│ ├── style0003.css
│ ├── style0004.css
│ ├── style0005.css
│ ├── style0006.css
│ ├── style0007.css
│ ├── style0008.css
│ └── style0009.css
├── Text
│ ├── cover_page.xhtml
│ ├── part0000.xhtml
│ ├── part0001.xhtml
│ ├── part0002.xhtml
│ ├── part0003.xhtml
│ ├── part0004.xhtml
│ ├── part0005.xhtml
│ ├── part0006.xhtml
│ ├── part0007.xhtml
│ ├── part0008.xhtml
│ ├── part0009.xhtml
│ ├── part0010.xhtml
│ ├── part0011.xhtml
│ ├── part0012.xhtml
│ ├── part0013.xhtml
│ ├── part0014.xhtml
│ ├── part0015.xhtml
│ ├── part0016.xhtml
│ ├── part0017.xhtml
│ ├── part0018.xhtml
│ ├── part0019.xhtml
│ ├── part0020.xhtml
│ ├── part0021.xhtml
│ ├── part0022.xhtml
│ ├── part0023.xhtml
│ ├── part0024.xhtml
│ ├── part0025.xhtml
│ ├── part0026.xhtml
│ ├── part0027.xhtml
│ ├── part0028.xhtml
│ ├── part0029.xhtml
│ ├── part0030.xhtml
│ ├── part0031.xhtml
│ ├── part0032.xhtml
│ ├── part0033.xhtml
│ ├── part0034.xhtml
│ ├── part0035.xhtml
│ ├── part0036.xhtml
│ ├── part0037.xhtml
│ ├── part0038.xhtml
│ ├── part0039.xhtml
│ ├── part0040.xhtml
│ ├── part0041.xhtml
│ ├── part0042.xhtml
│ ├── part0043.xhtml
│ ├── part0044.xhtml
│ ├── part0045.xhtml
│ ├── part0046.xhtml
│ ├── part0047.xhtml
│ └── part0048.xhtml
└── toc.ncx
5 directories, 72 files
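The summary line printed by tree can be derived from the archive's entry names alone. The sketch below counts files and the distinct directories implied by their paths; it assumes the archive stores no explicit directory entries (as in the listing earlier), and the path list is a shortened hypothetical sample.

```python
from pathlib import PurePosixPath

# Shortened, hypothetical sample of entry names with the same folder layout.
SAMPLE_PATHS = [
    "mimetype",
    "META-INF/container.xml",
    "OEBPS/content.opf",
    "OEBPS/Images/cover00464.jpeg",
    "OEBPS/Styles/style0001.css",
    "OEBPS/Text/part0000.xhtml",
    "OEBPS/toc.ncx",
]


def count_dirs_and_files(paths) -> tuple:
    """Count distinct parent directories and files implied by archive paths."""
    dirs = set()
    files = 0
    for entry in paths:
        files += 1
        # parents yields every ancestor, ending with "." for the archive root.
        for parent in PurePosixPath(entry).parents:
            if str(parent) != ".":
                dirs.add(parent)
    return len(dirs), files


n_dirs, n_files = count_dirs_and_files(SAMPLE_PATHS)
print(f"{n_dirs} directories, {n_files} files")
```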
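Round IV. Internal files
At this point we can make some educated guesses:
- META-INF folder: probably holds configuration files for the "container" (that is, the ePub file itself);
- mimetype file: declares the type of this file as "ePub" ("MIME" stands for "Multipurpose Internet Mail Extensions"; a MIME type is a label that says what kind of content this is);
- OEBPS folder: its exact meaning is unknown for now, but it should hold the ePub's text, images, and other presentation data;
  - content.opf file: probably holds table-of-contents information, or defines the "order" of the various files;
  - Images, Styles, and Text folders: clearly hold the images, cascading style sheets, and text data respectively;
  - toc.ncx file: may be the actual table of contents ("toc" is short for "table of contents").
Next we will go through them one by one.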
Round IV.I. The container
First, META-INF/container.xml:
[littleye233@lymjrolt test_epub]$ file META-INF/container.xml
META-INF/container.xml: XML 1.0 document, ASCII text
Print its contents:
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/> </rootfiles>
</container>
Clearly a standard XML file. Notice that /container/rootfiles/rootfile/@full-path [1] points to the file we earlier took to be the table of contents. Since this part can be standardized, this file is probably the same in most ePub archives.
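The XPath lookup described above can be tried with Python's standard xml.etree.ElementTree. The XML below is the container.xml shown above, re-indented; the only extra detail is that the elements live in a default namespace, so the query needs a prefix mapping. The prefix c and the function name find_rootfiles are arbitrary choices for this sketch.

```python
import xml.etree.ElementTree as ET

# The default namespace declared on <container>; "c" is an arbitrary local prefix.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def find_rootfiles(container_xml: bytes) -> list:
    """Return the full-path attribute of every rootfile element."""
    root = ET.fromstring(container_xml)
    return [
        element.attrib["full-path"]
        for element in root.findall("c:rootfiles/c:rootfile", CONTAINER_NS)
    ]


print(find_rootfiles(CONTAINER_XML))  # ['OEBPS/content.opf']
```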
Round IV.II. Identifying the file type
Next, the mimetype file:
[littleye233@lymjrolt test_epub]$ cat mimetype
application/epub+zip
This one is also quite obvious, so there is nothing more to say.
Round IV.III. Table of contents?
Now OEBPS/content.opf:
[littleye233@lymjrolt test_epub]$ file OEBPS/content.opf
OEBPS/content.opf: XML 1.0 document, Unicode text, UTF-8 text, with very long lines (504)
This is also an XML file. Surprisingly, the file command can even tell that the longest line in this file has 504 characters, which is a little scary.
<details>
<summary>Click to view the full contents of OEBPS/content.opf (formatted)</summary>
<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:title opf:file-as="kafeiguantuilishijianbuxilie(quansiben)">咖啡館推理事件簿系列(全四本)</dc:title>
<dc:language>zh</dc:language>
<dc:identifier id="uid">3899198450</dc:identifier>
<dc:creator opf:file-as="(ri)gangqizuomo">(日)岡崎琢磨</dc:creator>
<dc:date opf:event="publication">2018-03-15</dc:date>
<!-- Extra MetaData from RESC
<dc:coverage/>
-->
<meta name="cover" content="x_cover-image"/>
<meta name="output encoding" content="utf-8"/>
<meta name="primary-writing-mode" content="horizontal-lr"/>
<!-- BEGIN INFORMATION ONLY
<meta name="Cover ThumbNail Image" content="Images/image00466.jpeg" />
<meta name="Drm Ebookbase Book Id" content="0006008690412" />
<meta name="ASIN" content="B07BFTVX98" />
<meta name="Creator-Software" content="201" />
<meta name="Author-Pronunciation" content="(ri)gangqizuomo" />
<meta name="Embedded-Record-Count" content="11" />
<meta name="Unknown_(403)_(hex)" content="00" />
<meta name="HasFakeCover" content="0" />
<meta name="Creator-Major-Version" content="2" />
<meta name="cdeType" content="EBOK" />
<meta name="override-kindle-fonts" content="false" />
<meta name="CDEContentKey" content="B07BFTVX98" />
<meta name="Compression-Upgraded" content="Source-Target:c1-c2 KT_Version:2.9 Build:0805-4a0c57c" />
<meta name="HD-Media-Containers-Info" content="2400x3840:0-11|" />
<meta name="548 (hex)" content="496e4d656d6f7279" />
<meta name="Unknown_(407)_(hex)" content="0000000000000000" />
<meta name="Amazon_Creator_Info" content="kjw" />
<meta name="Clipping-Limit" content="100" />
<meta name="Tamper-Proof-Keys_(hex)" content="01000000d000000001940000000191000000019500000001960000000197" />
<meta name="Title-Pronunciation" content="kafeiguantuilishijianbuxilie(quansiben)" />
<meta name="Creator-Minor-Version" content="9" />
<meta name="MetadataResourceURI" content="kindle:embed:000A" />
<meta name="Updated_Title" content="咖啡館推理事件簿系列(全四本)" />
<meta name="Ownership-Type_(hex)" content="00" />
<meta name="547 (hex)" content="496e4d656d6f7279" />
<meta name="Content-Language-Tag" content="zh" />
<meta name="sample" content="0" />
<meta name="Metadata-Record-Offset" content="4294967295" />
<meta name="Creator-Build-Tag" content="0721-dedaf5" />
<meta name="Publisher-Pronunciation" content="xiandaichubanshe" />
<meta name="StartOffset" content="4294967295" />
<meta name="Watermark_(hex)" content="6174763a6b696e3a323a49396e41307a4239625565766a514961583462736b66476a394535335a51616a696368585638364447746b65544379504d504c4d75445a35524f39676b584d35515a6433694f424b5531546643766f5a62507763705a6b49486f6f366a6639785944327a4158494263536c495879676b6a38616b566e4763327a2b2b50434c454c464b2b4e30495a4556437a6331516f656f4451546b3865374a6f61696251526d6f682b7574586b3661466a554477704a3165636c68665367414a35664745413a68614f61636b496839662b61786c457733397665774b32554a57453d" />
<meta name="Text-to-Speech-Disabled" content="0" />
<meta name="Font-Signature_(hex)" content="0300000000480f08100000000000008000200000000000000000000000000000bef4edec01b701d7409440984099409c409d40a64a804c9c608160826080608b60e8618b60cd60c661aa61f361d661e9629c73df0213c9021fd90173cd01429f037e9a037e8c037e81037e9f" />
<meta name="Rental-Expiration-Time" content="0000000000000000" />
<meta name="Container_Id" content="ZkM0" />
<meta name="Mobi8-Boundary-Section" content="420" />
<meta name="Creator-Build-Number" content="0" />
END INFORMATION ONLY -->
</metadata>
<manifest>
<item id="x_cover" media-type="application/xhtml+xml" href="Text/cover_page.xhtml"/>
<item id="x_TableOfContents" media-type="application/xhtml+xml" href="Text/part0000.xhtml"/>
<item id="x_a1cover.html" media-type="application/xhtml+xml" href="Text/part0001.xhtml"/>
<item id="x_a1bookname" media-type="application/xhtml+xml" href="Text/part0002.xhtml"/>
<item id="x_a1TableOfContents" media-type="application/xhtml+xml" href="Text/part0003.xhtml"/>
<item id="x_a1Chapter001" media-type="application/xhtml+xml" href="Text/part0004.xhtml"/>
<item id="x_a1Chapter002" media-type="application/xhtml+xml" href="Text/part0005.xhtml"/>
<item id="x_a1Chapter003" media-type="application/xhtml+xml" href="Text/part0006.xhtml"/>
<item id="x_a1Chapter004" media-type="application/xhtml+xml" href="Text/part0007.xhtml"/>
<item id="x_a1Chapter005" media-type="application/xhtml+xml" href="Text/part0008.xhtml"/>
<item id="x_a1Chapter006" media-type="application/xhtml+xml" href="Text/part0009.xhtml"/>
<item id="x_a1Chapter007" media-type="application/xhtml+xml" href="Text/part0010.xhtml"/>
<item id="x_a1Chapter008" media-type="application/xhtml+xml" href="Text/part0011.xhtml"/>
<item id="x_a1Chapter009" media-type="application/xhtml+xml" href="Text/part0012.xhtml"/>
<item id="x_a1Chapter010" media-type="application/xhtml+xml" href="Text/part0013.xhtml"/>
<item id="x_a2cover.html" media-type="application/xhtml+xml" href="Text/part0014.xhtml"/>
<item id="x_a2bookname" media-type="application/xhtml+xml" href="Text/part0015.xhtml"/>
<item id="x_a2TableOfContents" media-type="application/xhtml+xml" href="Text/part0016.xhtml"/>
<item id="x_a2Chapter001" media-type="application/xhtml+xml" href="Text/part0017.xhtml"/>
<item id="x_a2Chapter002" media-type="application/xhtml+xml" href="Text/part0018.xhtml"/>
<item id="x_a2Chapter003" media-type="application/xhtml+xml" href="Text/part0019.xhtml"/>
<item id="x_a2Chapter004" media-type="application/xhtml+xml" href="Text/part0020.xhtml"/>
<item id="x_a2Chapter005" media-type="application/xhtml+xml" href="Text/part0021.xhtml"/>
<item id="x_a2Chapter006" media-type="application/xhtml+xml" href="Text/part0022.xhtml"/>
<item id="x_a2Chapter007" media-type="application/xhtml+xml" href="Text/part0023.xhtml"/>
<item id="x_a2Chapter008" media-type="application/xhtml+xml" href="Text/part0024.xhtml"/>
<item id="x_a2Chapter009" media-type="application/xhtml+xml" href="Text/part0025.xhtml"/>
<item id="x_a2Chapter010" media-type="application/xhtml+xml" href="Text/part0026.xhtml"/>
<item id="x_a3cover.html" media-type="application/xhtml+xml" href="Text/part0027.xhtml"/>
<item id="x_a3bookname" media-type="application/xhtml+xml" href="Text/part0028.xhtml"/>
<item id="x_a3TableOfContents" media-type="application/xhtml+xml" href="Text/part0029.xhtml"/>
<item id="x_a3Chapter001" media-type="application/xhtml+xml" href="Text/part0030.xhtml"/>
<item id="x_a3Chapter002" media-type="application/xhtml+xml" href="Text/part0031.xhtml"/>
<item id="x_a3Chapter003" media-type="application/xhtml+xml" href="Text/part0032.xhtml"/>
<item id="x_a3Chapter004" media-type="application/xhtml+xml" href="Text/part0033.xhtml"/>
<item id="x_a3Chapter005" media-type="application/xhtml+xml" href="Text/part0034.xhtml"/>
<item id="x_a3Chapter006" media-type="application/xhtml+xml" href="Text/part0035.xhtml"/>
<item id="x_a3Chapter007" media-type="application/xhtml+xml" href="Text/part0036.xhtml"/>
<item id="x_a3Chapter008" media-type="application/xhtml+xml" href="Text/part0037.xhtml"/>
<item id="x_a3Chapter009" media-type="application/xhtml+xml" href="Text/part0038.xhtml"/>
<item id="x_a4cover.html" media-type="application/xhtml+xml" href="Text/part0039.xhtml"/>
<item id="x_a4bookname" media-type="application/xhtml+xml" href="Text/part0040.xhtml"/>
<item id="x_a4TableOfContents" media-type="application/xhtml+xml" href="Text/part0041.xhtml"/>
<item id="x_a4Chapter001" media-type="application/xhtml+xml" href="Text/part0042.xhtml"/>
<item id="x_a4Chapter002" media-type="application/xhtml+xml" href="Text/part0043.xhtml"/>
<item id="x_a4Chapter003" media-type="application/xhtml+xml" href="Text/part0044.xhtml"/>
<item id="x_a4Chapter004" media-type="application/xhtml+xml" href="Text/part0045.xhtml"/>
<item id="x_a4Chapter005" media-type="application/xhtml+xml" href="Text/part0046.xhtml"/>
<item id="x_a4Chapter006" media-type="application/xhtml+xml" href="Text/part0047.xhtml"/>
<item id="x_a4Chapter007" media-type="application/xhtml+xml" href="Text/part0048.xhtml"/>
<item id="item50" media-type="text/css" href="Styles/style0001.css"/>
<item id="item51" media-type="text/css" href="Styles/style0002.css"/>
<item id="item52" media-type="text/css" href="Styles/style0003.css"/>
<item id="item53" media-type="text/css" href="Styles/style0004.css"/>
<item id="item54" media-type="text/css" href="Styles/style0005.css"/>
<item id="item55" media-type="text/css" href="Styles/style0006.css"/>
<item id="item56" media-type="text/css" href="Styles/style0007.css"/>
<item id="item57" media-type="text/css" href="Styles/style0008.css"/>
<item id="item58" media-type="text/css" href="Styles/style0009.css"/>
<item id="item59" media-type="image/jpeg" href="Images/image00456.jpeg"/>
<item id="item60" media-type="image/jpeg" href="Images/image00457.jpeg"/>
<item id="item61" media-type="image/jpeg" href="Images/image00458.jpeg"/>
<item id="item62" media-type="image/jpeg" href="Images/image00459.jpeg"/>
<item id="item63" media-type="image/jpeg" href="Images/image00460.jpeg"/>
<item id="item64" media-type="image/jpeg" href="Images/image00461.jpeg"/>
<item id="item65" media-type="image/jpeg" href="Images/image00462.jpeg"/>
<item id="item66" media-type="image/jpeg" href="Images/image00463.jpeg"/>
<item id="x_cover-image" media-type="image/jpeg" href="Images/cover00464.jpeg"/>
<item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
</manifest>
<spine toc="ncx">
<itemref idref="x_cover" linear="no"/>
<itemref idref="x_TableOfContents" linear="yes"/>
<itemref idref="x_a1cover.html" linear="yes"/>
<itemref idref="x_a1bookname" linear="yes"/>
<itemref idref="x_a1TableOfContents" linear="yes"/>
<itemref idref="x_a1Chapter001" linear="yes"/>
<itemref idref="x_a1Chapter002" linear="yes"/>
<itemref idref="x_a1Chapter003" linear="yes"/>
<itemref idref="x_a1Chapter004" linear="yes"/>
<itemref idref="x_a1Chapter005" linear="yes"/>
<itemref idref="x_a1Chapter006" linear="yes"/>
<itemref idref="x_a1Chapter007" linear="yes"/>
<itemref idref="x_a1Chapter008" linear="yes"/>
<itemref idref="x_a1Chapter009" linear="yes"/>
<itemref idref="x_a1Chapter010" linear="yes"/>
<itemref idref="x_a2cover.html" linear="yes"/>
<itemref idref="x_a2bookname" linear="yes"/>
<itemref idref="x_a2TableOfContents" linear="yes"/>
<itemref idref="x_a2Chapter001" linear="yes"/>
<itemref idref="x_a2Chapter002" linear="yes"/>
<itemref idref="x_a2Chapter003" linear="yes"/>
<itemref idref="x_a2Chapter004" linear="yes"/>
<itemref idref="x_a2Chapter005" linear="yes"/>
<itemref idref="x_a2Chapter006" linear="yes"/>
<itemref idref="x_a2Chapter007" linear="yes"/>
<itemref idref="x_a2Chapter008" linear="yes"/>
<itemref idref="x_a2Chapter009" linear="yes"/>
<itemref idref="x_a2Chapter010" linear="yes"/>
<itemref idref="x_a3cover.html" linear="yes"/>
<itemref idref="x_a3bookname" linear="yes"/>
<itemref idref="x_a3TableOfContents" linear="yes"/>
<itemref idref="x_a3Chapter001" linear="yes"/>
<itemref idref="x_a3Chapter002" linear="yes"/>
<itemref idref="x_a3Chapter003" linear="yes"/>
<itemref idref="x_a3Chapter004" linear="yes"/>
<itemref idref="x_a3Chapter005" linear="yes"/>
<itemref idref="x_a3Chapter006" linear="yes"/>
<itemref idref="x_a3Chapter007" linear="yes"/>
<itemref idref="x_a3Chapter008" linear="yes"/>
<itemref idref="x_a3Chapter009" linear="yes"/>
<itemref idref="x_a4cover.html" linear="yes"/>
<itemref idref="x_a4bookname" linear="yes"/>
<itemref idref="x_a4TableOfContents" linear="yes"/>
<itemref idref="x_a4Chapter001" linear="yes"/>
<itemref idref="x_a4Chapter002" linear="yes"/>
<itemref idref="x_a4Chapter003" linear="yes"/>
<itemref idref="x_a4Chapter004" linear="yes"/>
<itemref idref="x_a4Chapter005" linear="yes"/>
<itemref idref="x_a4Chapter006" linear="yes"/>
<itemref idref="x_a4Chapter007" linear="yes"/>
</spine>
<tours>
</tours>
<guide>
<reference type="text" title="Start" href="Text/part0004.xhtml"/>
<reference type="toc" title="Table of Contents" href="Text/part0000.xhtml"/>
<reference type="cover" title="Cover" href="Text/cover_page.xhtml"/>
</guide>
</package>
</details>
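So my earlier guess was not wrong: this file holds something that goes beyond a "table of contents", namely an "order", or more precisely an "index". It is similar to index.* in other file formats or directory trees: it assigns an identifier to each piece of data in the ePub, and it also defines metadata such as the title, language, author, and publication date. As for the very long line we saw earlier, it appears to be a hexadecimal watermark, perhaps meant to deter infringement.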
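Several of the commented-out meta values are marked "(hex)", and at least some of them turn out to be plain ASCII text written as hexadecimal digits. The sketch below decodes two values copied from the file above: the content of the "547 (hex)" entry and the first 20 hex digits of the watermark. What the decoded strings mean is not something this exploration establishes; decode_hex_meta is an illustrative name.

```python
def decode_hex_meta(value: str) -> str:
    """Turn a hex-encoded meta value into text, replacing non-ASCII bytes."""
    return bytes.fromhex(value).decode("ascii", errors="replace")


# Content of the "547 (hex)" meta entry.
print(decode_hex_meta("496e4d656d6f7279"))      # InMemory
# First 20 hex digits of the "Watermark_(hex)" entry.
print(decode_hex_meta("6174763a6b696e3a323a"))  # atv:kin:2:
```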
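Here /package/manifest/item defines all the index entries along with the media type of each file; the purpose of /package/spine/itemref is not yet clear, beyond the fact that it can mark whether an item is "linear"; /package/guide/reference defines pointers to the ePub's cover and similar pages, which file managers and ePub readers can use (for example to show a preview page).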
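One way to see how manifest and spine relate is to resolve each itemref's idref against the manifest ids. The sketch below does that on a shortened copy of the file above. It assumes that the order of the itemref elements is the intended reading order, which the OPF format does specify but which this exploration has not verified by itself; spine_hrefs is an illustrative name.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Shortened copy of content.opf: a few manifest items and spine entries only.
SAMPLE_OPF = b"""<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
  <manifest>
    <item id="x_cover" media-type="application/xhtml+xml" href="Text/cover_page.xhtml"/>
    <item id="x_TableOfContents" media-type="application/xhtml+xml" href="Text/part0000.xhtml"/>
    <item id="x_a1cover.html" media-type="application/xhtml+xml" href="Text/part0001.xhtml"/>
    <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="x_cover" linear="no"/>
    <itemref idref="x_TableOfContents" linear="yes"/>
    <itemref idref="x_a1cover.html" linear="yes"/>
  </spine>
</package>"""


def spine_hrefs(opf_xml: bytes) -> list:
    """Resolve each spine itemref to (href, is_linear) via the manifest."""
    root = ET.fromstring(opf_xml)
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    resolved = []
    for ref in root.findall("opf:spine/opf:itemref", OPF_NS):
        href = manifest[ref.attrib["idref"]]
        # Treat a missing linear attribute as "yes".
        is_linear = ref.attrib.get("linear", "yes") == "yes"
        resolved.append((href, is_linear))
    return resolved


for href, is_linear in spine_hrefs(SAMPLE_OPF):
    print(href, "linear" if is_linear else "non-linear")
```

Seen this way, the manifest is a lookup table from ids to files, and the spine is a sequence of ids drawn from that table.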
Round IV.IV. Table of contents!
Now OEBPS/toc.ncx:
[littleye233@lymjrolt test_epub]$ file OEBPS/toc.ncx
OEBPS/toc.ncx: XML 1.0 document, Unicode text, UTF-8 text
Discussing the file type feels beside the point by now. Let's look at the contents again:
<details>
<summary>Click to view the full contents of OEBPS/toc.ncx (formatted)</summary>
<?xml version="1.0" encoding="utf-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="zh">
<head>
<meta content="3899198450" name="dtb:uid"/>
<meta content="2" name="dtb:depth"/>
<meta content="mobiunpack.py" name="dtb:generator"/>
<meta content="0" name="dtb:totalPageCount"/>
<meta content="0" name="dtb:maxPageNumber"/>
</head>
<docTitle>
<text>咖啡館推理事件簿系列(全四本)</text>
</docTitle>
<navMap>
<navPoint id="np_1" playOrder="1">
<navLabel>
<text>總目錄</text>
</navLabel>
<content src="Text/part0000.xhtml"/>
</navPoint>
<navPoint id="np_2" playOrder="2">
<navLabel>
<text>咖啡館推理事件簿:下次見面時,請讓我品嘗你煮的咖啡</text>
</navLabel>
<content src="Text/part0001.xhtml"/>
<navPoint id="np_3" playOrder="3">
<navLabel>
<text>序章</text>
</navLabel>
<content src="Text/part0005.xhtml"/>
</navPoint>
<navPoint id="np_4" playOrder="4">
<navLabel>
<text>一 事件始于第二次光顧</text>
</navLabel>
<content src="Text/part0006.xhtml"/>
</navPoint>
<navPoint id="np_5" playOrder="5">
<navLabel>
<text>二 Bittersweet Black</text>
</navLabel>
<content src="Text/part0007.xhtml"/>
</navPoint>
<navPoint id="np_6" playOrder="6">
<navLabel>
<text>三 隱藏在乳白色中的心</text>
</navLabel>
<content src="Text/part0008.xhtml"/>
</navPoint>
<navPoint id="np_7" playOrder="7">
<navLabel>
<text>四 棋盤上的狩獵</text>
</navLabel>
<content src="Text/part0009.xhtml"/>
</navPoint>
<navPoint id="np_8" playOrder="8">
<navLabel>
<text>五 past，present，f******?</text>
</navLabel>
<content src="Text/part0010.xhtml"/>
</navPoint>
<navPoint id="np_9" playOrder="9">
<navLabel>
<text>六 Animals in the closed room</text>
</navLabel>
<content src="Text/part0011.xhtml"/>
</navPoint>
<navPoint id="np_10" playOrder="10">
<navLabel>
<text>七 下次見面時,請讓我品嘗你煮的咖啡</text>
</navLabel>
<content src="Text/part0012.xhtml"/>
</navPoint>
<navPoint id="np_11" playOrder="11">
<navLabel>
<text>終章</text>
</navLabel>
<content src="Text/part0013.xhtml"/>
</navPoint>
</navPoint>
<navPoint id="np_12" playOrder="12">
<navLabel>
<text>咖啡館推理事件簿2:她夢到了歐蕾咖啡</text>
</navLabel>
<content src="Text/part0014.xhtml"/>
<navPoint id="np_13" playOrder="13">
<navLabel>
<text>序曲 她的夢</text>
</navLabel>
<content src="Text/part0018.xhtml"/>
</navPoint>
<navPoint id="np_14" playOrder="14">
<navLabel>
<text>第一章 敬啟致未來的你</text>
</navLabel>
<content src="Text/part0019.xhtml"/>
</navPoint>
<navPoint id="np_15" playOrder="15">
<navLabel>
<text>第二章 狐貍的迷惑</text>
</navLabel>
<content src="Text/part0020.xhtml"/>
</navPoint>
<navPoint id="np_16" playOrder="16">
<navLabel>
<text>第三章 打碎乳白色的心</text>
</navLabel>
<content src="Text/part0021.xhtml"/>
</navPoint>
<navPoint id="np_17" playOrder="17">
<navLabel>
<text>第四章 咖啡偵探蕾拉事件簿</text>
</navLabel>
<content src="Text/part0022.xhtml"/>
</navPoint>
<navPoint id="np_18" playOrder="18">
<navLabel>
<text>第五章 (She Wanted To Be)WANTED</text>
</navLabel>
<content src="Text/part0023.xhtml"/>
</navPoint>
<navPoint id="np_19" playOrder="19">
<navLabel>
<text>第六章 the Sky Occluded in the Sun</text>
</navLabel>
<content src="Text/part0024.xhtml"/>
</navPoint>
<navPoint id="np_20" playOrder="20">
<navLabel>
<text>第七章 在星空之下同命相連</text>
</navLabel>
<content src="Text/part0025.xhtml"/>
</navPoint>
<navPoint id="np_21" playOrder="21">
<navLabel>
<text>終章 她夢到了歐蕾咖啡</text>
</navLabel>
<content src="Text/part0026.xhtml"/>
</navPoint>
</navPoint>
<navPoint id="np_22" playOrder="22">
<navLabel>
<text>咖啡館推理事件簿3:擾人心神的咖啡</text>
</navLabel>
<content src="Text/part0027.xhtml"/>
<navPoint id="np_23" playOrder="23">
<navLabel>
<text>序曲 五年前</text>
</navLabel>
<content src="Text/part0031.xhtml"/>
</navPoint>
<navPoint id="np_24" playOrder="24">
<navLabel>
<text>第一章 參加大賽</text>
</navLabel>
<content src="Text/part0032.xhtml"/>
</navPoint>
<navPoint id="np_25" playOrder="25">
<navLabel>
<text>第二章 前夜</text>
</navLabel>
<content src="Text/part0033.xhtml"/>
</navPoint>
<navPoint id="np_26" playOrder="26">
<navLabel>
<text>第三章 第一天</text>
</navLabel>
<content src="Text/part0034.xhtml"/>
</navPoint>
<navPoint id="np_27" playOrder="27">
<navLabel>
<text>第四章 第二天</text>
</navLabel>
<content src="Text/part0035.xhtml"/>
</navPoint>
<navPoint id="np_28" playOrder="28">
<navLabel>
<text>第五章 真相</text>
</navLabel>
<content src="Text/part0036.xhtml"/>
</navPoint>
<navPoint id="np_29" playOrder="29">
<navLabel>
<text>第六章 日后</text>
</navLabel>
<content src="Text/part0037.xhtml"/>
</navPoint>
<navPoint id="np_30" playOrder="30">
<navLabel>
<text>尾聲 五年前</text>
</navLabel>
<content src="Text/part0038.xhtml"/>
</navPoint>
</navPoint>
<navPoint id="np_31" playOrder="31">
<navLabel>
<text>咖啡館推理事件簿4:休閑時光的五種風(fēng)味</text>
</navLabel>
<content src="Text/part0039.xhtml"/>
<navPoint id="np_32" playOrder="32">
<navLabel>
<text>午后三點前的無聊風(fēng)景</text>
</navLabel>
<content src="Text/part0043.xhtml"/>
</navPoint>
<navPoint id="np_33" playOrder="33">
<navLabel>
<text>帕列塔之戀</text>
</navLabel>
<content src="Text/part0044.xhtml"/>
</navPoint>
<navPoint id="np_34" playOrder="34">
<navLabel>
<text>消失的禮物飛鏢</text>
</navLabel>
<content src="Text/part0045.xhtml"/>
</navPoint>
<navPoint id="np_35" playOrder="35">
<navLabel>
<text>可視化的原生藝術(shù)</text>
</navLabel>
<content src="Text/part0046.xhtml"/>
</navPoint>
<navPoint id="np_36" playOrder="36">
<navLabel>
<text>在塔列蘭咖啡館的庭院里</text>
</navLabel>
<content src="Text/part0047.xhtml"/>
</navPoint>
<navPoint id="np_37" playOrder="37">
<navLabel>
<text>特別篇 如釋重負</text>
</navLabel>
<content src="Text/part0048.xhtml"/>
</navPoint>
</navPoint>
</navMap>
</ncx>
</details>
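We may as well turn our attention to the more important part, the definition of the "table of contents". For easier viewing, the author takes a shortcut and opens the book in the reader bundled with the desktop environment. The reader shows a two-level table of contents, which matches the definition in OEBPS/toc.ncx. The important attributes are largely self-explanatory, so they are not studied further here.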
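The two-level structure can also be confirmed from the XML itself by walking the nested navPoint elements. The sketch below runs on a shortened copy of toc.ncx: the first and third labels and all src values come from the file above, while the label "Book 1" is a placeholder for the long volume title. walk_nav and outline are illustrative names.

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

# Shortened copy of toc.ncx; "Book 1" is a placeholder label.
SAMPLE_NCX = """<?xml version="1.0" encoding="utf-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="zh">
  <navMap>
    <navPoint id="np_1" playOrder="1">
      <navLabel><text>總目錄</text></navLabel>
      <content src="Text/part0000.xhtml"/>
    </navPoint>
    <navPoint id="np_2" playOrder="2">
      <navLabel><text>Book 1</text></navLabel>
      <content src="Text/part0001.xhtml"/>
      <navPoint id="np_3" playOrder="3">
        <navLabel><text>序章</text></navLabel>
        <content src="Text/part0005.xhtml"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>""".encode("utf-8")


def walk_nav(parent, depth=1):
    """Yield (depth, label, src) for each navPoint below parent, depth-first."""
    for point in parent.findall("ncx:navPoint", NCX_NS):
        label = point.findtext("ncx:navLabel/ncx:text", namespaces=NCX_NS)
        src = point.find("ncx:content", NCX_NS).attrib["src"]
        yield depth, label, src
        yield from walk_nav(point, depth + 1)


def outline(ncx_xml: bytes) -> list:
    """Return the whole table of contents as a flat list with depth markers."""
    root = ET.fromstring(ncx_xml)
    return list(walk_nav(root.find("ncx:navMap", NCX_NS)))


toc = outline(SAMPLE_NCX)
for depth, label, src in toc:
    print("  " * (depth - 1) + f"{label} -> {src}")
print("max depth:", max(depth for depth, _, _ in toc))
```

The maximum depth found this way is 2, which agrees with the dtb:depth value in the file's head section.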
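Round IV.V. The rest
What remains are the images, the text, and the cascading style sheets. Although this part takes up the largest share of the ePub file and is arguably the most important, its content is so straightforward that going any further would turn into a refresher on HTML and CSS, so it is skipped as well.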
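Summary
Based on the brief exploration above, ePub is a file format that uses XML for its configuration files, contains data such as images and text, and is at its core a compressed archive. Consulting the relevant references afterwards confirms that this matches the analysis above.
Through this analysis we got a first taste of the patterns and techniques for analyzing an unfamiliar file format, which can be applied later to more complex formats.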
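The pieces examined in this article chain together: the archive contains META-INF/container.xml, which names the package file, which in turn carries the metadata. The sketch below follows that chain on a small in-memory archive. The archive contents and the title "Sample Book" are hypothetical, and read_title_and_language is an illustrative name rather than any library's API.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_title_and_language(epub_bytes: bytes) -> tuple:
    """Follow container.xml to the package file and read two metadata fields."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(zf.read(opf_path))
    title = package.findtext("opf:metadata/dc:title", namespaces=NS)
    language = package.findtext("opf:metadata/dc:language", namespaces=NS)
    return opf_path, title, language


def make_sample_epub() -> bytes:
    """Build a minimal hypothetical archive with the same layout as the test file."""
    container = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        "</container>"
    )
    opf = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:title>Sample Book</dc:title><dc:language>zh</dc:language>"
        "</metadata></package>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", container)
        zf.writestr("OEBPS/content.opf", opf)
    return buf.getvalue()


print(read_title_and_language(make_sample_epub()))
```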
But finally, don't forget to rename that ePub file back XD:
[littleye233@lymjrolt Downloads]$ mv test.epub '咖啡館推理事件簿系列(全四本).epub'
[End]
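Footnotes
[1] This is XPath syntax, used to describe the location of elements in XML-like documents; similar notation later in the article is not annotated again.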