At work, I needed to use the Ghostscript library to embed every non-embedded font in a PDF file. As a first test, I ran Ghostscript directly from the command line:
gswin64c.exe -o new.pdf -sDEVICE=pdfwrite bad.pdf
GPL Ghostscript 9.56.1 (2022-04-04)
Copyright (C) 2022 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 3.
Page 1
Loading font KaiTi (or substitute) from %rom%Resource/Font/NimbusSans-Regular
Loading CIDFont KaiTi (or substitute) from C:/Windows/Fonts/simkai.ttf
Page 2
Page 3
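In this source PDF, some of the KaiTi text does not have its font embedded. Ghostscript's built-in fonts do not include KaiTi, so it substituted NimbusSans, and part of the text in the converted PDF came out garbled.
To let Ghostscript use the font files that ship with Windows, FONTPATH has to point at C:\Windows\Fonts. See: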
https://stackoverflow.com/questions/73707930/why-is-ghostscript-replacing-embedded-fonts
gswin64c.exe -o new.pdf -sDEVICE=pdfwrite -sFONTPATH="C:\Windows\Fonts" -dNEWPDF=false bad.pdf
Processing pages 1 through 3.
Page 1
Scanning C:\Windows\Fonts for fonts... 461 files, 269 scanned, 244 new fonts.
Can't find (or can't open) font file %rom%Resource/Font/%rom%.
Can't find (or can't open) font file KaiTi.
Loading KaiTi font from C:\Windows\Fonts/simkai.ttf... 7509676 6105470 22306756 20455617 4 done.
Loading a TT font from C:/Windows/Fonts/simkai.ttf to emulate a CID font KaiTi ... Done.
Page 2
Loading a TT font from C:/Windows/Fonts/simkai.ttf to emulate a CID font KaiTi ... Done.
Can't find (or can't open) font file %rom%Resource/Font/%rom%.
Can't find (or can't open) font file KaiTi.
Loading KaiTi font from C:\Windows\Fonts/simkai.ttf... 9374496 6674894 23208168 21339269 4 done.
Page 3
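Without -dNEWPDF=false, the log still shows C:\Windows\Fonts/simkai.ttf being loaded, but the converted PDF is still garbled. The Ghostscript author explained that this is a bug in 9.56.1 (in my own tests 10.00.0 still had the bug, and 10.02.0 fixed it):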
https://stackoverflow.com/questions/72657607/how-to-use-fonts-from-window-fonts-folder-for-a-pdf-in-ghostscript
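Back to the main topic. At work, the PDF processing is done by calling ghostscript.dll, so in principle all I had to do was pass the arguments that worked on the command line to gsapi_init_with_args.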
gs_argv[0] = "gs";
gs_argv[1] = "-o";
gs_argv[2] = "new.pdf";
gs_argv[3] = "-sDEVICE=pdfwrite";
gs_argv[4] = "-sFONTPATH=\"C:/Windows/Fonts\"";
gs_argv[5] = "-dNEWPDF=false";
gs_argv[6] = inputFile;
code = f_gsapi_init_with_args(gs_inst, 7, gs_argv);
In practice, however, it did not work:
Processing pages 1 through 3.
Page 1
Scanning "C:/Windows/Fonts" for fonts... 0 files, 0 scanned, 0 new fonts.
Querying operating system for font files...
Substituting font Helvetica for KaiTi.
Loading NimbusSans-Regular font from %rom%Resource/Font/NimbusSans-Regular... 4916440 3557620 7087640 5585527 4 done.
Loading a TT font from C:/Windows/Fonts/simkai.ttf to emulate a CID font KaiTi ... Done.
Page 2
Loading a TT font from C:/Windows/Fonts/simkai.ttf to emulate a CID font KaiTi ... Done.
Substituting font Helvetica for KaiTi.
Page 3
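No font files were found when scanning C:/Windows/Fonts! At first I assumed my sequence of calls into ghostscript.dll was wrong and was skipping some initialization that the gswin64c command line performs. So I downloaded the Ghostscript source and read carefully through what gswin64c does at startup, in ghostscript-9.56.1\psi\dwmainc.c. Compared with it, my program was missing calls such as psapi_set_arg_encoding, gsapi_set_stdio and gsapi_run_string. Adding them made no difference: the font files still were not read. Searching the code further, I found that the line Scanning "C:/Windows/Fonts" for fonts is printed by the PostScript script Resource\Init\gs_fonts.ps, and debugging a PostScript script is much harder than debugging C. I spent a good while stepping through the C initialization code without finding any clue.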
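Then it occurred to me that reading font files from "C:/Windows/Fonts" requires read permission on that directory. The DLL-level API for granting access to a path is gsapi_add_control_path; tracing down through the layers of code, the function that actually runs is gs_add_control_path_len_flags(). I set a breakpoint there and watched the path argument. After about 20 calls, the C:/Windows/Fonts directory name finally came through. For the gswin64c.exe command line, the directory name passed in was correct. For my call through gsapi_init_with_args, the directory name was "C:/Windows/Fonts", with an extra double quote at each end! (If I had compared the two logs more carefully, I could have spotted this difference there as well: the failing run prints Scanning "C:/Windows/Fonts", with the quotes.)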
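A quick way to catch this kind of mismatch without a debugger is to dump the argument array just before handing it to gsapi_init_with_args. The following is only a minimal sketch: arg_value_is_quoted and dump_args are hypothetical helper names, not part of the Ghostscript API, and the check only looks for a value wrapped in literal double quotes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Ghostscript function): returns 1 when the value
 * after '=' in a switch such as -sNAME=value is wrapped in literal double
 * quotes, 0 otherwise. */
static int arg_value_is_quoted(const char *arg)
{
    const char *eq = strchr(arg, '=');
    size_t len;

    if (eq == NULL)
        return 0;
    eq++;                      /* first character of the value */
    len = strlen(eq);
    return len >= 2 && eq[0] == '"' && eq[len - 1] == '"';
}

/* Prints every argument between brackets so stray characters stand out,
 * and returns how many arguments carry a quoted value. */
static int dump_args(FILE *out, int argc, const char *const *argv)
{
    int i;
    int suspicious = 0;

    for (i = 0; i < argc; i++) {
        int quoted = arg_value_is_quoted(argv[i]);
        fprintf(out, "argv[%d] = [%s]%s\n", i, argv[i],
                quoted ? "  <-- literal quotes" : "");
        suspicious += quoted;
    }
    return suspicious;
}
```

Calling dump_args(stderr, 7, gs_argv) on the array shown earlier would flag gs_argv[4], since its value still contains the quote characters.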
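Tracing further shows what happens. On the command line, -sFONTPATH="C:\Windows\Fonts" has already become -sFONTPATH=C:\Windows\Fonts in argv by the time it reaches wmain: Windows command-line parsing (done by the C runtime startup code before wmain is called) strips the double quotes. If we pass -sFONTPATH="C:\Windows\Fonts" straight to gsapi_init_with_args, the quotes are carried along untouched, so the FONTPATH parameter that the PostScript code sees includes the double quotes.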
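Given that diagnosis, the simplest fix in the snippet above is presumably to drop the escaped quotes, i.e. gs_argv[4] = "-sFONTPATH=C:/Windows/Fonts";. Each argv element is already a single argument, so quotes are not needed even for paths containing spaces. If the arguments come from a configuration file or a copied command line, a small normalizing step can do what the command-line parser would have done for this case. The sketch below uses a hypothetical helper name, strip_value_quotes, and only handles one pair of quotes wrapping the whole value.

```c
#include <string.h>

/* Hypothetical helper (not a Ghostscript function): copies arg into dst,
 * dropping one pair of double quotes that wraps the value after '='.
 * Arguments without such quotes are copied unchanged.
 * Returns 0 on success, -1 if dst is too small. */
static int strip_value_quotes(const char *arg, char *dst, size_t dst_size)
{
    const char *eq = strchr(arg, '=');
    size_t len = strlen(arg);

    if (eq != NULL) {
        const char *val = eq + 1;
        size_t vlen = strlen(val);

        if (vlen >= 2 && val[0] == '"' && val[vlen - 1] == '"') {
            size_t prefix = (size_t)(val - arg);   /* "-sNAME=" part */

            if (prefix + (vlen - 2) + 1 > dst_size)
                return -1;
            memcpy(dst, arg, prefix);
            memcpy(dst + prefix, val + 1, vlen - 2); /* value without quotes */
            dst[prefix + vlen - 2] = '\0';
            return 0;
        }
    }

    if (len + 1 > dst_size)
        return -1;
    memcpy(dst, arg, len + 1);                       /* includes the '\0' */
    return 0;
}
```

The cleaned string, rather than the original, is then what goes into the array passed to gsapi_init_with_args.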