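References
Preface
What is fishhook?
fishhook is an open-source third-party library maintained by Facebook for dynamically rebinding symbols in Mach-O binaries. When a Mach-O binary loads external libraries, fishhook can hook the functions it imports from them.
For details, see the fishhook README on GitHub.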
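How it works
The README already explains this clearly, but a good deal of background is needed to really understand the mechanism.
In one sentence: fishhook finds the address of the target function and replaces it with the address of your own function, which achieves the hook.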
Mach-O
Let's look at the Mach-O file format. Mach-O is the standard format for programs on Mac OS X.
Here is a diagram showing the basic Mach-O format.
A Mach-O file contains three major regions:
- header structure
- load commands
- segment
The segments follow one after another.
Each segment contains zero or more sections. Each section contains code or a particular type of data. Each segment defines a region of virtual memory that the dynamic linker maps into the address space of the process.
Layout of Mach-O in memory:
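What fishhook has to do is find the segment __DATA, then find the section __la_symbol_ptr inside it. Finally, it finds the slot holding the target function's address and replaces that address with the address of your own function.
This breaks down into 2 steps:
- Find the slot that holds the target function's address.
- Find the function name for each slot and compare the names one by one. On a match, replace the target function's address with the address of your own function.
Finding the function name is a little roundabout. The string table is an array of characters that contains the function names, and the name you need is found through its offset into that table.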
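The name lookup described above can be sketched in portable C. This is a simplified model, not fishhook's code: toy_symbol is a hypothetical stand-in for nlist_64 that keeps only the string-table offset, and toy_symbol_name and toy_find_slot are names invented for this illustration. It assumes every entry of the indirect table is a valid symbol-table index.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for nlist_64: only the string-table offset is kept. */
struct toy_symbol {
    uint32_t n_strx;  /* byte offset of the name inside the string table */
};

/* The string table is a run of NUL-terminated names laid end to end,
   so a name is simply "start of table + byte offset". */
static const char *toy_symbol_name(const struct toy_symbol *symtab,
                                   const char *strtab,
                                   uint32_t symtab_index) {
    return strtab + symtab[symtab_index].n_strx;
}

/* indirect[i] is the symbol-table index that belongs to pointer slot i.
   Returns the slot whose symbol name equals `wanted`, or -1 if none does. */
static int toy_find_slot(const uint32_t *indirect, size_t nslots,
                         const struct toy_symbol *symtab,
                         const char *strtab, const char *wanted) {
    for (size_t i = 0; i < nslots; i++) {
        const char *name = toy_symbol_name(symtab, strtab, indirect[i]);
        if (strcmp(name, wanted) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

With a string table laid out as "\0_printf\0_foo\0_malloc", the name _foo starts at byte offset 9, so a symbol whose n_strx is 9 resolves to _foo.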
Analysis
Before rebinding symbols, the information available to us is:
- mach_header *header, a pointer to the header structure
- intptr_t slide, the ASLR slide
The load commands follow the header. The load commands include the LC_SEGMENT commands, of which there are 4:
- __PAGEZERO
- __TEXT
- __DATA
- __LINKEDIT
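PAGEZERO
This is the first segment of an executable. It is always located at the very start of virtual memory. Its size depends on the architecture; on x86_64 it is 0x0000000100000000.
TEXT
This stores code and read-only data. We do not need it here.
DATA
This contains readable and writable data. Its section headers are exactly the information we need. The most important step is finding __nl_symbol_ptr and __la_symbol_ptr among the section headers.
LINKEDIT
This records information used by the dynamic linker. fishhook mainly uses it to find the base address. For the main executable, the resulting address is normally the address right after PAGEZERO, that is 0x0000000100000000, plus the ASLR slide.
Both LC_SEGMENT(__DATA) and LC_SEGMENT(__TEXT) contain section headers. We mainly use the section headers in DATA (chiefly __nl_symbol_ptr and __la_symbol_ptr).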
LC_SEGMENT(__DATA) contains the following section headers:
- __nl_symbol_ptr
- __got
- __la_symbol_ptr
- __objc_imageinfo
- __bss
__nl_symbol_ptr is an array of pointers to non-lazily bound data (these are bound at the time a library is loaded) and __la_symbol_ptr is an array of pointers to imported functions that is generally filled by a routine called dyld_stub_binder during the first call to that symbol (it's also possible to tell dyld to bind these at launch).
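Let's first look at how fishhook is called:
#import <Foundation/Foundation.h>
#include <stdio.h>
#include "fishhook.h"

// Assumption: foo is exported by a dynamic library that this program links against.
extern void foo(int);
// Receives the address of the original foo once the rebinding is done.
static void (*orig_foo)(int);

void my_foo(int x) {
    printf("real func: %d\n",x);
}
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        rebind_symbols((struct rebinding[1]){{"foo", my_foo, (void *)&orig_foo}}, 1);
        foo(20);  // now dispatched to my_foo
    }
    return 0;
}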
The internal call chain is:
- rebind_symbols
- _rebind_symbols_for_image
- rebind_symbols_for_image
- perform_rebinding_with_section
rebind_symbols roughly does the following:
- Allocates a rebindings_entry linked-list node and fills in its information.
- Checks whether this is the first call. On the first call, it registers a callback for added images (dyld also invokes the callback once for every image that is already loaded):
  _dyld_register_func_for_add_image(_rebind_symbols_for_image);
- On later calls, it calls _rebind_symbols_for_image on the images that are already loaded.
_rebind_symbols_for_image does nothing but call rebind_symbols_for_image.
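The first step, allocating a rebindings_entry node and filling it in, can be sketched as a plain linked-list prepend. The three fields of struct rebinding match the usage shown earlier; the layout of rebindings_entry is inferred from the fields the later code reads (rebindings, rebindings_nel, next), and toy_prepend is a hypothetical name for this sketch, so it should not be taken as the library's exact implementation.

```c
#include <stdlib.h>
#include <string.h>

struct rebinding {
    const char *name;    /* symbol to hook, without the leading underscore */
    void *replacement;   /* address of the new function */
    void **replaced;     /* where to store the original address, may be NULL */
};

struct rebindings_entry {
    struct rebinding *rebindings;   /* private copy of the caller's array */
    size_t rebindings_nel;          /* number of elements in that copy */
    struct rebindings_entry *next;  /* older entries */
};

/* Copies `items` into a new node and makes it the new head of the list.
   Returns 0 on success and -1 if an allocation fails. */
static int toy_prepend(struct rebindings_entry **head,
                       const struct rebinding *items, size_t nel) {
    struct rebindings_entry *entry = malloc(sizeof *entry);
    if (entry == NULL) {
        return -1;
    }
    entry->rebindings = malloc(nel * sizeof *items);
    if (entry->rebindings == NULL) {
        free(entry);
        return -1;
    }
    memcpy(entry->rebindings, items, nel * sizeof *items);
    entry->rebindings_nel = nel;
    entry->next = *head;
    *head = entry;
    return 0;
}
```

Prepending keeps the most recent request at the head, so when the list is walked from the head, the latest rebinding for a given name is the first one to match.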
rebind_symbols_for_image
rebind_symbols_for_image roughly does the following:
- Finds linkedit_segment, symtab_cmd and dysymtab_cmd in the load commands
- Uses linkedit_segment and symtab_cmd to find symtab and strtab
- Finds __DATA,__la_symbol_ptr and __DATA,__nl_symbol_ptr in the load commands
- Calls perform_rebinding_with_section
Finding linkedit_segment, symtab_cmd and dysymtab_cmd. This is the memory layout of the sample program TestMacOS (see the link below) as shown in MachOView:
mach_header_64
struct mach_header_64
{
    uint32_t magic;
    cpu_type_t cputype;
    cpu_subtype_t cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};
- ncmds: the number of load commands
segment_command_64
struct segment_command_64
{
    uint32_t cmd;
    uint32_t cmdsize;
    char segname[16];
    uint64_t vmaddr;
    uint64_t vmsize;
    uint64_t fileoff;
    uint64_t filesize;
    vm_prot_t maxprot;
    vm_prot_t initprot;
    uint32_t nsects;
    uint32_t flags;
};
- cmd: a number, starting from 0x1, that identifies the type of load command
- cmdsize: the size of the current command
- segname: the name of the segment, in upper case, for example __TEXT and __DATA
- vmaddr: the start address of the segment in virtual memory
- fileoff: the offset of the segment within the file, that is, the file position that is mapped at vmaddr
#define LC_SEGMENT_ARCH_DEPENDENT LC_SEGMENT_64
segment_command_t *cur_seg_cmd;
segment_command_t *linkedit_segment = NULL;
struct symtab_command* symtab_cmd = NULL;
struct dysymtab_command* dysymtab_cmd = NULL;
// Get the start address of the load commands (they follow the header directly)
uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    // Look for LC_SEGMENT_64 -> __LINKEDIT
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
        if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
            linkedit_segment = cur_seg_cmd;
        }
    }
    // Look for LC_SYMTAB
    else if (cur_seg_cmd->cmd == LC_SYMTAB) {
        symtab_cmd = (struct symtab_command*)cur_seg_cmd;
    }
    // Look for LC_DYSYMTAB
    else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
        dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
    }
}
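The loop above works because every load command begins with the same two fields, cmd and cmdsize, so the next command always starts cmdsize bytes after the current one. A minimal portable sketch of that traversal follows. It runs over a hand-built buffer instead of a real Mach-O image, and toy_load_command and toy_find_command are hypothetical names for this illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* The two fields shared by every load command. */
struct toy_load_command {
    uint32_t cmd;      /* command type */
    uint32_t cmdsize;  /* total size of this command in bytes */
};

/* Walks `ncmds` variable-length commands starting at `first` and returns
   the first one whose type is `wanted`, or NULL if there is none. */
static const struct toy_load_command *
toy_find_command(const void *first, uint32_t ncmds, uint32_t wanted) {
    uintptr_t cur = (uintptr_t)first;
    for (uint32_t i = 0; i < ncmds; i++) {
        const struct toy_load_command *lc =
            (const struct toy_load_command *)cur;
        if (lc->cmd == wanted) {
            return lc;
        }
        cur += lc->cmdsize;  /* jump over the command-specific payload */
    }
    return NULL;
}
```

For example, in a buffer holding a 16-byte command of type 0x19 (LC_SEGMENT_64), an 8-byte command of type 0x2 (LC_SYMTAB) and a 12-byte command of type 0xb (LC_DYSYMTAB), searching for 0xb returns the address 24 bytes past the start.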
Finding linkedit_base:
// Find base symbol/string table addresses
uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
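The reasoning behind this formula: the offsets stored in the symbol table commands are file offsets, while __LINKEDIT is mapped at slide + vmaddr and begins at file offset fileoff. A table at file offset X therefore lives at slide + vmaddr + (X - fileoff), which is linkedit_base + X. The sketch below only illustrates the arithmetic; the function names and the numbers in the example are made up.

```c
#include <stdint.h>

/* Address that file offset 0 would have if the file were mapped
   contiguously with the __LINKEDIT segment. */
static uint64_t toy_linkedit_base(uint64_t slide, uint64_t vmaddr,
                                  uint64_t fileoff) {
    return slide + vmaddr - fileoff;
}

/* In-memory address of a table that the load command describes
   by its offset from the start of the file. */
static uint64_t toy_table_address(uint64_t linkedit_base,
                                  uint32_t file_offset) {
    return linkedit_base + file_offset;
}
```

Taking hypothetical values vmaddr = 0x100004000, fileoff = 0x4000 and slide = 0x5000, the base is 0x100005000, and a symbol table with symoff = 0x4100 is found at 0x100009100. With a slide of 0 the base is 0x100000000, the address right after PAGEZERO.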
Finding symtab, strtab and the indirect symbol table through symtab_command and dysymtab_command:
symtab_command
struct symtab_command
{
    uint32_t cmd;
    uint32_t cmdsize;
    uint32_t symoff;
    uint32_t nsyms;
    uint32_t stroff;
    uint32_t strsize;
};
- symoff: the offset of the symbol table
- stroff: the offset of the string table
dysymtab_command
struct dysymtab_command
{
    uint32_t cmd;
    uint32_t cmdsize;
    uint32_t ilocalsym;
    uint32_t nlocalsym;
    uint32_t iextdefsym;
    uint32_t nextdefsym;
    uint32_t iundefsym;
    uint32_t nundefsym;
    uint32_t tocoff;
    uint32_t ntoc;
    uint32_t modtaboff;
    uint32_t nmodtab;
    uint32_t extrefsymoff;
    uint32_t nextrefsyms;
    uint32_t indirectsymoff;
    uint32_t nindirectsyms;
    uint32_t extreloff;
    uint32_t nextrel;
    uint32_t locreloff;
    uint32_t nlocrel;
};
- indirectsymoff: the offset of the indirect symbol table
nlist_64
struct nlist_64
{
    union {
        uint32_t n_strx;
    } n_un;
    uint8_t n_type;
    uint8_t n_sect;
    uint16_t n_desc;
    uint64_t n_value;
};
- n_un.n_strx: the offset of the function name in the string table
// Note: all of these offsets are applied to the base address (linkedit_base)
nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);
uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);
Getting the section headers and calling perform_rebinding_with_section:
cur = (uintptr_t)header + sizeof(mach_header_t);
for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
        // Only the __DATA and __DATA_CONST segments are of interest
        if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
            strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
            continue;
        }
        // The section headers follow the segment command directly
        for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
            section_t *sect =
                (section_t *)(cur + sizeof(segment_command_t)) + j;
            if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
                perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
            }
            if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
                perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
            }
        }
    }
}
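Note that the loop above selects sections by type rather than by name: the low byte of flags holds the section type and the remaining bits hold attributes, so the attributes have to be masked off before comparing. The sketch below reproduces that check in portable C. The TOY_ constants mirror the values defined in <mach-o/loader.h> (SECTION_TYPE is 0xff, S_NON_LAZY_SYMBOL_POINTERS is 0x6, S_LAZY_SYMBOL_POINTERS is 0x7), and the function name is hypothetical.

```c
#include <stdint.h>

/* Local copies of the loader.h values, renamed to avoid clashes. */
#define TOY_SECTION_TYPE               0x000000ffu
#define TOY_S_NON_LAZY_SYMBOL_POINTERS 0x6u
#define TOY_S_LAZY_SYMBOL_POINTERS     0x7u

/* Returns 1 if the section holds lazy or non-lazy symbol pointers,
   which are the only two section types that get rebound. */
static int toy_is_symbol_pointer_section(uint32_t flags) {
    uint32_t type = flags & TOY_SECTION_TYPE;  /* drop the attribute bits */
    return type == TOY_S_LAZY_SYMBOL_POINTERS ||
           type == TOY_S_NON_LAZY_SYMBOL_POINTERS;
}
```

A section whose flags are 0x80000408, for example, has type 0x8 (symbol stubs) and is skipped, while flags of 0x7 or 0x6 select the section.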
perform_rebinding_with_section
section_64
struct section_64
{
    char sectname[16];
    char segname[16];
    uint64_t addr;
    uint64_t size;
    uint32_t offset;
    uint32_t align;
    uint32_t reloff;
    uint32_t nreloc;
    uint32_t flags;
    uint32_t reserved1;
    uint32_t reserved2;
};
- addr: the virtual memory address of the section, stored as an integer
- reserved1: for symbol pointer sections and stubs sections, reserved1 is the index into the indirect symbol table at which the entries for this section begin
// The indirect symbol table entries for this section start at index reserved1
uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
// Locate the array of pointer slots of this section, e.g. section(__DATA,__la_symbol_ptr)
void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
for (uint i = 0; i < section->size / sizeof(void *); i++) {
    // Read the indirect_symbol_indices array to get symtab_index
    uint32_t symtab_index = indirect_symbol_indices[i];
    if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
        symtab_index == (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) {
        continue;
    }
    // Get the offset of the function name in the string table
    uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
    // Get the function name
    char *symbol_name = strtab + strtab_offset;
    if (strnlen(symbol_name, 2) < 2) {
        continue;
    }
    struct rebindings_entry *cur = rebindings;
    while (cur) {
        for (uint j = 0; j < cur->rebindings_nel; j++) {
            // &symbol_name[1] skips the leading underscore of the symbol name
            if (strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
                if (cur->rebindings[j].replaced != NULL &&
                    indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
                    // Save the current function address into rebindings.replaced
                    *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
                }
                // Overwrite the slot with the address of the custom function
                indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
                goto symbol_loop;
            }
        }
        cur = cur->next;
    }
symbol_loop:;
}
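The core of this loop, saving the old pointer and then overwriting the slot, can be tried outside of a Mach-O image with the toy model below. It is a simplification under stated assumptions: names[] is a hypothetical stand-in for the symtab plus strtab lookup, only a single rebinding is handled, and toy_rebinding and toy_rebind are invented names. The two TOY_INDIRECT_ constants mirror INDIRECT_SYMBOL_LOCAL (0x80000000) and INDIRECT_SYMBOL_ABS (0x40000000) from <mach-o/loader.h>.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define TOY_INDIRECT_SYMBOL_LOCAL 0x80000000u
#define TOY_INDIRECT_SYMBOL_ABS   0x40000000u

struct toy_rebinding {
    const char *name;    /* name without the leading underscore */
    void *replacement;   /* new address to write into the slot */
    void **replaced;     /* receives the old address, may be NULL */
};

/* slots[i] is pointer slot i of the section; indices[i] selects its symbol.
   Returns the number of slots whose symbol matched the rebinding. */
static int toy_rebind(void **slots, const uint32_t *indices, size_t nslots,
                      const char *const *names,
                      const struct toy_rebinding *rb) {
    int matched = 0;
    for (size_t i = 0; i < nslots; i++) {
        uint32_t idx = indices[i];
        /* Special markers do not refer to a symbol table entry. */
        if (idx == TOY_INDIRECT_SYMBOL_ABS || idx == TOY_INDIRECT_SYMBOL_LOCAL ||
            idx == (TOY_INDIRECT_SYMBOL_LOCAL | TOY_INDIRECT_SYMBOL_ABS)) {
            continue;
        }
        const char *sym = names[idx];
        if (sym[0] == '\0' || sym[1] == '\0') {
            continue;  /* shorter than 2 characters */
        }
        if (strcmp(sym + 1, rb->name) != 0) {
            continue;  /* sym + 1 skips the leading underscore */
        }
        /* Keep the old address unless the slot was already rebound. */
        if (rb->replaced != NULL && slots[i] != rb->replacement) {
            *rb->replaced = slots[i];
        }
        slots[i] = rb->replacement;
        matched++;
    }
    return matched;
}
```

The check against replacement before saving matters when the same rebinding is applied twice: on the second pass the slot already holds the replacement, and without the check the saved original would be overwritten with the hook's own address.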
Here is a diagram, drawn after the one in the fishhook README.
Appendix: Demo