1.5.5 GOT and PLT
The Global Offset Table (GOT) and Procedure Linkage Table (PLT) are the two data structures central to the ELF run-time. This section explains why they are used and what consequences follow from their use.
Relocations are created for source constructs like the following:
extern int foo;
extern int bar (int);
int call_bar (void) { return bar (foo); }
The call to bar requires two relocations: one to load the value of foo and another to find the address of bar. If the code were generated with the addresses of the variable and the function already known, the assembler instructions would load directly from, or jump directly to, those addresses. For IA-32 the code would look like this:
pushl foo
call bar
This would encode the addresses of foo and bar as part of the instructions in the text segment. If an address is known only to the dynamic linker, the text segment would have to be modified at run-time. According to what we learned above, this must be avoided.
Therefore the code generated for DSOs, i.e., when using -fpic or -fPIC, looks like this:
movl foo@GOT(%ebx), %eax
pushl (%eax)
call bar@PLT
The address of the variable foo is no longer part of the instruction. Instead it is loaded from the GOT. The address of the GOT slot relative to the PIC register value (%ebx) is known at link-time. Therefore the text segment does not have to be changed, only the GOT.
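The indirection described above can be sketched in plain C. This is only a model under stated assumptions: got_table, bind_variable, load_foo and foo_definition are hypothetical names invented for illustration, and a real GOT is laid out by the link editor and filled in by the dynamic linker rather than by program code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot numbering; a real GOT layout is chosen at link-time. */
enum { GOT_SLOT_FOO = 0, GOT_SLOTS = 1 };

/* Hypothetical stand-in for the object that actually defines foo. */
int foo_definition = 7;

/* Writable data: the only thing touched when the symbol is bound. */
void *got_table[GOT_SLOTS];

/* Models the dynamic linker processing one relocation: it writes an
   address into the table and leaves the code untouched. */
void bind_variable(size_t slot, void *address)
{
    assert(slot < GOT_SLOTS);
    got_table[slot] = address;
}

/* Models "movl foo@GOT(%ebx), %eax" followed by "pushl (%eax)". */
int load_foo(void)
{
    int *address = got_table[GOT_SLOT_FOO]; /* first load: address from the table */
    return *address;                        /* second load: the value itself */
}
```

Calling bind_variable(GOT_SLOT_FOO, &foo_definition) and then load_foo() returns the current value of foo_definition. Rebinding the slot to a different int changes what load_foo reads without any change to the function body, which is the property that lets the text segment stay unmodified.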
The situation for the function call is similar. The function bar is not called directly. Instead control is transferred to a stub for bar in the PLT (indicated by bar@PLT). For IA-32 the PLT itself does not have to be modified and can be placed in a read-only segment; each PLT entry is 16 bytes in size. Only the GOT is modified, and each GOT entry consists of 4 bytes. The structure of the PLT in an IA-32 DSO looks like this:
.PLT0:pushl 4(%ebx)
jmp *8(%ebx)
nop; nop
nop; nop
.PLT1:jmp *name1@GOT(%ebx)
pushl $offset1
jmp .PLT0@PC
.PLT2:jmp *name2@GOT(%ebx)
pushl $offset2
jmp .PLT0@PC
This shows three entries; there are as many as needed, all of the same size. The first entry, labeled .PLT0, is special and is used internally, as we will see. Each of the following entries belongs to exactly one function symbol. The first instruction is an indirect jump whose target address is taken from a slot in the GOT. Each PLT entry has one GOT slot.
At startup the dynamic linker fills the GOT slot with the address of the second instruction of the corresponding PLT entry. That is, when the PLT entry is used for the first time, the jump ends at the following pushl instruction. The value pushed on the stack is also specific to the PLT slot: it is the offset of the relocation entry for the function which should be called. Then control is transferred to the special first PLT entry, which pushes some more values on the stack and finally jumps into the dynamic linker. The dynamic linker has to make sure that the third GOT slot (offset 8) contains the address of its entry point. Once the dynamic linker has determined the address of the function, it stores the result in the GOT entry used by the jmp instruction at the beginning of the PLT entry, and then jumps to the function it found. As a result, all future uses of the PLT entry bypass the dynamic linker and transfer directly to the function. The overhead for all but the first call is therefore “only” one indirect jump.
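The lazy binding sequence just described can be imitated with a table of function pointers. The following is a minimal sketch, not how a dynamic linker is implemented: the names got, resolve, bar_lazy_stub, bar_via_plt, real_bar and resolver_calls are all hypothetical, and the symbol lookup is reduced to returning a fixed function.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

/* Hypothetical slot numbering for the single function in this sketch. */
enum { SLOT_BAR = 0, NUM_SLOTS = 1 };

/* Counts trips through the stand-in for the dynamic linker. */
int resolver_calls;

/* Hypothetical definition that the lookup eventually finds. */
static int real_bar(int x) { return x + 1; }

/* Plays the role of "pushl $offset; jmp .PLT0" in a PLT entry. */
static int bar_lazy_stub(int x);

/* Writable table of code addresses. Each slot initially points back at
   the slow path of its own entry, as the text describes for startup. */
static func_t got[NUM_SLOTS] = { bar_lazy_stub };

/* Stand-in for the symbol lookup performed by the dynamic linker. */
static func_t resolve(size_t slot)
{
    assert(slot < NUM_SLOTS);
    resolver_calls++;
    return real_bar; /* only one symbol exists in this sketch */
}

static int bar_lazy_stub(int x)
{
    func_t target = resolve(SLOT_BAR);
    got[SLOT_BAR] = target; /* patch the slot so later calls skip the resolver */
    return target(x);       /* then continue into the function that was found */
}

/* Plays the role of "jmp *bar@GOT(%ebx)": always one indirect jump. */
int bar_via_plt(int x)
{
    return got[SLOT_BAR](x);
}
```

The first call to bar_via_plt goes through bar_lazy_stub and increments resolver_calls once. Every later call jumps straight to real_bar through the patched slot, so the counter stays at one. Only the table is written; the code of bar_via_plt never changes.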
The PLT stub is always used if the function is not guaranteed to be defined in the object which references it. Note that a mere definition in the referencing object is not enough to avoid the PLT entry. From the symbol lookup process it should be clear that the definition could be found in another object (interposition), in which case the PLT is needed. We will explain later exactly when and how to avoid PLT entries.
Exactly how the GOT and PLT are structured is architecture-specific and is specified in the respective psABI. What was said here about IA-32 applies in some form to some other architectures, but not to all. For instance, while the PLT on IA-32 is read-only, it must be writable on other architectures because, instead of jumping indirectly through GOT values, the PLT entries themselves are modified. A reader might think that the designers of the IA-32 ABI made a mistake by requiring an indirect, and therefore slower, call instead of a direct call. This is no mistake, though. Having a segment that is both writable and executable is a huge security problem, since attackers can simply write arbitrary code into the PLT and take over the program. In any case, we can summarize the costs of using the GOT and PLT like this:
• every use of an exported global variable uses a GOT entry and loads the variable's value indirectly;
• each function which is called (as opposed to referenced as a variable) and which is not guaranteed to be defined in the calling object requires a PLT entry. The function call is performed indirectly by first transferring control to the code in the PLT entry, which in turn calls the function;
• for some architectures each PLT entry requires at least one GOT entry.
On IA-32, avoiding a jump through the PLT therefore saves 16 bytes of text and 4 bytes of data. Avoiding the GOT when accessing a global variable saves 4 bytes of data and one load instruction (i.e., at least 3 bytes of code, plus cycles during execution). In addition, each GOT entry has an associated relocation, with the costs described above.
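As a rough worked example of these figures, the byte counts can be added up in a few lines of C. The sketch uses only the IA-32 sizes quoted in the text. The function name estimate_savings, the struct savings type and the constant names are hypothetical. The estimate is a lower bound for text, since it uses the 3-byte minimum for a load, and it leaves out the relocation costs, which the text does not quantify here.

```c
#include <assert.h>
#include <stddef.h>

/* Per-item sizes quoted in the text for IA-32. */
enum {
    PLT_ENTRY_TEXT_BYTES = 16,  /* one PLT entry */
    GOT_ENTRY_DATA_BYTES = 4,   /* one GOT slot */
    GOT_LOAD_MIN_TEXT_BYTES = 3 /* minimum size of one load from the GOT */
};

struct savings {
    size_t text_bytes; /* lower bound on code bytes saved */
    size_t data_bytes; /* GOT bytes saved */
};

/* plt_entries:   PLT entries that are no longer needed
   got_variables: variables that no longer need a GOT slot
   got_loads:     individual GOT loads removed from the code */
struct savings estimate_savings(size_t plt_entries,
                                size_t got_variables,
                                size_t got_loads)
{
    struct savings s;

    /* Assumption of this sketch: every avoided GOT slot had at least one load. */
    assert(got_loads >= got_variables);

    s.text_bytes = plt_entries * PLT_ENTRY_TEXT_BYTES
                 + got_loads * GOT_LOAD_MIN_TEXT_BYTES;
    s.data_bytes = (plt_entries + got_variables) * GOT_ENTRY_DATA_BYTES;
    return s;
}
```

For instance, estimate_savings(10, 5, 20) yields at least 10 * 16 + 20 * 3 = 220 bytes of text and (10 + 5) * 4 = 60 bytes of data.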