chapter2 Instructions: Language of the Computer
秉承著計(jì)算機(jī)專(zhuān)家的一貫思路捷泞,首先介紹一下子指令;順序一般是從簡(jiǎn)單到復(fù)雜杉畜,從普通到特殊湿弦;
主要有如下章節(jié):
2.1 Introduction
2.2 Operations of the Computer Hardware
2.3 Operands of the Computer Hardware
2.4 Signed ad Unsigned Numbers
2.5 Representing Instructions in the Computer
2.6 Logical Operations
2.7 Instructions for Making Decisions
2.8 Supporting Procedures in Computer Hardware
2.9 Communicating with People
2.10 RISC-V Addressing for Wide Immediates and Addresses
2.11 Parallelism and Instructions: Synchronization
2.12 Translating and Starting a Program
2.13 A C Sort Example to Put it All Together
2.14 Arrays versus Pointers
2.15 Advanced Material: Compiling C and Interpreting Java
2.16 Real Stuff: MIPS Instructions
2.17 Real Stuff:x86 Instructions
2.18 Real Stuff: The Rest of the RISC-V Instruction Set
2.19 Fallacies and Pitfalls
2.20 Concluding Remarks
2.21 Historical Perspective and Further Reading
下面分章節(jié)介紹如下:
2.1 Introduction
兩個(gè)概念
- instruction:計(jì)算機(jī)的詞匯
- instruction set:計(jì)算機(jī)詞匯表
本章以top-down方式講解instructions,從一種解釋語(yǔ)言到最終計(jì)算機(jī)能理解的語(yǔ)言凿将;
2.2 Operations of the Computer Hardware
首先舉一個(gè)RISC-V的匯編例子:
add a,b,c
此指令意為將變量b,c相加后存入變量a中校套;
從而引入一個(gè)重要原則:每RISC-V指令只執(zhí)行一個(gè)操作(加,減等)牧抵,每個(gè)指令最多3個(gè)變量笛匙;
This notation is rigid in that each RISC-V arithmetic instruction performs only one operation and must always have exactly three variables;
e.g. a=b+c+d+e;
add a,b,c // The sum of b and c is placed in a
add a,a,d // The sum of b and c and d is now in a
add a,a,e // The sum of b,c,d and e is now in a
雙斜杠之后為注釋部分;
本節(jié)列出兩個(gè)表格犀变,一個(gè)為operands妹孙,主要我寄存器和memory的表示;
第二個(gè)表為匯編指令表获枝;
最后引入一個(gè)重要原則:
Design Principle 1: Simplicity favors regularity
2.3 Operands of the Computer Hardware
the operands of arithmetic instructions are restricted: they must be from a limited number of special locations built directly in hardware called registers.
算術(shù)運(yùn)算指令的operands只能來(lái)自寄存器蠢正;
RISC-V的寄存器大小為64bits,在本書(shū)中64bits稱(chēng)為雙字省店,一個(gè)字為32bits嚣崭;
registers與普通編程語(yǔ)言的變量相比的一個(gè)重要區(qū)別就是:它只有有限個(gè)數(shù),32個(gè)懦傍;
從而引入第二個(gè)重要原則:
Design Principle 2: Smaller is faster
原因?yàn)樘嘁鹩布ぶ返葧r(shí)間延長(zhǎng)雹舀,從而降低性能;
registers的表示方法為"X"+數(shù)字谎脯;還有其他一些特殊寄存器后面會(huì)接著介紹葱跋;
下面舉上節(jié)例子,將變量指定register源梭,然后替換上節(jié)的變量表示娱俺;
Memory Operands
首先,register很有限废麻,如果程序需要大的數(shù)組或structure荠卷,則需要存儲(chǔ)到memory中:
Memory的特點(diǎn)就是大,訪問(wèn)速度慢烛愧;
而arithmetic操作只能操作register油宜,如果對(duì)memory操作掂碱,而需要將memory數(shù)據(jù)先讀到寄存器,對(duì)寄存器操作慎冤,然后將寄存器寫(xiě)入memory疼燥;
從memory到register操作為load;從register到memory操作為store;
如g=h+A[8]
A為100個(gè)doublewords的array;編譯器將g,h編譯為x20和x21, A的base address存儲(chǔ)到x20
ld x9,8(x22) //Temporary reg x9 gets A[8]
add x20,x21,x9 //g=h+A[8]
8(x22)為memory表示法蚁堤,8為offset醉者,x22存儲(chǔ)base address;
Since 8-bit bytes are useful in many programs, virtually all architctures today address individual bytes. 今天所有的architecture還是使用byte尋址披诗,RISC-V也不例外撬即;
所以register和memory的地址都表示為0x8,0x10,0x18,0x20......
每64bits中各個(gè)byte是如何排序的,一般有兩種方式:
- big-endian: 從右到左為0,1,2......
- small-endian:從左到右為0,1,2......
RISC-V采用small_endian格式呈队;在訪問(wèn)byte或者word級(jí)別數(shù)據(jù)時(shí)候要特別注意大小端問(wèn)題剥槐;
如A[12]=h+A[8]
ld x9,64(x22) //Temporary reg x9 gets A[8]
add x9,x21,x9 //Temporary reg xg gets h+A[8]
sd x9,96(x22) //Stores h+A[8] back into A[12]
編譯器會(huì)將頻繁訪問(wèn)的變量存入register,把剩下的存入memory宪摧;
the process of putting less frequently used variables into memory is called spilling registers
Constant or Immediate Operands
RISC-V將近一半的指令需要操作立即數(shù)或者說(shuō)常量粒竖;
如將常量4與x22寄存器相加:
程序編譯后將常量4寫(xiě)入memory。
ld x9,AddConstant4(x3) //x9=constant 4
add x22,x22,x9 //x22=x22+x9(where x9=4)
另一種方法為立即數(shù)指令:
addi x22,x22,4 //x22=x22+4
注意常量/立即數(shù)為0也很常用绍刮,比如需要負(fù)數(shù)時(shí)候可以用0減去一個(gè)數(shù)值温圆;而RISC-V中將x0寄存器固定接為0挨摸;
此方法更為高效和常用
2.4 Signed ad Unsigned Numbers
首先介紹計(jì)算機(jī)的進(jìn)制:
- 二進(jìn)制孩革;
- 二進(jìn)制和十進(jìn)制的轉(zhuǎn)換;
- 二進(jìn)制溢出的概念得运;
如何表示負(fù)數(shù)的一些思路膝蜈,后來(lái)引入補(bǔ)碼的概念;
如何從補(bǔ)碼計(jì)算出一個(gè)正熔掺、負(fù)數(shù)的值饱搏;
如補(bǔ)碼為1111的十進(jìn)制計(jì)算
補(bǔ)碼運(yùn)算;
lbu: load byte unsigned
lb: load byte
補(bǔ)碼的缺點(diǎn)
- 取補(bǔ)碼相當(dāng)于取反加1置逻,比較復(fù)雜推沸;
- bit擴(kuò)展麻煩,如從16bits擴(kuò)展到64bit券坞,則需要將最高位復(fù)制48份鬓催,然后copy其他16bit;
2.5 Representing Instructions in the Computer
RISC-V指令Fields介紹
將匯編語(yǔ)言一步步轉(zhuǎn)為二進(jìn)制指令
2.6 Logical Operations
介紹左移恨锚,右移宇驾,AND,OR猴伶,XOR NOT邏輯操作课舍;
2.7 Instructions for Making Decisions
主要講解下面兩個(gè)指令:
beq rs1,rs2,L1
- branch if equal
bne rs1,rs2,L1
- branch if not equal
Compiling if-then-else into conditional Branches
f,g,h,i,j對(duì)應(yīng)x19-x23
f g h i j x19 x20 x21 x22 x23 如下C語(yǔ)言如何編譯為RISC-V格式
if(i==j) f=g+h; else f=g-h
bne x22,x23,Else add x19,x20,x21 ELSE:sub x19,x20,x21 Exit:
Loops
C語(yǔ)言loops舉例
while(save[i]==k) i+=1;
- i :x22
- k:x24
- base of array save: x25
Loop: slli x10,x22,3 //Temp reg x10=i*8 add x10,x10,x25 //x10=address of save[i] ld x9,0(x10) //Temp reg x9=save[i] bne x9,x24,Exit //go to Exit if save[i]!=k addi x22,x22,1 //i=i+1 beq x0,x0,Loop //to to Loop Exit:
介紹一個(gè)流行詞:basic block;
a basic block is a sequence of instructions without branches, except possibly at the end, and without branch trgets or branch labels, eccept possibly at the beginning.
one of the first early phases of compilation is breaking the program into basic blocks;
blt和bge將寄存器的值當(dāng)做補(bǔ)碼形式比較塌西;
bltu和bgeu將寄存器的值當(dāng)做unsigned比較;
Bounds Check Shortcut
如何用unsigned方法比較signed寄存器筝尾;
branch to IndexOutOfBounds if x20>=x11 or if x20 if negative;
bgeu x20,x11,IndexOutOfBounds
Case/Switch Statement
branch address table /branch table
2.8 Supporting Procedures in Computer Hardware
A procedure or function is one tool programmers use to structure programs, both to make them easier to understand and to allow code to be reused.
procedure和其他program部分的接口是parameter
In the execution of a procedure, the program must follow 6 steps:
- Put parameters in a place where the prcedure can access them.
- Transfer control to the procedure.
- Acquire the storage resource needed for the procedure.
- Perform the desired task.
- Put the result value in a place whre the calling program can access it.
- Return control to the point of origin, since a procedure can be called from several points in a program.
由于寄存器訪問(wèn)快捡需,程序需要盡可能利用寄存器;
RISC-V software follows the following convention for procedure calling in allocating its 32 regs;
- x10-x17: 8 parameter registers in which to pass parameters or return values.
- x1: one return address register to return to the point of origin;
為procedure而設(shè)計(jì)的一個(gè)指令:
jal x1,ProcedureAddress //jump to ProcedureAddress and write return address to x1
jal:jump and link instrunction.此指令完成2個(gè)動(dòng)作:
- 跳到ProcedureAddress開(kāi)始執(zhí)行程序筹淫;
- 將ProcedureAddress保存到x1寄存器栖忠;
被執(zhí)行的procedure如何取到x1地址,并繼續(xù)執(zhí)行:
jalr x0,0(x1)
jal的另一種用法:
jal x0,Label //unconditionally brach to Label.
Using More Registers
如果一個(gè)procedure需要用很多的registers贸街,當(dāng)使用完成之后庵寞,寄存器的值需要恢復(fù),如何存儲(chǔ)這些寄存器的值呢薛匪?
從而引入stack的概念捐川,中文翻譯為堆棧,procedure操作完成后將寄存器值pop出來(lái)重新寫(xiě)入寄存器逸尖;
把如下C程序翻譯為assemble古沥;
long long int leaf_example(long long int g,long long int h,
long long int i,long long int j){
long long int f;
f=(g+h)-(i+j);
return f;
}
g/h/i/j對(duì)應(yīng)x10/x11/x12/x13,f對(duì)應(yīng)x20
leaf_example:
addi sp,sp,-24
sd x5,16(sp)
sd x6,8(sp)
sd x20,0(sp)
add x5,x10,x11
add x6,x12,x13
sub x20,x5,x6
addi x10,x20,0
ld x20,0(sp)
ld x6,8(sp)
ld x5,16(sp)
addi sp,sp,24
jaalr x0,0(x1)
RISC-V規(guī)定了19個(gè)temporary registers娇跟,并將其份為2類(lèi):
- x5-x7 and x28-x31:not preserved by callee on a procedure call;
- x8-x9 and x18-x27:saved registers that must be preserved on a procedure call
所以上面匯編中可以省去line3/4/13/14 store和load操作岩齿;
Nested Procedures
procedure that do not call others are called leaf procedures.
如果世界上只有l(wèi)eaf procedures,那么世界就變的非常簡(jiǎn)單苞俘;通常情況下一個(gè)procedure會(huì)調(diào)用其他的procedure盹沈,甚至?xí){(diào)用自己,我們稱(chēng)之為recursive procedure.
那么register如何在no-leaf procedure中使用呢吃谣?
舉例說(shuō)明:
以下C回歸調(diào)用例子為例
long long int fact(long long int n){ if(n<1) return(1) else return (n*fact(n-1)) }
parameter n存在x10中
FACT: addi sp,sp,-16 //adjust stack for 2 items sd x1,8(sp) //save the return address sd x10,0(sp) //save the argument n addi x5,x10,-1 //x5=n-1 bge x5,x0,L1 //if(n-1)>=0,go to L1 addi x10,x0,1 //return 1 addi sp,sp,16 //pop 2 items off stack jalr x0,0(x1) //return to caller L1: addi x10,x10,-1 //n>=1: argument gets(n-1) jal x1,FACT //call fact with(n-1) addi x6,x10,0 //return from jal:move result of fact(n-1) to x6 ld x10,0(sp) //restore argument n ld x1,8(sp) //restore the return address addi sp,sp,16 //adjust stack point to pop 2 items mul x10,x10,x6 //return n*fact(n-1) jalr x0,0(x1) //return to the caller
注意jal乞封、jalr,ld 等指令指揮這指令跳來(lái)跳去岗憋,并且將stack的參數(shù)一步步的推出肃晚;
C語(yǔ)言變量可以按照2類(lèi)分:
- type
- integer
- character
- storage class
- automatic
- static
automatic對(duì)于一個(gè)procedure而言是local的,即程序調(diào)用完成后自動(dòng)消失仔戈;而static則貫穿始終关串;一般用static標(biāo)識(shí)聲明;
總結(jié)一下那些需要preserved监徘,而哪些又不需要:
Preserved | Not Preserved |
---|---|
saved registers:x8-x9,x18-x19 | Temprary registers:x5-x7,x28-x31 |
Stack pointer register:x2(sp) | Argument/result register:x10-x17 |
Frame pointer:x8(fp) | |
Return address:x1(ra) | |
Stack above the stack pointer | Stack below the stack pointer |
Allocating Space for New Data on the Stack
Stack被稱(chēng)為棧晋修,存儲(chǔ)程序調(diào)用的automatic型的local變量;
The segment of the stack containing a procdure's saved registers and local variables is called a procedure frame or activation record.
一個(gè)procedure通常使用一個(gè)stack segment耐量,而這個(gè)stack segment就成為procedure frame/activation record飞蚓。
一些RISC-V編譯器使用FP(frame pointer)指向procedure frame的第一個(gè)doubleword。使用FP的好處是可以很方面的定位local parameter的位置廊蜒,方便調(diào)試與定位趴拧;
A frame pointer offers a stable base register within a procedure for local memory-references.
Stack一般是從高地址往低地址遞減的溅漾;
Allocating Space for New Data on the Heap
Heap被稱(chēng)為堆,存儲(chǔ)程序調(diào)用的static型變量著榴;
Heap是從低地址向高地址增加的添履;堆的第一個(gè)地址是保留的reserved,接著是text segement(the home of the RISC-V machine code)脑又,再往上就是static data segment
C語(yǔ)言中用malloc用free來(lái)分配和釋放heap空間暮胧;
C語(yǔ)言這種手動(dòng)分配和釋放空間的機(jī)制帶來(lái)了很多bug,相比JAVA則不會(huì)问麸;
最后用一個(gè)表格來(lái)說(shuō)明RISC-V寄存器的約定:
Name | Register number | Usage | Presrved on call |
---|---|---|---|
x0 | 0 | The constant value 0 | n.a |
x1(ra) | 1 | Return address(link register) | yes |
x2(sp) | 2 | Stack pointer | yes |
x3(gp) | 3 | Global pointer | yes |
x4(tp) | 4 | Thread pointer | yes |
x5-x7 | 5-7 | Temporaries | no |
x8-x9 | 8-9 | Saved | yes |
x10-x17 | 10-17 | Arguments/results | no |
x18-x27 | 18-27 | Saved | yes |
x28-x31 | 28-31 | Temporaries | no |
2.9 Communicating with People
計(jì)算機(jī)如何顯示呢往衷,通常使用8bit的ASCII碼;
為了方便charactor操作严卖,RISC-V提供了2兩個(gè)instructions
lbu x12,0(x10) //Read byte from source
sb x12,0(x11) //Write byte to destination
lbu:load byte unsigned,將byte load到目的寄存器的最右端席舍;
sb:store byte,將最右側(cè)的byte存入memory;
多個(gè)charactors會(huì)組成string哮笆,那么string如何表示,一般有3種方式:
- string第一個(gè)地址保留来颤,給出string長(zhǎng)度;
- 用一個(gè)伴隨變量來(lái)表示string長(zhǎng)度稠肘;
- 最后用一個(gè)特殊字符表示string結(jié)束福铅;
C語(yǔ)言用了3,而JAVA用了1.
而JAVA使用Unicode characters项阴,所以它使用16bit來(lái)表示一個(gè)charactor滑黔;
于是RISC-V需要使用load/store halfword指令來(lái)load和store一個(gè)charactor;
lhu x19,0(x10) //Read halfword from source
sh x10,0(x11) //Write halfword to destination
2.10 RISC-V Addressing for Wide Immediates and Addresses
RISC-V的指令為32bits鲁冯,這樣可以簡(jiǎn)化硬件的設(shè)計(jì)拷沸,但是對(duì)長(zhǎng)度過(guò)長(zhǎng)常量則需要其他的手段解決色查;
這節(jié)就重點(diǎn)講解branch和jump中的手段薯演;
接著講解RISC-V 指令編碼表;
2.11 Parallelism and Instructions: Synchronization
data race:比如兩個(gè)程序訪問(wèn)同一個(gè)數(shù)據(jù)秧了,一個(gè)程序沒(méi)有完成寫(xiě)跨扮,而另一個(gè)已經(jīng)開(kāi)始讀,這樣就叫data race验毡。
In computing, synchronization mechanisms are typically built with user-level software routines that rely on hardwre-supplied synchronization instructions.
軟件routines之間通過(guò)硬件提供的同步指令來(lái)構(gòu)造同步機(jī)制衡创;
we focus on the implementation of lock and unlock synchronization.
對(duì)于單核processor而言,unlock/lock很容易執(zhí)行晶通;
The critical ability we require to implement synchronization in a multiprocessor is a set of hardware primitives with the ability to atomically read and modify a memory location;
以atomic exchange/atomic swap為例璃氢,它主要完成兩個(gè)存儲(chǔ)之間的數(shù)值交換;
Lock=0表示unlock狮辽,Lock=1則表示lock一也;
從而引出lr.d(load reserved double word)和sc.d(store conditional doubleword)
addi x12,x0,1 //copy locked value
again: lr.d x10,(x20) //load-reserved to read lock
bne x10,x0,again //check if it is 0 yet
sc.d x11,x12,(x20) //attempt to store new value
bne x11,x0,again //branch if store fails
sd x0,0(x20) //free lock by writing 0
2.12 Translating and Starting a Program
本節(jié)主要講從磁盤(pán)一段C語(yǔ)言文件到計(jì)算機(jī)可以執(zhí)行的程序之間的4個(gè)步驟:
Compiler
Compiler將高級(jí)語(yǔ)言編譯為通用的匯編(通用的匯編是因?yàn)闆](méi)有目的器件)
Assembler
Assembler就是器件對(duì)應(yīng)的編譯器巢寡,將通用匯編編譯成對(duì)應(yīng)器件支持的匯編;
Linker
如何做到修改了一行代碼而不用重新開(kāi)始編譯椰苟;
Dynamically Linked Libraries
傳統(tǒng)的link libraries的方法是static方法抑月,有一些缺點(diǎn):
- The library routines become part of the executable code.木已成舟,再難修改舆蝴;
- 所有的routines都要被load谦絮,管你要不要;
這些缺點(diǎn)導(dǎo)致了DLL(Dynamically linked libraries)的出現(xiàn)洁仗;
DLL,where the library routines are not linked an loaded until the program is run.
Starting a JAVA Program
初步認(rèn)識(shí)一些JVM和JIT吧
各個(gè)平臺(tái)有不同的JVM层皱,如windows JVM,UNIX JVM等赠潦,分別負(fù)責(zé)將JVM虛擬機(jī)最終解釋為Windows和Unix instruction奶甘;
2.13 A C Sort Example to Put it All Together
舉連個(gè)C語(yǔ)言程序的例子來(lái)解釋當(dāng)前章的內(nèi)容
- swap 函數(shù)
- sort函數(shù),調(diào)用swap
2.14 Arrays versus Pointers
this section shows C and RISC-V assembly versions of two procedures to clear a sequence of doublewords in memory.
2.15 Advanced Material: Compiling C and Interpreting Java
This section gives a bref overview of how the C compiler works and how Java is executed.
在線(xiàn)閱讀內(nèi)容祭椰;
2.16 Real Stuff: MIPS Instructions
簡(jiǎn)單介紹MIPS指令臭家,和RISC-V極其相似;
2.17 Real Stuff:x86 Instructions
將近40年的發(fā)展方淤,每個(gè)milestone加入一系列的指令钉赁,大概有13個(gè)milestones,作者將其比喻為金色的手銬携茂,其龐雜性限制了它的發(fā)展你踩,目前每年有350M x86 chips,而卻又14billion ARM chips讳苦。從此可以看到起差距之大带膜;
接著介紹一下x86的寄存器結(jié)構(gòu)和Data Addressing Modes,接著介紹x86 integer operations和Instruction Encoding鸳谜。
總結(jié)一下就是x86現(xiàn)在不流行了膝藕;
2.18 Real Stuff: The Rest of the RISC-V Instruction Set
RISC-V指令集分為base和extension
這個(gè)section簡(jiǎn)單介紹base之外的其他指令,以及介紹一下extension的分類(lèi)咐扭;
2.19 Fallacies and Pitfalls
fallacy:
- More powerful instructions mean higher performance
- Write in assembly language to obtain the highest performance
- The importance of commercial binary compatibility means successful instruction sets don't change
Pitfall:
- Forgetting that sequential word or double word addersses in machines with byte addrssing do not differ by one.
- Using a pointer to an automatic variable outside its defining procedure.