今天講講arm匯編中除法的底層實(shí)現(xiàn)。匯編代碼本身比較長(zhǎng)了饼酿,如需參考請(qǐng)直接拉到文末运准。
下面我直接把a(bǔ)rm的除法算法的匯編代碼轉(zhuǎn)譯成C語(yǔ)言的代碼貼出來(lái),并進(jìn)行解析碰凶。
因?yàn)槠邢弈喊牛栽诖酥唤馕鰺o(wú)符號(hào)整型的除法運(yùn)算,關(guān)于無(wú)符號(hào)除法和有符號(hào)除法的區(qū)別請(qǐng)參考上一篇推送欲低。
代碼較長(zhǎng)如下辕宏,電腦端看效果更佳,如無(wú)耐心請(qǐng)直接拉下去看講解即可:
#include<stdio.h>
unsigned int count_leading_zeros(unsigned int num)
{
unsigned int cnt = 0;
while(!(num & 0x80000000) && cnt < 32){
cnt++;
num <<= 1;
}
return cnt;
}
unsigned int div_unsigned(unsigned int dividend, unsigned int divisor)
{
unsigned int answer = 0;
int cc;
unsigned int divisor_lz = 0, dividend_lz = 0;
if (divisor == 1){
return dividend;
}else if (divisor < 1){
return -1;
}
if (divisor == dividend){
return 1;
}else if (dividend < divisor){
return 0;
}
if ((divisor & (divisor - 1)) == 0){
return dividend >> (31 - count_leading_zeros(divisor));
}
divisor_lz = count_leading_zeros(divisor);
dividend_lz = count_leading_zeros(dividend);
printf("dividend[0x%x], dividend_lz[%d], divisor[0x%x], divisor_lz[%d]\n", dividend, dividend_lz, divisor, divisor_lz);
cc = divisor_lz - dividend_lz;
while(cc >= 0){
answer <<= 1;
if (dividend >= (divisor << cc)){
answer += 1;
dividend -= (divisor << cc);
}
cc--;
}
return answer;
}
main(){
unsigned int a = 0x80000000 / 3;
unsigned int b = div_unsigned(0x80000000, 3);
printf("[0x%x][0x%x]",a, b);
}
2次冪和移位運(yùn)算
在以上代碼中我們終于看到了移位運(yùn)算對(duì)除法運(yùn)算的優(yōu)化:
當(dāng)除數(shù)是2的N次冪時(shí)砾莱,可以直接對(duì)被除數(shù)做右移運(yùn)算來(lái)代替除法瑞筐, 比如除數(shù)是2即(2的1次冪),此時(shí)只需要對(duì)被除數(shù)做一次右移即可腊瑟,同理如果除數(shù)是8則對(duì)被除數(shù)做三次右移聚假。
而判斷一個(gè)數(shù)字是不是2的N次冪只需要一行代碼:
if ((divisor & (divisor - 1)) == 0){
這一行代碼也幾乎就是leetcode的第231題2的冪的答案:
2^x | n | n - 1 | n & (n - 1) |
---|---|---|---|
2^0 | 0001 | 0000 | (0001) & (0000) == 0 |
2^1 | 0010 | 0001 | (0010) & (0001) == 0 |
2^2 | 0100 | 0011 | (0100) & (0011) == 0 |
2^3 | 1000 | 0111 | (1000) & (0111) == 0 |
如有疑問(wèn)請(qǐng)繼續(xù)參考leetcode的題解:https://leetcode-cn.com/problems/power-of-two/solution/power-of-two-er-jin-zhi-ji-jian-by-jyd/
而計(jì)算2的N次冪中的N,也只需要這一句即可:
(31 - count_leading_zeros(divisor))
count_leading_zeros即為一個(gè)32bit的數(shù)字以二進(jìn)制呈現(xiàn)的時(shí)候闰非,從高位向低位數(shù)開(kāi)始數(shù)有連續(xù)多少個(gè)0的數(shù)量膘格。
比如數(shù)字2的二進(jìn)制是: 0000 0000 0000 0000 0000 0000 0000 0010
在第一個(gè)bit1出現(xiàn)之前有30個(gè)0。
判斷是否是2的N次冪财松,并且計(jì)算出N的大小并進(jìn)行右移也只需要以下三行代碼瘪贱。
if ((divisor & (divisor - 1)) == 0){
return dividend >> (31 - count_leading_zeros(divisor));
}
為什么要使用count_leading_zeros這種方法呢,雖然我在上面的代碼中定義了函數(shù)count_leading_zeros辆毡,但是在arm匯編中只需要一條指令clz即可,計(jì)算2的N次冪的N加上右移也只需要三條指令即可菜秦,非常高效:
clz r2, r1 //計(jì)算leading zeros的數(shù)量
rsb r2, r2, #31 //31 - count_leading_zeros(divisor)
lsr.w r0, r0, r2 // 進(jìn)行右移
二進(jìn)制的除法解析
那么更多情況下,除數(shù)也并不是2的N次冪胚迫。如果除數(shù)是3喷户,那么還是要做一下正規(guī)的除法了。
我做了一張圖來(lái)對(duì)比8/3的十進(jìn)制和二進(jìn)制的除法访锻。
在二進(jìn)制時(shí)褪尝,任何一個(gè)bit不可能大于1闹获,所以當(dāng)兩個(gè)數(shù)字的leading zeros相同時(shí),被除數(shù)不可能會(huì)整除除數(shù)超過(guò)或者等于兩次河哑。也就是說(shuō)leading zeros相同時(shí)避诽,被除數(shù)要么能整除除數(shù)一次,要么是0次璃谨。
二進(jìn)制運(yùn)算除法的時(shí)候沙庐,首先會(huì)對(duì)除數(shù)做左移操作,讓除數(shù)和被除數(shù)進(jìn)行“對(duì)齊”(即leading zeros數(shù)量相同)佳吞,如果此時(shí)的被除數(shù)大于等于此時(shí)(左移后的)除數(shù)拱雏,那么在相應(yīng)的答案位上置一,否則置0底扳。然后對(duì)(左移后的)除數(shù)?做右移一位操作再繼續(xù)和被除數(shù)做比較铸抑,直到除數(shù)恢復(fù)成原來(lái)的初始值(這時(shí)候會(huì)作最后一次運(yùn)算)。如下代碼所示:
cc = divisor_lz - dividend_lz;
while(cc >= 0){
answer <<= 1;
if (dividend >= (divisor << cc)){
answer += 1;
dividend -= (divisor << cc);
}
cc--;
}
所以在二進(jìn)制整型數(shù)字的除法世界中衷模,只需要減法和移位操作就能夠滿足除法運(yùn)算的需求鹊汛。最后我才發(fā)現(xiàn),二進(jìn)制的除法原本就是這么簡(jiǎn)單阱冶,比十進(jìn)制的除法還要簡(jiǎn)單刁憋。
本文完,以下為參考資料木蹬。
arm的指令集查文檔:
http://users.ece.utexas.edu/~valvano/Volume1/QuickReferenceCard.pdf
https://iitd-plos.github.io/col718/ref/arm-instructionset.pdf
div無(wú)符號(hào)整形的除法匯編如下:
00010490 <__udivsi3>:
10490: 1e4a subs r2, r1, #1
10492: bf08 it eq
10494: 4770 bxeq lr
10496: f0c0 8124 bcc.w 106e2 <__udivsi3+0x252>
1049a: 4288 cmp r0, r1
1049c: f240 8116 bls.w 106cc <__udivsi3+0x23c>
104a0: 4211 tst r1, r2
104a2: f000 8117 beq.w 106d4 <__udivsi3+0x244>
104a6: fab0 f380 clz r3, r0
104aa: fab1 f281 clz r2, r1
104ae: eba2 0303 sub.w r3, r2, r3
104b2: f1c3 031f rsb r3, r3, #31
104b6: a204 add r2, pc, #16 ; (adr r2, 104c8 <__udivsi3+0x38>)
104b8: eb02 1303 add.w r3, r2, r3, lsl #4
104bc: f04f 0200 mov.w r2, #0
104c0: 469f mov pc, r3
104c2: bf00 nop
104c4: f3af 8000 nop.w
104c8: ebb0 7fc1 cmp.w r0, r1, lsl #31
104cc: bf00 nop
104ce: eb42 0202 adc.w r2, r2, r2
104d2: bf28 it cs
104d4: eba0 70c1 subcs.w r0, r0, r1, lsl #31
104d8: ebb0 7f81 cmp.w r0, r1, lsl #30
104dc: bf00 nop
104de: eb42 0202 adc.w r2, r2, r2
104e2: bf28 it cs
104e4: eba0 7081 subcs.w r0, r0, r1, lsl #30
104e8: ebb0 7f41 cmp.w r0, r1, lsl #29
104ec: bf00 nop
104ee: eb42 0202 adc.w r2, r2, r2
104f2: bf28 it cs
104f4: eba0 7041 subcs.w r0, r0, r1, lsl #29
104f8: ebb0 7f01 cmp.w r0, r1, lsl #28
104fc: bf00 nop
104fe: eb42 0202 adc.w r2, r2, r2
10502: bf28 it cs
10504: eba0 7001 subcs.w r0, r0, r1, lsl #28
10508: ebb0 6fc1 cmp.w r0, r1, lsl #27
1050c: bf00 nop
1050e: eb42 0202 adc.w r2, r2, r2
10512: bf28 it cs
10514: eba0 60c1 subcs.w r0, r0, r1, lsl #27
10518: ebb0 6f81 cmp.w r0, r1, lsl #26
1051c: bf00 nop
1051e: eb42 0202 adc.w r2, r2, r2
10522: bf28 it cs
10524: eba0 6081 subcs.w r0, r0, r1, lsl #26
10528: ebb0 6f41 cmp.w r0, r1, lsl #25
1052c: bf00 nop
1052e: eb42 0202 adc.w r2, r2, r2
10532: bf28 it cs
10534: eba0 6041 subcs.w r0, r0, r1, lsl #25
10538: ebb0 6f01 cmp.w r0, r1, lsl #24
1053c: bf00 nop
1053e: eb42 0202 adc.w r2, r2, r2
10542: bf28 it cs
10544: eba0 6001 subcs.w r0, r0, r1, lsl #24
10548: ebb0 5fc1 cmp.w r0, r1, lsl #23
1054c: bf00 nop
1054e: eb42 0202 adc.w r2, r2, r2
10552: bf28 it cs
10554: eba0 50c1 subcs.w r0, r0, r1, lsl #23
10558: ebb0 5f81 cmp.w r0, r1, lsl #22
1055c: bf00 nop
1055e: eb42 0202 adc.w r2, r2, r2
10562: bf28 it cs
10564: eba0 5081 subcs.w r0, r0, r1, lsl #22
10568: ebb0 5f41 cmp.w r0, r1, lsl #21
1056c: bf00 nop
1056e: eb42 0202 adc.w r2, r2, r2
10572: bf28 it cs
10574: eba0 5041 subcs.w r0, r0, r1, lsl #21
10578: ebb0 5f01 cmp.w r0, r1, lsl #20
1057c: bf00 nop
1057e: eb42 0202 adc.w r2, r2, r2
10582: bf28 it cs
10584: eba0 5001 subcs.w r0, r0, r1, lsl #20
10588: ebb0 4fc1 cmp.w r0, r1, lsl #19
1058c: bf00 nop
1058e: eb42 0202 adc.w r2, r2, r2
10592: bf28 it cs
10594: eba0 40c1 subcs.w r0, r0, r1, lsl #19
10598: ebb0 4f81 cmp.w r0, r1, lsl #18
1059c: bf00 nop
1059e: eb42 0202 adc.w r2, r2, r2
105a2: bf28 it cs
105a4: eba0 4081 subcs.w r0, r0, r1, lsl #18
105a8: ebb0 4f41 cmp.w r0, r1, lsl #17
105ac: bf00 nop
105ae: eb42 0202 adc.w r2, r2, r2
105b2: bf28 it cs
105b4: eba0 4041 subcs.w r0, r0, r1, lsl #17
105b8: ebb0 4f01 cmp.w r0, r1, lsl #16
105bc: bf00 nop
105be: eb42 0202 adc.w r2, r2, r2
105c2: bf28 it cs
105c4: eba0 4001 subcs.w r0, r0, r1, lsl #16
105c8: ebb0 3fc1 cmp.w r0, r1, lsl #15
105cc: bf00 nop
105ce: eb42 0202 adc.w r2, r2, r2
105d2: bf28 it cs
105d4: eba0 30c1 subcs.w r0, r0, r1, lsl #15
105d8: ebb0 3f81 cmp.w r0, r1, lsl #14
105dc: bf00 nop
105de: eb42 0202 adc.w r2, r2, r2
105e2: bf28 it cs
105e4: eba0 3081 subcs.w r0, r0, r1, lsl #14
105e8: ebb0 3f41 cmp.w r0, r1, lsl #13
105ec: bf00 nop
105ee: eb42 0202 adc.w r2, r2, r2
105f2: bf28 it cs
105f4: eba0 3041 subcs.w r0, r0, r1, lsl #13
105f8: ebb0 3f01 cmp.w r0, r1, lsl #12
105fc: bf00 nop
105fe: eb42 0202 adc.w r2, r2, r2
10602: bf28 it cs
10604: eba0 3001 subcs.w r0, r0, r1, lsl #12
10608: ebb0 2fc1 cmp.w r0, r1, lsl #11
1060c: bf00 nop
1060e: eb42 0202 adc.w r2, r2, r2
10612: bf28 it cs
10614: eba0 20c1 subcs.w r0, r0, r1, lsl #11
10618: ebb0 2f81 cmp.w r0, r1, lsl #10
1061c: bf00 nop
1061e: eb42 0202 adc.w r2, r2, r2
10622: bf28 it cs
10624: eba0 2081 subcs.w r0, r0, r1, lsl #10
10628: ebb0 2f41 cmp.w r0, r1, lsl #9
1062c: bf00 nop
1062e: eb42 0202 adc.w r2, r2, r2
10632: bf28 it cs
10634: eba0 2041 subcs.w r0, r0, r1, lsl #9
10638: ebb0 2f01 cmp.w r0, r1, lsl #8
1063c: bf00 nop
1063e: eb42 0202 adc.w r2, r2, r2
10642: bf28 it cs
10644: eba0 2001 subcs.w r0, r0, r1, lsl #8
10648: ebb0 1fc1 cmp.w r0, r1, lsl #7
1064c: bf00 nop
1064e: eb42 0202 adc.w r2, r2, r2
10652: bf28 it cs
10654: eba0 10c1 subcs.w r0, r0, r1, lsl #7
10658: ebb0 1f81 cmp.w r0, r1, lsl #6
1065c: bf00 nop
1065e: eb42 0202 adc.w r2, r2, r2
10662: bf28 it cs
10664: eba0 1081 subcs.w r0, r0, r1, lsl #6
10668: ebb0 1f41 cmp.w r0, r1, lsl #5
1066c: bf00 nop
1066e: eb42 0202 adc.w r2, r2, r2
10672: bf28 it cs
10674: eba0 1041 subcs.w r0, r0, r1, lsl #5
10678: ebb0 1f01 cmp.w r0, r1, lsl #4
1067c: bf00 nop
1067e: eb42 0202 adc.w r2, r2, r2
10682: bf28 it cs
10684: eba0 1001 subcs.w r0, r0, r1, lsl #4
10688: ebb0 0fc1 cmp.w r0, r1, lsl #3
1068c: bf00 nop
1068e: eb42 0202 adc.w r2, r2, r2
10692: bf28 it cs
10694: eba0 00c1 subcs.w r0, r0, r1, lsl #3
10698: ebb0 0f81 cmp.w r0, r1, lsl #2
1069c: bf00 nop
1069e: eb42 0202 adc.w r2, r2, r2
106a2: bf28 it cs
106a4: eba0 0081 subcs.w r0, r0, r1, lsl #2
106a8: ebb0 0f41 cmp.w r0, r1, lsl #1
106ac: bf00 nop
106ae: eb42 0202 adc.w r2, r2, r2
106b2: bf28 it cs
106b4: eba0 0041 subcs.w r0, r0, r1, lsl #1
106b8: ebb0 0f01 cmp.w r0, r1
106bc: bf00 nop
106be: eb42 0202 adc.w r2, r2, r2
106c2: bf28 it cs
106c4: eba0 0001 subcs.w r0, r0, r1
106c8: 4610 mov r0, r2
106ca: 4770 bx lr
106cc: bf0c ite eq
106ce: 2001 moveq r0, #1
106d0: 2000 movne r0, #0
106d2: 4770 bx lr
106d4: fab1 f281 clz r2, r1
106d8: f1c2 021f rsb r2, r2, #31
106dc: fa20 f002 lsr.w r0, r0, r2
106e0: 4770 bx lr
106e2: b108 cbz r0, 106e8 <__udivsi3+0x258>
106e4: f04f 30ff mov.w r0, #4294967295 ; 0xffffffff
106e8: f000 b966 b.w 109b8 <__aeabi_idiv0>