When writing C, many people are unsure whether to declare a variable as signed char or unsigned char, and not understanding the difference between the two can introduce latent bugs.
Let's start with an example:
#include <stdio.h>
void foo(signed char sc, unsigned char uc) {
if (sc == '\x85') printf("%s\n", "equal"); else printf("%s\n", "not equal"); /* result: equal */
if (uc == '\x85') printf("%s\n", "equal"); else printf("%s\n", "not equal"); /* result: not equal */
if (sc == 0x85) printf("%s\n", "equal"); else printf("%s\n", "not equal"); /* result: not equal */
if (uc == 0x85) printf("%s\n", "equal"); else printf("%s\n", "not equal"); /* result: equal */
}
int main(int argc, char * argv[]) {
signed char sc = '\x85';
unsigned char uc = '\x85';
foo(sc, uc);
return 0;
}
The output is:
equal
not equal
not equal
equal
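The test environment is Darwin. Note that the gcc command here is really Apple LLVM 8.0.0 (clang); gcc 4.2.1 appears only in the include path: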
$ uname -a
Darwin localhost 15.6.0 Darwin Kernel Version 15.6.0: Tue Apr 11 16:00:51 PDT 2017; root:xnu-3248.60.11.5.3~1/RELEASE_X86_64 x86_64
$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Here is the assembly generated for the comparison statements:
if (sc == '\x85'):
movsbl -1(%rbp), %esi
cmpl $-123, %esi
jne LBB0_2
if (uc == '\x85'):
movzbl -2(%rbp), %eax
cmpl $-123, %eax
jne LBB0_5
if (sc == 0x85):
movsbl -1(%rbp), %eax
cmpl $133, %eax
jne LBB0_8
if (uc == 0x85):
movzbl -2(%rbp), %eax
cmpl $133, %eax
jne LBB0_11
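First, a quick explanation of the instructions used above.
movzbl:
Move Zero-Extended Byte to Long.
The low 8 bits of the destination are replaced by the source operand; the top 24 bits are set to 0.
movsbl:
Move Sign-Extended Byte to Long.
The low 8 bits of the destination are replaced by the source operand; the top 24 bits are filled with copies of the source's sign bit (bit 7).
cmpl:
Compares two 32-bit operands by subtracting one from the other and setting the flags; the result itself is discarded. The instruction is neither signed nor unsigned: the conditional jump that follows decides how the flags are read, and for an equality test (je/jne) only the bit patterns matter.
If arg1 is an immediate value, it is sign-extended to the length of arg2.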
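The behaviour of these three instructions can be modelled in a few lines of C. This is only a sketch: the function names are made up for illustration, and cmpl_equal models just the equality case (cmpl followed by je/jne), not the full flag semantics.

```c
#include <stdint.h>

/* Models movzbl: the byte lands in the low 8 bits, the top 24 bits become 0. */
static uint32_t zero_extend_byte(uint8_t byte) {
    return (uint32_t)byte;
}

/* Models movsbl: the byte lands in the low 8 bits, the top 24 bits copy bit 7. */
static uint32_t sign_extend_byte(uint8_t byte) {
    uint32_t low = (uint32_t)byte;
    return (low & 0x80u) ? (low | 0xFFFFFF00u) : low;
}

/* Models cmpl + je/jne: the immediate is taken as a 32-bit pattern
   and compared bit for bit with the register value. */
static int cmpl_equal(int32_t immediate, uint32_t reg) {
    return (uint32_t)immediate == reg;
}
```

For the byte 0x85, sign_extend_byte gives 0xFFFFFF85 and zero_extend_byte gives 0x00000085, so the same byte matches the immediate -123 only after sign extension and the immediate 133 only after zero extension.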
Now let's analyze each comparison.
- if (sc == '\x85'):
Instruction | Description |
---|---|
movsbl -1(%rbp), %esi | Loads the parameter sc into register esi, sign-extending it to 0xffffff85 |
cmpl $-123, %esi | Two steps: 1. the immediate value ($-123) is sign-extended to 0xffffff85; 2. the two 32-bit values are compared |
jne LBB0_2 | The values are equal, so the jump is not taken |
- if (uc == '\x85'):
Instruction | Description |
---|---|
movzbl -2(%rbp), %eax | Loads the parameter uc into register eax, zero-extending it to 0x00000085 |
cmpl $-123, %eax | Two steps: 1. the immediate value ($-123) is sign-extended to 0xffffff85; 2. the two 32-bit values are compared |
jne LBB0_5 | The values differ, so the jump is taken |
- if (sc == 0x85):
Instruction | Description |
---|---|
movsbl -1(%rbp), %eax | Loads the parameter sc into register eax, sign-extending it to 0xffffff85 |
cmpl $133, %eax | Two steps: 1. the immediate value ($133) is taken as the 32-bit value 0x00000085; 2. the two 32-bit values are compared |
jne LBB0_8 | The values differ, so the jump is taken |
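- if (uc == 0x85):
Instruction | Description |
---|---|
movzbl -2(%rbp), %eax | Loads the parameter uc into register eax, zero-extending it to 0x00000085 |
cmpl $133, %eax | Two steps: 1. the immediate value ($133) is taken as the 32-bit value 0x00000085; 2. the two 32-bit values are compared |
jne LBB0_11 | The values are equal, so the jump is not taken |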
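At the C level, the sign and zero extensions seen above come from the integer promotions: both operands of == are converted to int before they are compared. The sketch below spells that out with hypothetical helper names. The constant -123 stands in for '\x85' on the assumption that plain char is signed, as it is on the platform used here.

```c
/* Integer promotion of a signed char: the value is preserved, so -123 stays -123. */
static int promote_signed_char(signed char sc) {
    return sc;
}

/* Integer promotion of an unsigned char: the value is preserved, so 133 stays 133. */
static int promote_unsigned_char(unsigned char uc) {
    return uc;
}

/* What (sc == constant) evaluates to once sc has been promoted to int. */
static int signed_char_equals(signed char sc, int constant) {
    return promote_signed_char(sc) == constant;
}

/* What (uc == constant) evaluates to once uc has been promoted to int. */
static int unsigned_char_equals(unsigned char uc, int constant) {
    return promote_unsigned_char(uc) == constant;
}
```

A signed char holding the byte 0x85 promotes to -123 and an unsigned char holding the same byte promotes to 133, which is why each variable matches only one of the two constants.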
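One more thing worth noting: why does the same 0x85 "constant" show up as $133 in some places in the assembly and as $-123 in others? This has nothing to do with the x64 instruction set; it is a result of how the compiler treats the two constants.
At the C level, '\x85' and 0x85 are interchangeable in many situations, but they are not the same thing. '\x85' is a character constant. Whether plain char is signed is implementation-defined, and on this platform it is signed, so although both constants are written as 0x85 in hexadecimal, their decimal values differ: the character constant '\x85' is -123, not 133 (8*16+5), whereas 0x85 is an ordinary integer constant that involves no sign extension and whose value is simply 133.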
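The difference between the two constants can be checked directly. The following is a minimal sketch with hypothetical function names; it assumes 8-bit chars and uses CHAR_MIN from limits.h to detect whether plain char is signed on the current implementation.

```c
#include <limits.h>

/* Nonzero when plain char is a signed type on this implementation. */
static int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Value of the character constant '\x85'. In C a character constant has
   type int, and its value is that of a char holding 0x85 converted to int:
   -123 where plain char is signed, 133 where it is unsigned. */
static int char_constant_value(void) {
    return '\x85';
}

/* 0x85 is an ordinary int constant, so its value is 133 everywhere. */
static int int_constant_value(void) {
    return 0x85;
}
```

On the Darwin and Linux x64 setups discussed here, plain_char_is_signed() is expected to return nonzero, which is what makes '\x85' appear as $-123 in the assembly.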
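Conclusion
On this platform plain char behaves like signed char (the C standard leaves the signedness of char implementation-defined). Whenever a character type is compared with a value of another type, think carefully about how it is widened: by sign extension or by zero extension.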
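One common way to sidestep the problem is to convert the character to unsigned char before comparing it with a byte value. The helper below is a sketch of that idea; char_is_byte is a name invented for this example, not a standard function.

```c
/* Compares a char against a byte value in 0..255. Converting to unsigned char
   first gives the same answer whether plain char is signed or unsigned. */
static int char_is_byte(char c, unsigned char byte) {
    return (unsigned char)c == byte;
}
```

With this helper, char_is_byte(c, 0x85) is true for a char holding the byte 0x85 regardless of how the platform defines char.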
The gcc compiler on Darwin reports two warnings:
$ gcc test.c
test.c:5:12: warning: comparison of constant -123 with expression of type 'unsigned char' is always false
[-Wtautological-constant-out-of-range-compare]
if (uc == '\x85') printf("%s\n", "equal"); else printf("%s\n", "not equal"); /* not equal */
~~ ^ ~~~~~~
test.c:7:12: warning: comparison of constant 133 with expression of type 'signed char' is always false
[-Wtautological-constant-out-of-range-compare]
if (sc == 0x85) printf("%s\n", "equal"); else printf("%s\n", "not equal"); /* not equal */
~~ ^ ~~~~
2 warnings generated.
On Linux x64 with gcc the result is even more surprising: no warning is printed at all. With optimization turned off (gcc -S -O0), the generated assembly is:
.section .rodata
.LC0:
.string "not equal"
.text
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $16, %rsp
movl %edi, %edx
movl %esi, %eax
movb %dl, -4(%rbp)
movb %al, -8(%rbp)
nop
movl $.LC0, %edi
call puts
nop
movl $.LC0, %edi
call puts
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size foo, .-foo
As the listing shows, the comparison instructions in foo have been dropped, and even the string "equal" is gone. The function simply prints "not equal" ($.LC0) twice, whatever the argument values are. In other words, foo is equivalent to:
void foo(signed char sc, unsigned char uc) {
printf("%s\n", "not equal"); /* not equal */
printf("%s\n", "not equal"); /* not equal */
}
Written another way, what will foo print?
void foo(signed char sc, unsigned char uc) {
if (sc == -123) printf("%s\n", "equal"); else printf("%s\n", "not equal"); /* result: equal */
if (uc == -123) printf("%s\n", "equal"); else printf("%s\n", "not equal"); /* result: not equal */
if (sc == 133) printf("%s\n", "equal"); else printf("%s\n", "not equal"); /* result: not equal */
if (uc == 133) printf("%s\n", "equal"); else printf("%s\n", "not equal"); /* result: equal */
}