序言
show character set;
--查看mysql支持的字符集娶靡,部分輸出如下
Character | Describption | Default collation | Maxlen |
---|---|---|---|
ascii | US ASCII | ascii_general_ci | 1 |
big5 | Big5 Traditional Chinese(繁體中文) | big5_chinese_ci | 2 |
binary | Binary pseudo charset | binary | 1 |
gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |
gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 |
show collation;
--查看mysql支持的collation
指定字符集與排序規(guī)則
服務器的字符集與排序規(guī)則
-
character_set_server
的默認值為utf8mb4
show variables like 'character_set_server'; --輸出的Value列為utf8mb4
-
在
my.ini
中可以設置character_set_server
的值:character-set-server=utf8; --在my.ini中添加上述行牧牢,重啟服務,即可設置character_set_server的值
-
在運行過程中姿锭,可以更改
character_set_server
的值:set character_set_server = gbk;
如果未在
create database
語句中指定字符集和排序規(guī)則塔鳍,則會將character_set_server
和collation_server
作為默認值若只指定了字符集沒有指定排序規(guī)則,則使用字符集對應的默認的排序規(guī)則
數(shù)據(jù)庫的字符集與排序規(guī)則
-
可以在
create database
和alter database
時指定字符集和排序規(guī)則create database d character set gbk collate gbk_bin; alter database d character set gb2312 collate gb2312_bin; use d; SELECT @@character_set_database, @@collation_database; --此sql語句用于輸出當前數(shù)據(jù)庫的字符集與排序規(guī)則
-
當前數(shù)據(jù)庫的字符集和排序規(guī)則呻此,可以通過查看
character_set_database
和collation_database
來確定轮纫。若當前沒有默認數(shù)據(jù)庫,則character_set_database
和collation_database
與character_set_server
和collation_server
相同create database d0 character set gbk collate gbk_bin; create database d1 character set utf8 collate utf8_bin; use d0; show variables like 'character_set_database'; --輸出的character_set_database值為gbk use d1; show variables like 'character_set_database'; --輸出的character_set_database值為utf8
-
數(shù)據(jù)庫的字符集和排序規(guī)則會產(chǎn)生以下影響:
create table
未指定字符集和排序規(guī)則時焚鲜,則采用數(shù)據(jù)庫的字符集和排序規(guī)則load data
未指定字符集和排序規(guī)則掌唾,則使用數(shù)據(jù)庫的字符集來解釋文件中的信息-
創(chuàng)建的存儲過程和函數(shù)未指定字符集和排序規(guī)則,則使用數(shù)據(jù)庫的字符集和排序規(guī)則
use d0; --create database d0 character set gbk collate gbk_bin; delimiter \ create procedure p0(in str varchar(111)) begin select charset(str); end\ delimiter ; call p0("szn"); --輸出:gbk
use d1; --database d1 character set utf8 collate utf8_bin; delimiter \ create procedure p1(in str varchar(111)) begin select charset(str); end\ delimiter ; call p1("szn"); --輸出:utf8
use d1; --database d1 character set utf8 collate utf8_bin; delimiter \ create procedure p_set(in str varchar(111) character set gb2312) begin select charset(str); end\ delimiter ; call p_set("szn"); --輸出:gb2312
表的字符集與排序規(guī)則
-
可以在
create table
和alter table
時指定表的字符集和排序規(guī)則create table t0(name varchar(1024)) character set gbk collate gbk_bin; alter table t0 character set utf8;
若未指定表的字符集和排序規(guī)則忿磅,則使用對應數(shù)據(jù)庫的字符集和排序規(guī)則
若表中的列未指定字符集和排序規(guī)則糯彬,則對應的列使用表的字符集和排序規(guī)則
列的字符集與排序規(guī)則
-
在
create table
和alter table
時指定列的字符集和排序規(guī)則create table t2(name varchar(1024) character set gbk collate gbk_bin); alter table t2 modify column name varchar(1024) character set utf8 collate utf8_bin;
使用
alter table
將列的字符集進行更改時,若字符集不兼容贝乎,則可能會丟失數(shù)據(jù)
字符串字面量的字符集與排序規(guī)則
字符串字面量的默認字符集和排序規(guī)則由
character_set_connection
和collation_connection
指定-
字符串字面量可以設定字符集與排序規(guī)則:
show variables like 'character_set_connection'; --輸出值:utf8 set @b = "szn"; select charset(@b); --輸出值:utf8 set @c = _gbk"哈" COLLATE gbk_bin; --字符串字面量前指定編碼方式情连,這種方式稱為介紹器(Character Set Introducers) select charset(@c); --輸出值:gbk
-
字符串字面量指定字符集并不會更改其值,解析器最終仍按照
character_set_connection
進行處理set @c0 = _gbk"哈" COLLATE gbk_bin; set @c1 = _utf8"哈" COLLATE utf8_bin; select hex(@c0), hex(@c1); --兩個輸出均為"哈"的utf8編碼 E59388
Character Set Introducers
字符串字面量览效,十六進制字面量却舀,位字面量都有一個可選的字符集和排序規(guī)則設置,這稱為介紹器
-
介紹器告訴解析器后面字符串使用的字符集锤灿,但是不會改變字符串的值
set @s0 = _utf8 0xE59388; set @s1 = _gbk 0xB9FE; set @s2 = _gbk X'B9FE'; set @s3 = _utf8 b'111001011001001110001000'; set @s4 = _utf8 0b111001011001001110001000; select @s0, @s1, @s2, @s3, @s4; --輸出:哈挽拔,哈,哈但校,哈, 哈
SET @v1 = X'000D' | X'0BC0'; --@V1是bigint類型 SET @v2 = _binary X'000D' | X'0BC0'; --@v2是字符串 SELECT HEX(@v1), HEX(@v2); --輸出:BCD, 0BCD select @v1 + 1, @v2 + 1, "0BCD" + "1";. --輸出:3022, 1, 1
連接的字符集與排序規(guī)則
每個客戶端的連接都有特定且可更改的字符集和排序規(guī)則
-
客戶端發(fā)起
sql
查詢螃诅,至結果返回到客戶端:-
character_set_client
表明客戶端發(fā)送過來的sql
語句的字符集 - 服務器將接收到的
sql
語句的字符集轉(zhuǎn)換為character_set_connection
- 服務器將
sql
的執(zhí)行結果的字符集轉(zhuǎn)換為character_set_results
-
-
character_set_client
不允許一些字符集設置:ucs2 utf16 utf16le utf32
-
set character set
會同時將character_set_client
和character_set_results
設定為給定值,并且將character_set_connection
設置為character_set_database
的值set character_set_client = ascii; set character_set_results = big5; set character_set_database = gb2312; set character_set_connection = gbk; set character set latin2; show variables like 'character%';
Variable_name Value character_set_client latin2 character_set_connection gb2312 character_set_database gb2312 character_set_results latin2 my.ini
中添加default-character-set=latin2
,重啟服務术裸,將會同時更改character_set_client
,character_set_connection
,character_set_results
為latin2
可以在登錄
mysql
時:mysql -u root -p --default-character-set=latin1
set names gb2312; --等效于下面三句代碼 set character_set_client = gb2312; set character_set_results = gb2312; set character_set_connection = gb2312;
Unicode支持
? The Unicode Standard includes characters from the Basic Multilingual(使用多種語言的) Plane (BMP) and supplementary(追加的) characters that lie outside the BMP.
? BMP characters have these characteristics:
- Their code point values are between 0 and 65535 (or
U+0000
andU+FFFF
). - They can be encoded in a variable-length encoding using 8, 16, or 24 bits (1 to 3 bytes).
- They can be encoded in a fixed-length encoding using 16 bits (2 bytes).
- They are sufficient(充分的) for almost all characters in major(主要的) languages.
? Supplementary characters lie outside the BMP:
- Their code point values are between
U+10000
andU+10FFFF
). - 占用的空間比BMP內(nèi)的字符大倘是,最多占用4字節(jié)
mysql
支持的unicode
:
字符集 | 單個字符占用空間 | 支持的字符 | 描述 |
---|---|---|---|
utf8mb4 |
1到4字節(jié) | BMP and supplementary | A UTF-8 encoding of the Unicode character set |
utf8mb3 | 1到3字節(jié) | BMP only | A UTF-8 encoding of the Unicode character set |
utf8 | 1到3字節(jié) | BMP only | An alias for utf8mb3 . |
ucs2 | 2字節(jié) | BMP and supplementary | The UCS-2 encoding of the Unicode character set |
utf16 | 2或4字節(jié) | BMP and supplementary | The UTF-16 encoding for the Unicode character set . Like ucs2 but with an extension for supplementary characters. |
utf16le | 2或4字節(jié) | BMP and supplementary | The UTF-16LE encoding for the Unicode character set. Like utf16 but little-endian rather than big-endian. |
utf32 | 4字節(jié) | BMP and supplementary | The UTF-32 encoding for the Unicode character set |
utf8
當前是utf8mb3
的別稱,但是utf8mb3
將在mysql
將來的版本被刪除袭艺。有時候utf8
會變成utf8mb4
的別稱搀崭,所以為了避免模糊不清,請考慮使用utf8mb4
對于BMP字符猾编,
utf8mb4
與utf8mb3
具有相同編碼值瘤睹,占用的空間也相同。即utf8mb4
是utf8mb3
的超集對于補充字符答倡,
utf8mb4
需要四個字節(jié)來存儲