報錯信息:
java.lang.RuntimeException: java.sql.SQLException:
Incorrect string value: '\\xF0\\x9F\\x9A\\xB9\\xE4\\xBC...' for column 'xxx' ...
引起的原因:
在varchar類型的字段里面插入的text包含emoj表情。
MySQL 5.5.3 版本以下玖绿,CHARACTER SET 為 utf8 的字段最長只有3個字節(jié)纵菌。
查看MySQL版本可用以下命令:
mysql> status;
--------------
mysql Ver 14.14 Distrib 5.7.17, for Linux (x86_64) using EditLine wrapper
Connection id: 5
Current database:
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.7.17-0ubuntu0.16.04.1 (Ubuntu)
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8mb4
Db characterset: utf8mb4
Client characterset: utf8mb4
Conn. characterset: utf8mb4
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 1 hour 28 min 58 sec
Threads: 1 Questions: 28 Slow queries: 0 Opens: 116 Flush tables: 1 Open tables: 29 Queries per second avg: 0.005
--------------
查看相關(guān)字符集設(shè)置信息(下面是我修改后的,修改前utf8mb4處都為utf8):
mysql> show variables like 'character_set_%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
查看所有字符集信息:
mysql> show character set;
+----------+---------------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| macce | Mac Central European | macce_general_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
| utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
| binary | Binary pseudo charset | binary | 1 |
| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |
+----------+---------------------------------+---------------------+--------+
由上表可見,utf8最長3個字節(jié)带射。
在5.5.3版本以后,增加了 [utf8mb4] (https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html) 字符集循狰。
這是官方原文:
The character set named utf8 uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character supports supplementary characters:
- For a BMP character, utf8 and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.
- For a supplementary character, utf8 cannot store the character at all, whereas utf8mb4 requires four bytes to store it. Because utf8 cannot store the character at all, you have no supplementary characters in utf8 columns and need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL.
因此窟社,只需要將字符集由utf8修改為utf8mb4即可支持4字節(jié)的BPM等數(shù)據(jù)。
如何修改:
修改MySQL配置文件绪钥,修改mysql的客戶端默認連接字符集灿里,服務端以及數(shù)據(jù)庫的默認字符集:
sudo vim /etc/mysql/my.cnf
在末尾或者已有的中括號里的塊中加上如下配置:
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'
重啟MySQL服務:
sudo service mysql restart
重啟服務后,通過以下SQL可以查看結(jié)果:
mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
MySQL方面修改好后程腹,需要修改已有的數(shù)據(jù)庫或者表或者字段:
修改表的字符集:
ALTER TABLE tablename charset=utf8mb4;
修改表字段:
ALTER TABLE tablename MODIFY COLUMN columnname VARCHAR(255) CHARACTER SET utf8mb4;
修改表的字符集和修改字段可以用一條SQL完成匣吊。
如果使用的java連接MySQL,需要修改原有的連接設(shè)置:
原jdbcUrl:jdbc:mysql:///dbname?useUnicode=true&characterEncoding=UTF-8
需要去掉編碼參數(shù):
jdbcUrl:jdbc:mysql:///dbname
** 此處不要想當然的將參數(shù)useUnicode=true&characterEncoding=UTF-8修改為useUnicode=true&characterEncoding=utf8mb4
寸潦,否則會報出以下錯誤:
Could not get JDBC Connection; nested exception is com.MySQL.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException:
Unknown character set: 'utf8mb4'
我們在MySQL的配置文件里已經(jīng)配置了[client]的默認訪問字符集了缀去,所以,此處將參數(shù)去掉即可甸祭。
吃水不忘打井人:
http://blog.csdn.net/secretx/article/details/21253559
http://stackoverflow.com/questions/7814293/how-to-insert-utf-8-mb4-characteremoji-in-ios5-in-mysql