在 MySql 建表時候一般會指定字符集吵冒,大多數(shù)情況下為了更好的兼容性無腦選了 utf8mb4。但是有時會因為選錯西剥,或歷史遺留問題痹栖,導致使用了 utf8 字符集。當兩個表的字符集不一樣瞭空,在使用字符型字段進行表連接查詢時揪阿,就需要特別注意下查詢耗時是否符合預期。
有次使用 left join 寫一個 SQL咆畏,發(fā)現(xiàn)用時明顯超過預期南捂,經過一頓折騰才發(fā)現(xiàn)是兩個表字符集不一樣,特此記錄一下旧找。
問題分析
mysql> SELECT COUNT( *) from app_bind_rel t left join app_config_control_sn p on t.host_sn = p.host_sn ;
+-----------+
| COUNT( *) |
+-----------+
| 13447 |
+-----------+
1 row in set (0.89 sec)
例如上面的 SQL溺健,左表 1W 條數(shù)據(jù),右表 400 多條數(shù)據(jù)钮蛛,在 host_sn 字段上都有索引鞭缭,查詢竟然用了近 900ms剖膳,怎么會這么慢?
mysql> explain SELECT COUNT( *) from app_bind_rel t left join app_config_control_sn p on t.host_sn = p.host_sn ;
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-----------------------------------------------------------------+
| 1 | SIMPLE | t | NULL | index | NULL | idx_host_sn | 122 | NULL | 10791 | 100.00 | Using index |
| 1 | SIMPLE | p | NULL | index | NULL | idx_host_sn | 152 | NULL | 457 | 100.00 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-----------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
查看下執(zhí)行計劃岭辣,的確是使用了索引吱晒,但是細看 Extra 列發(fā)現(xiàn)較正常的連表查詢多了“Using join buffer (Block Nested Loop)”這一信息,這個具體是什么意思我們后面再說易结。
然后我們再看下詳細的執(zhí)行計劃枕荞,使用 explain formart=json。
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "988640.52"
},
"nested_loop": [
{
"table": {
"table_name": "t",
"access_type": "index",
"key": "idx_host_sn",
"used_key_parts": [
"host_sn"
],
"key_length": "122",
"rows_examined_per_scan": 10791,
"rows_produced_per_join": 10791,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "161.00",
"eval_cost": "2158.20",
"prefix_cost": "2319.20",
"data_read_per_join": "2M"
},
"used_columns": [
"host_sn"
]
}
},
{
"table": {
"table_name": "p",
"access_type": "index",
"key": "idx_host_sn",
"used_key_parts": [
"host_sn"
],
"key_length": "152",
"rows_examined_per_scan": 457,
"rows_produced_per_join": 4931487,
"filtered": "100.00",
"using_index": true,
"using_join_buffer": "Block Nested Loop",
"cost_info": {
"read_cost": "23.92",
"eval_cost": "986297.40",
"prefix_cost": "988640.52",
"data_read_per_join": "865M"
},
"used_columns": [
"host_sn"
],
"attached_condition": "<if>(is_not_null_compl(p), (`db0`.`t`.`host_sn` = convert(`db0`.`p`.`host_sn` using utf8mb4)), true)"
}
}
]
}
}
特別需要關注的是這一對 KV
"attached_condition": "<if>(is_not_null_compl(p), (`collection_bullet_0000`.`t`.`host_sn` = convert(`collection_bullet_0000`.`p`.`host_sn` using utf8mb4)), true)"
看字面意思就是當 p 表不為空的時候搞动,執(zhí)行表連接需要先將 p 表的 host_sn 字段轉變?yōu)?utf8mb4 字符集躏精。我們應該都知道在表連接中使用了函數(shù)的話,是無法使用索引的鹦肿。
所以再回到上面我看到的“Using join buffer (Block Nested Loop)”問題矗烛,來解釋下這是一個什么過程。
Nested-Loop Join
MySql 官網(wǎng)對 Nested-Loop Join 有做過解釋箩溃,其實做開發(fā)的同學看到名字就大體知道是啥瞭吃,不就是循環(huán)嵌套嘛。
MySql 中分為 Nested-Loop Join 算法跟 Block Nested-Loop Join 算法涣旨。
例如歪架,有如下三個表,t1霹陡、t2和蚪、t3 使用了這三種 join type。
Table Join Type
t1 range
t2 ref
t3 ALL
當使用 Nested-Loop Join 算法時烹棉,其 join 過程如下所示攒霹,其實就是簡單的三層循環(huán)。
for each row in t1 matching range {
for each row in t2 matching reference key {
for each row in t3 {
if row satisfies join conditions, send to client
}
}
}
Block Nested-Loop Join(BNL) 算法是對 Nested-Loop Join 算法的一種優(yōu)化浆洗。BNL 算法緩沖外部循環(huán)中讀取的行來減少內部循環(huán)中讀取表的次數(shù)催束。例如,將 10 行數(shù)據(jù)讀取到緩沖器中伏社,并且將緩沖器傳遞到下一個循環(huán)內部抠刺,內部循環(huán)中讀取的每一行與緩沖器中的所有 10 行進行比較。這將使讀取內部表的次數(shù)減少一個數(shù)量級洛口。
for each row in t1 matching range {
for each row in t2 matching reference key {
store used columns from t1, t2 in join buffer
if buffer is full {
for each row in t3 {
for each t1, t2 combination in join buffer {
if row satisfies join conditions, send to client
}
}
empty join buffer
}
}
}
if buffer is not empty {
for each row in t3 {
for each t1, t2 combination in join buffer {
if row satisfies join conditions, send to client
}
}
}
算法實現(xiàn)如上矫付,只有當 “join buffer” 滿的時候才會觸發(fā) t3 表的讀取,如果 “join buffer” 的 size = 10 那么就可以減少 10 倍的 t3 表被讀取次數(shù)第焰,從內存中讀取數(shù)據(jù)的效率顯然要比從磁盤讀取的效率高的多买优。從而提升 join 的效率。
但其實再好的優(yōu)化畢竟也是嵌套循環(huán),做開發(fā)的同學應該都知道 O(N2) 的時間復雜度是無法接受的。這也是我們這個查詢這么慢的根因。
解決辦法
解決辦法其實很簡單锐想,修改右表的字符集就可以解決餐曹。
在變更數(shù)據(jù)集之前我們先用 show table status 查看下當前表的狀態(tài)媳溺。
mysql> show table status like 'app_config_control_sn';
+-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
| app_config_control_sn | InnoDB | 10 | Dynamic | 457 | 143 | 65536 | 0 | 32768 | 0 | 1041 | 2023-04-17 03:25:45 | 2023-04-17 03:27:24 | NULL | utf8_general_ci | NULL | | SN |
+-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
1 row in set (0.00 sec)
接著使用如下命令變更表的字符集。
mysql> ALTER TABLE app_config_control_sn CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Query OK, 457 rows affected (0.09 sec)
Records: 457 Duplicates: 0 Warnings: 0
再次使用 show table status 命令查看下表的狀態(tài)。
mysql> show table status like 'app_config_control_sn';
+-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
| app_config_control_sn | InnoDB | 10 | Dynamic | 457 | 143 | 65536 | 0 | 32768 | 0 | 1041 | 2023-04-17 03:50:11 | 2023-04-17 03:50:11 | NULL | utf8mb4_general_ci | NULL | | SN |
+-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
1 row in set (0.01 sec)
可以看到表的字符集已經發(fā)生了變化,那我們再次執(zhí)行開始的 SQL 及 explain 語句脖咐,確認下問題是否已經解決。
mysql> SELECT COUNT( *) from app_bind_rel t left join app_config_control_sn p on t.host_sn = p.host_sn ;
+-----------+
| COUNT( *) |
+-----------+
| 13447 |
+-----------+
1 row in set (0.03 sec)
mysql> explain SELECT COUNT( *) from app_bind_rel t left join app_config_control_sn p on t.host_sn = p.host_sn ;
+----+-------------+-------+------------+-------+---------------+-------------+---------+---------------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------------+---------+---------------+-------+----------+--------------------------+
| 1 | SIMPLE | t | NULL | index | NULL | idx_host_sn | 122 | NULL | 10791 | 100.00 | Using index |
| 1 | SIMPLE | p | NULL | ref | idx_host_sn | idx_host_sn | 202 | db0.t.host_sn | 2 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------------+---------+---------------+-------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)
可以看到耗時已經只需要 30ms 左右汇歹,這個就比較符合預期屁擅,而在執(zhí)行計劃中也不再會有“Using join buffer (Block Nested Loop)”信息。
其他
mysql> SELECT COUNT( *) from app_bind_rel t join app_config_control_sn p on t.host_sn = p.host_sn ;
+-----------+
| COUNT( *) |
+-----------+
| 730 |
+-----------+
1 row in set (0.01 sec)
在沒有變更字符集之前产弹,當我們將 left join 修改為 join 的時候會發(fā)現(xiàn)耗時減少了 100 倍派歌,只用了 10 ms,這是為什么呢痰哨?
mysql> explain SELECT COUNT( *) from app_bind_rel t join app_config_control_sn p on t.host_sn = p.host_sn ;
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | p | NULL | index | NULL | idx_host_sn | 152 | NULL | 457 | 100.00 | Using index |
| 1 | SIMPLE | t | NULL | ref | idx_host_sn | idx_host_sn | 122 | func | 1 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)
查看執(zhí)行計劃胶果,發(fā)現(xiàn)使用 join 的時候不會有 “Using join buffer (Block Nested Loop)”。再細看執(zhí)行計劃斤斧,發(fā)現(xiàn)驅動表已經由 t 表變?yōu)榱?p 表早抠。
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "643.80"
},
"nested_loop": [
{
"table": {
"table_name": "p",
"access_type": "index",
"key": "idx_host_sn",
"used_key_parts": [
"host_sn"
],
"key_length": "152",
"rows_examined_per_scan": 457,
"rows_produced_per_join": 457,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "4.00",
"eval_cost": "91.40",
"prefix_cost": "95.40",
"data_read_per_join": "82K"
},
"used_columns": [
"host_sn"
]
}
},
{
"table": {
"table_name": "t",
"access_type": "ref",
"possible_keys": [
"idx_host_sn"
],
"key": "idx_host_sn",
"used_key_parts": [
"host_sn"
],
"key_length": "122",
"ref": [
"func"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 457,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "457.00",
"eval_cost": "91.40",
"prefix_cost": "643.80",
"data_read_per_join": "117K"
},
"used_columns": [
"host_sn"
],
"attached_condition": "(`db0`.`t`.`host_sn` = convert(`db0`.`p`.`host_sn` using utf8mb4))"
}
}
]
}
}
查看詳細的執(zhí)行計劃,可以看到
"attached_condition": "(`collection_bullet_0000`.`t`.`host_sn` = convert(`collection_bullet_0000`.`p`.`host_sn` using utf8mb4))"
這對 KV 依然是存在的撬讽,但是 "using_join_buffer": "Block Nested Loop" 已經不存在了贝或。這個其實主要是因為當 p 表變?yōu)轵寗颖淼臅r候,會先將自己的 host_sn 字段轉為 utf8mb4 字符集锐秦,再與 t 表進行關聯(lián)。t 表由于本來就是 utf8mb4 字符集且存在索引盗忱,就可以正常走數(shù)據(jù)庫索引了酱床,所以查詢耗時也就大大降低。而使用 left join 時候趟佃,t 表作為驅動表是無法優(yōu)化改變的扇谣。
可見在表連接中即使使用了函數(shù)也不一定就沒法走索引,關鍵還是要看用法及明確處理過程闲昭。
記得剛學習數(shù)據(jù)庫的時候罐寨,老師還特別強調驅動表一定要寫在左邊,而隨著數(shù)據(jù)庫技術的不斷迭代發(fā)展序矩,數(shù)據(jù)庫已經能更智能的自動幫我們優(yōu)化處理過程鸯绿,之前很多的數(shù)據(jù)庫規(guī)則也不需要了。