僅記錄一個故障的分析過程,如果有誤請見諒胚膊。
一豺撑、問題來源
最近一個朋友在做大表truncte的時候出現(xiàn)了堵塞娃循。SEMAPHORES幾乎都是DICT_SYS
對于show processlist顯示如下朴肺,并且killed標(biāo)記已經(jīng)不做響應(yīng)了窖剑。
從時間來看我們發(fā)現(xiàn)truncate命令執(zhí)行的時間最長,那么可能是它造成的問題戈稿,但是為什么會造成問題呢西土。結(jié)合pstack簡單分析一下
二、查看pstack棧分析
這里先貼出堵塞源頭的truncate語句的棧鞍盗,如下:
Thread 178 (Thread 0x7f5ed0e8d700 (LWP 135097)):
#0 mach_read_from_1 (b=<optimized out>) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/rem/rem0rec.cc:561
#1 rec_get_bit_field_1 (rec=0x7f5eab8394df "931706384825733120", shift=0, mask=7, offs=3) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/include/rem0rec.ic:168
#2 rec_get_status (rec=0x7f5eab8394df "931706384825733120") at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/include/rem0rec.ic:472
#3 rec_get_offsets_func (rec=0x7f5eab8394df "931706384825733120", index=0x7f5d9f04ad88, offsets=0x7f5d71627d08, n_fields=1, heap=0x7f5ed0e88ba8) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/rem/rem0rec.cc:561
#4 0x000000000111895f in btr_search_drop_page_hash_index (block=0x7f5ea7e48570) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/btr/btr0sea.cc:1278
#5 0x000000000111b0e9 in btr_search_drop_page_hash_when_freed (page_id=..., page_size=...) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/btr/btr0sea.cc:1387
#6 0x000000000119f026 in fseg_free_extent (seg_inode=<optimized out>, space=116, page_size=..., page=192256, ahi=<optimized out>, mtr=<optimized out>) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/fsp/fsp0fsp.cc:3798
#7 0x00000000011a317b in fseg_free_step (header=<optimized out>, ahi=true, mtr=0x7f5ed0e893c0) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/fsp/fsp0fsp.cc:3890
#8 0x00000000010f45bd in btr_free_but_not_root (block=0x7f5e9f76e3e8, log_mode=MTR_LOG_NO_REDO) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/btr/btr0btr.cc:1152
#9 0x00000000010f48ec in btr_free_if_exists (page_id=..., page_size=..., index_id=153, mtr=0x7f5ed0e89a00) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/btr/btr0btr.cc:1200
#10 0x000000000114f86d in dict_drop_index_tree (rec=<optimized out>, pcur=<optimized out>, mtr=0x7f5ed0e89a00) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/dict/dict0crea.cc:1160
#11 0x000000000107ab0c in DropIndex::operator() (this=0x7f5ed0e8b210, mtr=0x7f5ed0e89a00, pcur=0x7f5ed0e89ed0) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/row/row0trunc.cc:947
#12 0x0000000001083d2a in for_each<DropIndex> (callback=..., this=<optimized out>) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/row/row0trunc.cc:102
#13 SysIndexIterator::for_each<DropIndex> (this=<optimized out>, callback=...) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/row/row0trunc.cc:160
#14 0x000000000107fd2d in row_truncate_table_for_mysql (table=<optimized out>, trx=0x7f5ec0c00ad0) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/row/row0trunc.cc:2022
實際上我們的truncate有很多步驟需了,其中步驟9為
Step-9: Drop all indexes (this include freeing of the pages
associated with them).
這一步中包含3個代價較高的部分,從DEBUG DICT_SYC的加鎖和釋放來看般甲,5.7.22版本如下:
- 持有DICT_SYS維護(hù)AHI肋乍,并且持有較早在row_truncate_table_for_mysql函數(shù)的1844行持有,釋放在2071行的下層函數(shù)row_mysql_unlock_data_dictionary中敷存,隨后進(jìn)行了內(nèi)存維護(hù)墓造。
- 不持有DICT_SYS維護(hù)innodb buffer主要接口為buf_LRU_remove_pages。
- 不持有DICT_SYS锚烦,unlink文件觅闽。
如上故障的棧正是在進(jìn)行AHI的維護(hù),并且在dict_drop_index_tree中開頭就對ut_ad(mutex_own(&dict_sys->mutex))進(jìn)行了斷言涮俄,也就是說這里肯定是在持有dict_sys鎖的情況下進(jìn)行的谱煤,那么我們來看一下到底發(fā)生了什么,我貼出大概的執(zhí)行的步驟如下:
循環(huán)每一個index
btr_free_but_not_root
釋放所有的非root節(jié)點
循環(huán)每個extent進(jìn)行釋放
->fseg_free_step
主要釋放一個extent
是否開啟了AHI
循環(huán)extent
->fseg_free_extent
是否開啟了AHI,如果是,循環(huán)每一個page
->btr_search_drop_page_hash_when_freed
對單個的page進(jìn)行處理禽拔,
首先調(diào)用buf_page_get_gen刘离,標(biāo)記為BUF_PEEK_IF_IN_POOL,看page是否在pool中
/*!< get if in pool, do not make the block young in the LRU list */
如果block不在pool中直接返回睹栖,不做處理硫惕。
如果block存在與pool中,則
->btr_search_drop_page_hash_index
->循環(huán)block中的每行記錄 (這個步驟最慢)
->計算每個字段的偏移量
->循環(huán)每個字段
并且然后flod放入flod數(shù)組中野来。每行都包含這個值
->根據(jù)構(gòu)建的flod值drop AHI信息
ha_remove_all_nodes_to_page
每次extent的刪除進(jìn)行mtr 提交
稍微總結(jié)一下如下恼除,下面做這些的目的是精準(zhǔn)的清理掉AHI中的信息:
循環(huán)每一個index
循環(huán)每個extent
AHI 如果開啟,則
循環(huán)每一個page
如果page在pool中(注意這里)
循環(huán)每一行
循環(huán)每一個字段
我們發(fā)現(xiàn)這實際上是一個5層循環(huán)曼氛,代價最高的落在最后的循環(huán)每個page中每一行的每個字段上豁辉,如果page存在于這正是這個棧帶給我們的信息。
這些步驟需要持有DICT_SYS這樣一個全局字典結(jié)構(gòu)的保護(hù)鎖進(jìn)行舀患,而在很多地方比如如下的函數(shù)中都會持有這個鎖進(jìn)行互斥保護(hù):
- >ha_innobase::get_foreign_key_list:獲取外鍵列表
- >ha_innobase::open:每當(dāng)table cache不在的時候都會調(diào)用它取建立徽级。
這2個函數(shù)也是本例中等待的會話,并且在進(jìn)行mutex的帶的時候并不會被kill標(biāo)記喚醒聊浅,因此kill是不能生效的餐抢。我在查詢BUG的發(fā)現(xiàn)也有不少人遇到類似的問題如下:
https://bugs.mysql.com/bug.php?id=91977
但是bug的狀態(tài)一直沒有變?yōu)閏losed现使,在這個BUG的最后也有人問出問題:
因此不確認(rèn)是否已經(jīng)改進(jìn)了。
三旷痕、5.7中drop和truncate在內(nèi)存維護(hù)上的區(qū)別
其次我們來也簡單討論下truncate在innodb buffer維護(hù)上的代價碳锈,對于5.7來講這個地方和drop是有區(qū)別的,函數(shù)接口 buf_LRU_flush_or_remove_pages 用于確認(rèn)是否維護(hù) LRU list欺抗,其中有三種類型:
/** Algorithm to remove the pages for a tablespace from the buffer pool.
See buf_LRU_flush_or_remove_pages(). */
enum buf_remove_t {
BUF_REMOVE_ALL_NO_WRITE, /*!< Remove all pages from the buffer
pool, don't write or sync to disk */
BUF_REMOVE_FLUSH_NO_WRITE, /*!< Remove only, from the flush list,
don't write or sync to disk */
BUF_REMOVE_FLUSH_WRITE /*!< Flush dirty pages to disk only
don't remove from the buffer pool */
};
- drop為:BUF_REMOVE_FLUSH_NO_WRITE售碳,需要維護(hù)flush list,不回寫數(shù)據(jù)
- trunacte為:BUF_REMOVE_ALL_NO_WRITE绞呈,需要維護(hù)flush list和lru list贸人,不回寫數(shù)據(jù)
因此5.7最好使用drop+create代替trucnate。
四报强、測試和建議
隨后我對一個稍微大的表truncate灸姊,測試發(fā)現(xiàn)代價如我們描述:
正如我們描述的拱燃,我這里實際上沒有建立AHI秉溉,但是全表掃描了一次,那么大量的page加載到了innodb buffer碗誉。這里btr_search_drop_page_hash_when_freed實際就是AHI維護(hù)的上層接口召嘶,如果有大量的AHI的存在,代價確實客觀哮缺,并且這個是持有DICT_SYS的操作弄跌。
對于AHI的維護(hù)方面直到8.0.21當(dāng)時我看一個增加分區(qū)慢的case的時候依舊存在一些問題提交的BUG如下:
https://bugs.mysql.com/bug.php?id=101900
因此建議如下:
- 對于大表的truncate操作,盡量保存在innodb buffer中沒有相關(guān)page了尝苇,自然的淘汰的page也會單頁維護(hù)AHI铛只。如果page在innodb buffer中不存在,自然不會進(jìn)入AHI維護(hù)糠溜。
- 對于大表的truncate操作淳玩,5.7還是建議用drop+create來代替。我也見過因為維護(hù)innodb buffer帶來的故障非竿。
- drop可以先link一個硬鏈接
- 低峰期進(jìn)行DDL蜕着,任何DDL都是,避免未知的影響红柱,影響太大了承匣。
其他:斷點
我主要使用了如下斷點
- 線程條件斷點:this==0x3424938 為DICT_SYS全局地址
- 函數(shù)斷點:unlink為文件釋放的linux接口
- 函數(shù)斷點:btr_free_but_not_root為AHI維護(hù)的上層接口
- 函數(shù)斷點:buf_LRU_remove_all_pages為內(nèi)存維護(hù)接口
20 breakpoint keep y <MULTIPLE> thread 42
stop only if this==0x3424938
stop only in thread 42
breakpoint already hit 29 times
20.1 y 0x0000000001998de0 in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int)
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:973
20.2 y 0x000000000199906a in PolicyMutex<TTASEventMutex<BlockMutexPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int)
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:973
20.3 y 0x0000000001a52966 in PolicyMutex<OSTrackMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int)
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:973
21 breakpoint keep y <MULTIPLE> thread 42
stop only if this==0x3424938
stop only in thread 42
breakpoint already hit 29 times
21.1 y 0x0000000001998e83 in PolicyMutex<TTASEventMutex<GenericPolicy> >::exit()
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:947
21.2 y 0x000000000199910d in PolicyMutex<TTASEventMutex<BlockMutexPolicy> >::exit()
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:947
21.3 y 0x0000000001a52859 in PolicyMutex<OSTrackMutex<GenericPolicy> >::exit() at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:947
22 breakpoint keep y 0x00007ffff6667140 <unlink>
breakpoint already hit 4 times
23 breakpoint keep y 0x0000000001bd815e in btr_free_but_not_root(buf_block_t*, mtr_log_t) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/btr/btr0btr.cc:1158
breakpoint already hit 2 times
24 breakpoint keep y 0x00007ffff6667140 <unlink>
breakpoint already hit 2 times
25 breakpoint keep y 0x0000000001c49722 in buf_LRU_remove_all_pages(buf_pool_t*, ulint) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/buf/buf0lru.cc:786
breakpoint already hit 8 times
5.7 drop棧
#0 btr_free_but_not_root (block=0x7fffb43622c0, log_mode=MTR_LOG_ALL) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/btr/btr0btr.cc:1158
#1 0x0000000001bd8516 in btr_free_if_exists (page_id=..., page_size=..., index_id=7175, mtr=0x7fffec04b450) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/btr/btr0btr.cc:1239
#2 0x0000000001c5f83e in dict_drop_index_tree (rec=0x7fffc6383cbb "", pcur=0x7fff84b0e4b0, mtr=0x7fffec04b450)
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/dict/dict0crea.cc:1162
#3 0x0000000001b429ad in row_upd_clust_step (node=0x7fff84b0ddb0, thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:2883
#4 0x0000000001b43080 in row_upd (node=0x7fff84b0ddb0, thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:3042
#5 0x0000000001b4355f in row_upd_step (thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:3188
#6 0x0000000001a890b1 in que_thr_step (thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/que/que0que.cc:1045
#7 0x0000000001a8938d in que_run_threads_low (thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/que/que0que.cc:1125
#8 0x0000000001a89543 in que_run_threads (thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/que/que0que.cc:1165
#9 0x0000000001a897f9 in que_eval_sql (info=0x7fff8403d680,
sql=0x7fff8403b960 "PROCEDURE DROP_TABLE_PROC () IS\nsys_foreign_id CHAR;\ntable_id CHAR;\nindex_id CHAR;\nforeign_id CHAR;\nspace_id INT;\nfound INT;\nDECLARE CURSOR cur_fk IS\nSELECT ID FROM SYS_FOREIGN\nWHERE FOR_NAME = :table"..., reserve_dict_mutex=0, trx=0x7fffeca7ab60) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/que/que0que.cc:1242
#10 0x0000000001ae5f83 in row_drop_table_for_mysql (name=0x7fffec04c840 "test/test11", trx=0x7fffeca7ab60, drop_db=false, nonatomic=true, handler=0x0)
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/row/row0mysql.cc:5207
#11 0x0000000001984580 in ha_innobase::delete_table (this=0x7fff84006db0, name=0x7fffec04df70 "./test/test11")
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/handler/ha_innodb.cc:13545
#12 0x0000000000f5d1ae in handler::ha_delete_table (this=0x7fff84006db0, name=0x7fffec04df70 "./test/test11") at /opt/percona-server-locks-detail-5.7.22/sql/handler.cc:5240
#13 0x0000000000f566b7 in ha_delete_table (thd=0x7fff84000c00, table_type=0x2e64aa0, path=0x7fffec04df70 "./test/test11", db=0x7fff84006bb8 "test", alias=0x7fff840065e8 "test11",
generate_warning=true) at /opt/percona-server-locks-detail-5.7.22/sql/handler.cc:2763
#14 0x00000000015f6bab in mysql_rm_table_no_locks (thd=0x7fff84000c00, tables=0x7fff84006630, if_exists=false, drop_temporary=false, drop_view=false, dont_log_query=false)
at /opt/percona-server-locks-detail-5.7.22/sql/sql_table.cc:2680
#15 0x00000000015f59b4 in mysql_rm_table (thd=0x7fff84000c00, tables=0x7fff84006630, if_exists=0 '\000', drop_temporary=0 '\000')
at /opt/percona-server-locks-detail-5.7.22/sql/sql_table.cc:2221
由于delete了SYS_INDEXES而調(diào)用了dict_drop_index_tree
sql += "DECLARE CURSOR cur_idx IS\n"
"SELECT ID FROM SYS_INDEXES\n"
"WHERE TABLE_ID = table_id\n"
"LOCK IN SHARE MODE;\n";
sql += "found := 1;\n"
"OPEN cur_idx;\n"
"WHILE found = 1 LOOP\n"
" FETCH cur_idx INTO index_id;\n"
" IF (SQL % NOTFOUND) THEN\n"
" found := 0;\n"
" ELSE\n"
" DELETE FROM SYS_FIELDS\n"
" WHERE INDEX_ID = index_id;\n"
" DELETE FROM SYS_INDEXES\n"
" WHERE ID = index_id\n"
" AND TABLE_ID = table_id;\n"
" END IF;\n"
"END LOOP;\n"
"CLOSE cur_idx;\n";
8.0 為BUF_REMOVE_NONE
#0 Fil_shard::space_delete (this=0x7fffe02e7be0, space_id=1448, buf_remove=BUF_REMOVE_NONE) at /newdata/mysql-8.0.23/storage/innobase/fil/fil0fil.cc:4512
#1 0x000000000543c023 in fil_delete_tablespace (space_id=1448, buf_remove=BUF_REMOVE_NONE) at /newdata/mysql-8.0.23/storage/innobase/fil/fil0fil.cc:4665
#2 0x000000000514733b in row_drop_tablespace (space_id=1448, filepath=0xaa81ec0 "./t10/#sql-ib2899-698767554.ibd") at /newdata/mysql-8.0.23/storage/innobase/row/row0mysql.cc:3772
#3 0x00000000050527a9 in Log_DDL::replay_delete_space_log (this=0x7fffe031c790, space_id=1448, file_path=0xaa81ec0 "./t10/#sql-ib2899-698767554.ibd")
at /newdata/mysql-8.0.23/storage/innobase/log/log0ddl.cc:1703
#4 0x00000000050521b3 in Log_DDL::replay (this=0x7fffe031c790, record=...) at /newdata/mysql-8.0.23/storage/innobase/log/log0ddl.cc:1590
#5 0x0000000005051dc2 in Log_DDL::replay_by_thread_id (this=0x7fffe031c790, thread_id=7) at /newdata/mysql-8.0.23/storage/innobase/log/log0ddl.cc:1520
#6 0x00000000050536bb in Log_DDL::post_ddl (this=0x7fffe031c790, thd=0xabe73e0) at /newdata/mysql-8.0.23/storage/innobase/log/log0ddl.cc:1924
#7 0x0000000004f18da4 in innobase_post_ddl (thd=0xabe73e0) at /newdata/mysql-8.0.23/storage/innobase/handler/ha_innodb.cc:4682
#8 0x0000000003eaedc6 in Sql_cmd_truncate_table::cleanup_base (this=0xabbd9c8, thd=0xabe73e0, hton=0xaa39a20) at /newdata/mysql-8.0.23/sql/sql_truncate.cc:401
#9 0x0000000003eaf09c in Sql_cmd_truncate_table::<lambda()>::operator()(void) const (__closure=0x7fffe47cc188) at /newdata/mysql-8.0.23/sql/sql_truncate.cc:494
#10 0x0000000003eb0196 in Scope_guard<Sql_cmd_truncate_table::truncate_base(THD*, TABLE_LIST*)::<lambda()> >::~Scope_guard(void) (this=0x7fffe47cc180, __in_chrg=<optimized out>)
at /newdata/mysql-8.0.23/include/scope_guard.h:41
#11 0x0000000003eaf81f in Sql_cmd_truncate_table::truncate_base (this=0xabbd9c8, thd=0xabe73e0, table_ref=0xabbda00) at /newdata/mysql-8.0.23/sql/sql_truncate.cc:606
#12 0x0000000003eb00e6 in Sql_cmd_truncate_table::execute (this=0xabbd9c8, thd=0xabe73e0) at /newdata/mysql-8.0.23/sql/sql_truncate.cc:746