背景
線上很多實(shí)例都是mysql5.7.17版本让网,但是這個(gè)版本有一個(gè)嚴(yán)重的bug可能會(huì)造成主從數(shù)據(jù)不一致或者其他錯(cuò)誤,官方在5.7.19解決了這個(gè)問(wèn)題蹲缠,以下是官方的描述:
英文:Replication: In certain cases, the master could write to the binary log a last_committed value which was smaller than it should have been.
This could cause the slave to execute in parallel transactions which should not have been, leading to inconsistencies or other errors. (Bug #84471, Bug #25379659)
中文:在某些場(chǎng)景下舀瓢,master記錄到binlog里面的last_committed要比他本應(yīng)該記的last_committed小坊夫,這將會(huì)導(dǎo)致主從數(shù)據(jù)不一致或者其他錯(cuò)誤
疑問(wèn)
1.這個(gè)bug一定會(huì)導(dǎo)致數(shù)據(jù)不一致嗎
2.如何復(fù)現(xiàn)這個(gè)bug
3.如何用這個(gè)bug復(fù)現(xiàn)主從不一致
4.有沒(méi)有可能主從復(fù)制兩個(gè)線程正常大溜,但是數(shù)據(jù)已經(jīng)不一致了
5.官方是如何解決的這個(gè)bug
6.評(píng)估當(dāng)前線上mysql5.7.17是否需要升級(jí)
課前知識(shí)(并行復(fù)制-COMMIT_ORDER)
master端:
1.binlog組提交的三個(gè)階段
trans_commit_stmt
ha_commit_trans
MYSQL_BIN_LOG::prepare
xxxx
MYSQL_BIN_LOG::commit
MYSQL_BIN_LOG::ordered_commit
FLUSH /*階段1*/
SYNC /*階段2*/
COMMIT /*階段3*/
2.主庫(kù)哪些事物可以放到一個(gè)組里提交
ordered_commit:FLUSH----->ordered_commit:COMMIT:process_commit_stage_queue,這個(gè)階段的事物肯定互相無(wú)影響掘猿,會(huì)分配相同last_committed及不同的sequence_number
3.一起提交的事物組如何區(qū)別
last_committed相同的是一個(gè)組扶认,組內(nèi)區(qū)別不同事物根據(jù)sequence_number區(qū)別
4.last_committed與sequence_number關(guān)系
last_committed是上一組sequence_number最大值
5.主庫(kù)binlog記錄格式
#200628 19:27:52 server id 1001 end_log_pos 259 CRC32 0x5682e65e GTID last_committed=0 sequence_number=1
#200628 19:28:00 server id 1001 end_log_pos 534 CRC32 0x999fefc6 GTID last_committed=0 sequence_number=2
#200628 19:28:08 server id 1001 end_log_pos 809 CRC32 0x9a7ad89c GTID last_committed=0 sequence_number=3
slave端:
1.slave端有哪些線程參與
io線程接收master binlog寫入自己relay log
coordinator線程讀取自己的relay log進(jìn)行分發(fā)
worker線程獲取分發(fā)到的日志進(jìn)行回放
2.并行的單位是什么
以GTID或者匿名GTID事物為單位進(jìn)行分發(fā)
3.哪些事物可以并行
當(dāng)前有空閑的worker(邏輯見(jiàn)下方源碼)
事物的last_committed <= lwm_estimate (邏輯見(jiàn)下方源碼)
# slave端能否并行的2處關(guān)鍵代碼及注釋解釋
1.獲取空閑worker
Mts_submode_logical_clock::get_free_worker(Relay_log_info *rli)
{
for (Slave_worker **it= rli->workers.begin(); it != rli->workers.end(); ++it)
{
Slave_worker *w_i= *it;
if (w_i->jobs.len == 0)
return w_i;
}
return 0;
}
2.GAQ中l(wèi)wm_estimate解釋
the last time index containg lwm
+------+
| LWM |
| | |
V V V
GAQ: xoooooxxxxxXXXXX...X
^ ^
| | LWM+1
|
+- tne new current_lwm(一定要保證之前所有的事物都已經(jīng)commited,而不能找最左面最大的"x"粱挡,因?yàn)橛锌赡芮懊娲嬖谖赐瓿傻氖挛?o")
<---- logical (commit) time ----
here `x' stands for committed, `X' for committed and discarded from the running range of the queue, `o' for not committed.
3.對(duì)比last_committed與lwm_estimate
if (!is_new_group)
{
longlong lwm_estimate= estimate_lwm_timestamp(); /*返回last_lwm_timestamp*/
/*last_committed為本次要分發(fā)的last_committed與主庫(kù)binlog中的last_committed對(duì)應(yīng),lwm_estimate為GAQ隊(duì)列里面標(biāo)記為已經(jīng)完成commit的并且前面所有事物都已經(jīng)完成提交的seq_number*/
/*當(dāng) a "<=" b 返回 true, 否者返回 false赠幕,也就是說(shuō) last_committed<=lwm_estimate可以并行分發(fā),如果同一組last_committed肯定<lwm_estimate可以并行分發(fā)询筏,如果是2組交界會(huì)出現(xiàn)last_committed>=lwm_estimate榕堰,如果last_committed>lwm_estimate,說(shuō)明上一組事物還沒(méi)有完全commit,此時(shí)不能并行分發(fā),如果last_committed=lwm_estimate此時(shí)上一組最后一個(gè)事物已經(jīng)完成了逆屡,可以并行分發(fā)*/
/*rli->gaq->assigned_group_index為正在分發(fā)的group在GAQ中的位置圾旨,rli->gaq->entry為GAQ中的頭,GAQ隊(duì)列為空的時(shí)候這兩個(gè)相等*/
if (!clock_leq(last_committed, lwm_estimate) && rli->gaq->assigned_group_index != rli->gaq->entry)
{
if (wait_for_last_committed_trx(rli, last_committed, lwm_estimate))
{
DBUG_RETURN(-1);
}
if (gap_successor)
last_lwm_timestamp= sequence_number - 1;
DBUG_ASSERT(!clock_leq(sequence_number, estimate_lwm_timestamp()));
}
delegated_jobs++;
DBUG_ASSERT(!force_new_group);
}
疑問(wèn)1分析(這個(gè)bug一定會(huì)導(dǎo)致數(shù)據(jù)不一致嗎)
一魏蔗、測(cè)試case
1.搭建主從測(cè)試環(huán)境砍的,開啟并行復(fù)制(COMMIT_ORDER)
2.sysbench壓測(cè)oltp_read_write.lua 24小時(shí)
3.觀察并行復(fù)制是否生效
4.pt-checksum校驗(yàn)數(shù)據(jù)是否一致
二、測(cè)試結(jié)果
pt-checksum校驗(yàn)后發(fā)現(xiàn)數(shù)據(jù)并沒(méi)有出現(xiàn)不一致莺治,也就是說(shuō)這個(gè)bug并不能百分百能出現(xiàn)廓鞠,只有在特定場(chǎng)景下才能復(fù)現(xiàn),例如下面疑問(wèn)3疑問(wèn)4人為造出的場(chǎng)景或者其他特殊場(chǎng)景下
疑問(wèn)2分析(如何復(fù)現(xiàn)這個(gè)bug)
1.模擬主庫(kù)組提交并打印組提交binlog
session1:
mysql> SET DEBUG_SYNC = "bgc_after_enrolling_for_commit_stage SIGNAL insert1_ready WAIT_FOR continue_commit_stage";
mysql> INSERT INTO t1 VALUES(1, 1);
session2:
mysql> INSERT INTO t1 VALUES(2, 1);
session3:
mysql> INSERT INTO t1 VALUES(3, 1);
session4:
mysql> SET DEBUG_SYNC = "now SIGNAL continue_commit_stage";
#binglog輸出日志(正常日志):
#200701 12:15:25 server id 1001 end_log_pos 259 CRC32 0x57f565f3 GTID last_committed=0 sequence_number=1
#200701 12:15:43 server id 1001 end_log_pos 517 CRC32 0x29626007 GTID last_committed=0 sequence_number=2
#200701 12:15:46 server id 1001 end_log_pos 775 CRC32 0xea48f246 GTID last_committed=0 sequence_number=3
2.模擬bug中提出的組提交過(guò)程中實(shí)際寫入binlog中的last_committed比應(yīng)該寫入binlog的last_committed小
session1:
mysql> SET DEBUG_SYNC = "bgc_after_enrolling_for_commit_stage SIGNAL insert1_ready WAIT_FOR continue_commit_stage";
mysql> INSERT INTO t1 VALUES(1, 1);
session2:
mysql> SET DEBUG_SYNC = "now WAIT_FOR insert1_ready";
mysql> SET DEBUG_SYNC = "reached_finish_commit WAIT_FOR insert2_finish";
mysql> INSERT INTO t1 VALUES(2, 1);
session3:
mysql> SET DEBUG_SYNC = "now SIGNAL continue_commit_stage";
mysql> UPDATE t1 SET c2 = 2 WHERE c1 = 2;
mysql> SET DEBUG_SYNC = "now SIGNAL insert2_finish";
#binlog輸出日志(異常日志, last_committed應(yīng)該是上一組最大sequence_number谣旁,應(yīng)該是2而不應(yīng)該是1)
#200701 12:38:01 server id 1001 end_log_pos 259 CRC32 0x424f636d GTID last_committed=0 sequence_number=1
#200701 12:38:32 server id 1001 end_log_pos 517 CRC32 0x75f98373 GTID last_committed=0 sequence_number=2
#200701 12:40:50 server id 1001 end_log_pos 775 CRC32 0x3a792138 GTID last_committed=1 sequence_number=3
疑問(wèn)3分析(如何用這個(gè)bug復(fù)現(xiàn)主從不一致)
一床佳、思路
1.讓主庫(kù)實(shí)現(xiàn)組提交,并出現(xiàn)bug中提出的實(shí)際寫入binlog中的last_committed比應(yīng)該寫入binlog的last_committed小
2.讓從庫(kù)實(shí)現(xiàn)并行分發(fā) /*這塊自己改代碼,讓worker干活慢點(diǎn)蔓挖,保證并行分發(fā)夕土,否則事物會(huì)一直分發(fā)給worker 0*/
3.讓從庫(kù)并行的worker滿足:先分發(fā)給事物的worker晚于后分發(fā)事物的worker完成事物提交 /*這塊改代碼讓前面的worker慢點(diǎn)干活馆衔,后面的worker正常執(zhí)行瘟判,確保后來(lái)的worker先干完,事物分發(fā)是在前面做的角溃,這里面人為調(diào)整worker快慢理論不影響正確性*/
二拷获、驗(yàn)證
# 主庫(kù)
-------------session1-------------
truncate table t1;
flush logs;
#快速執(zhí)行兩條語(yǔ)句讓lwm_estimate變化,否則其初始值為0减细,下面的語(yǔ)句并發(fā)不起來(lái)
delete from emp limit 1;
delete from emp limit 1;
SET DEBUG_SYNC = "bgc_after_enrolling_for_commit_stage SIGNAL insert1_ready WAIT_FOR continue_commit_stage";
INSERT INTO t1 VALUES(1, 1);
-------------session2-------------
SET DEBUG_SYNC = "now WAIT_FOR insert1_ready";
SET DEBUG_SYNC = "reached_finish_commit WAIT_FOR insert2_finish";
INSERT INTO t1 VALUES(2, 1);
-------------session3-------------
SET DEBUG_SYNC = "now SIGNAL continue_commit_stage";
UPDATE t1 SET c2 = 2 WHERE c1 = 2;
SET DEBUG_SYNC = "now SIGNAL insert2_finish";
#主庫(kù)binlog
#200702 15:13:47 server id 1001 end_log_pos 259 CRC32 0xd219aa0c GTID last_committed=0 sequence_number=1
#200702 15:13:47 server id 1001 end_log_pos 530 CRC32 0x2cd5ba1c GTID last_committed=1 sequence_number=2
#200702 15:13:47 server id 1001 end_log_pos 801 CRC32 0xf592e3b9 GTID last_committed=2 sequence_number=3
#200702 15:13:53 server id 1001 end_log_pos 1059 CRC32 0xb1735f8d GTID last_committed=2 sequence_number=4
#200702 15:13:57 server id 1001 end_log_pos 1317 CRC32 0xbea30d9c GTID last_committed=3 sequence_number=5
# 從庫(kù)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: replication
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000216
Read_Master_Log_Pos: 1520
Relay_Log_File: localhost-relay-bin.000569
Relay_Log_Pos: 1207
Relay_Master_Log_File: mysql-bin.000216
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 0 failed executing transaction 'e5a31538-b488-11ea-b6f5-0920853da06e:28169' at master log mysql-bin.000216, end_log_pos 1489. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
Exec_Master_Log_Pos: 994
Relay_Log_Space: 2031
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1032
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 0 failed executing transaction 'e5a31538-b488-11ea-b6f5-0920853da06e:28169' at master log mysql-bin.000216, end_log_pos 1489. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1001
Master_UUID: e5a31538-b488-11ea-b6f5-0920853da06e
Master_Info_File: /Users/gaochao/Desktop/mysql_souce_debug/multi/3316/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 200702 15:14:08
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: e5a31538-b488-11ea-b6f5-0920853da06e:10-28169
Executed_Gtid_Set: e5a31538-b488-11ea-b6f5-0920853da06e:1-28168
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
疑問(wèn)4分析(有沒(méi)有可能主從復(fù)制鏈路正常匆瓜,但是數(shù)據(jù)已經(jīng)不一致了)
# 結(jié)論:可能出現(xiàn)主從復(fù)制鏈路正常,但是數(shù)據(jù)已經(jīng)不一致了未蝌,下面為復(fù)現(xiàn)步驟
# 主庫(kù)操作及輸出
-------------session1-------------
truncate table t1;
flush logs;
#快速執(zhí)行兩條語(yǔ)句讓lwm_estimate變化(lwm_estimate在上一次出現(xiàn)無(wú)法并行的情況下才會(huì)變化)驮吱,否則其初始值為0一直小于等于last_committed,導(dǎo)致后續(xù)的測(cè)試語(yǔ)句無(wú)法并發(fā)而無(wú)法復(fù)現(xiàn)
INSERT INTO t1 VALUES(3, 1);
delete from emp limit 1;
SET DEBUG_SYNC = "bgc_after_enrolling_for_commit_stage SIGNAL insert1_ready WAIT_FOR continue_commit_stage";
INSERT INTO t1 VALUES(1, 1);
-------------session2-------------
SET DEBUG_SYNC = "now WAIT_FOR insert1_ready";
SET DEBUG_SYNC = "reached_finish_commit WAIT_FOR insert2_finish";
update t1 set c2=100 where c1=3;
-------------session3-------------
SET DEBUG_SYNC = "now SIGNAL continue_commit_stage";
update t1 set c2=200 where c1=3;
SET DEBUG_SYNC = "now SIGNAL insert2_finish";
# 主庫(kù)binlog輸出
#200703 11:06:47 server id 1001 end_log_pos 259 CRC32 0x004fd38a GTID last_committed=0 sequence_number=1
#200703 11:06:47 server id 1001 end_log_pos 517 CRC32 0x236a84cc GTID last_committed=1 sequence_number=2
#200703 11:06:47 server id 1001 end_log_pos 788 CRC32 0x5ab78655 GTID last_committed=2 sequence_number=3
#200703 11:06:55 server id 1001 end_log_pos 1046 CRC32 0x78d7f95c GTID last_committed=2 sequence_number=4
#200703 11:07:00 server id 1001 end_log_pos 1314 CRC32 0xf9ea7b41 GTID last_committed=3 sequence_number=5
# 主庫(kù)最后結(jié)果
mysql> select * from t1;
+----+------+
| c1 | c2 |
+----+------+
| 1 | 1 |
| 3 | 200 |
+----+------+
2 rows in set (0.00 sec)
# 從庫(kù)操作及輸出
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: replication
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000294
Read_Master_Log_Pos: 1517
Relay_Log_File: localhost-relay-bin.000057
Relay_Log_Pos: 1730
Relay_Master_Log_File: mysql-bin.000294
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1517
Relay_Log_Space: 2028
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1001
Master_UUID: e5a31538-b488-11ea-b6f5-0920853da06e
Master_Info_File: /Users/gaochao/Desktop/mysql_souce_debug/multi/3316/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: e5a31538-b488-11ea-b6f5-0920853da06e:28389-28501
Executed_Gtid_Set: e5a31538-b488-11ea-b6f5-0920853da06e:1-28501
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
# 從庫(kù)最后結(jié)果
mysql> select * from t1;
+----+------+
| c1 | c2 |
+----+------+
| 1 | 1 |
| 3 | 100 |
+----+------+
2 rows in set (0.00 sec)
疑問(wèn)5分析(官方如何解決的這個(gè)bug)
# 有問(wèn)題的代碼(應(yīng)該組內(nèi)每個(gè)事物都應(yīng)該調(diào)用update_max_committed)
for (THD *head= first ; head ; head = head->next_to_commit)
{
#ifndef DBUG_OFF
stage_manager.clear_preempt_status(head);
#endif
if (thd->get_transaction()->sequence_number != SEQ_UNINIT)
update_max_committed(head); /*參見(jiàn)下方定義*/
xxxxxxx
}
# 修復(fù)后的代碼
for (THD *head= first ; head ; head = head->next_to_commit)
{
#ifndef DBUG_OFF
stage_manager.clear_preempt_status(head);
#endif
if (head->get_transaction()->sequence_number != SEQ_UNINIT)
update_max_committed(head); /*參見(jiàn)下方定義*/
xxxxxxx
}
# update_max_committed定義
void MYSQL_BIN_LOG::update_max_committed(THD *thd)
{
Transaction_ctx *trn_ctx= thd->get_transaction();
max_committed_transaction.set_if_greater(trn_ctx->sequence_number);
trn_ctx->sequence_number= SEQ_UNINIT;
}
問(wèn)題6分析(評(píng)估當(dāng)前線上mysql5.7.17是否需要升級(jí))
1.如果環(huán)境允許建議升級(jí)萧吠,雖然比較難出現(xiàn)這個(gè)bug左冬,但是確實(shí)是一個(gè)潛在的風(fēng)險(xiǎn)。
2.在不影響業(yè)務(wù)的前提下pt-checksum校驗(yàn)當(dāng)前環(huán)境主從是否存在不一致纸型。