線上一個5.7從庫復制中斷:
mysql>show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.0.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000006
Read_Master_Log_Pos: 49461560
Relay_Log_File: db-music1-relay-bin.000002
Relay_Log_Pos: 379466837
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1399
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 3 failed executing transaction 'd199707a-5287-11e7-8f35-246e96729f10:1215544883' at master log mysql-bin.000001, end_log_pos 123. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
Exec_Master_Log_Pos: 379466624
Relay_Log_Space: 2575974028
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1399
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 3 failed executing transaction 'd199707a-5287-11e7-8f35-246e96729f10:1215544883' at master log mysql-bin.000001, end_log_pos 123. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 244331
Master_UUID: ce522aa3-6398-11e7-945f-246e96729568
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 170708 13:22:50
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: ce522aa3-6398-11e7-945f-246e96729568:1,
d199707a-5287-11e7-8f35-246e96729f10:1215032565-1218510979
Executed_Gtid_Set: 30472597-527e-11e7-a93f-246e96729ec0:1-47,
d199707a-5287-11e7-8f35-246e96729f10:1-1215544882,
d23b9cfd-5287-11e7-a307-246e96729ff0:1-9
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
查詢具體報錯:
mysql>select * from performance_schema.replication_applier_status_by_worker \G
*************************** 1. row ***************************
CHANNEL_NAME:
WORKER_ID: 3
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: d199707a-5287-11e7-8f35-246e96729f10:1215544883
LAST_ERROR_NUMBER: 1399
LAST_ERROR_MESSAGE: Worker 3 failed executing transaction 'd199707a-5287-11e7-8f35-246e96729f10:1215544883' at master log mysql-bin.000001, end_log_pos 123; Error 'XAER_RMFAIL: The command cannot be executed when global transaction is in the IDLE state' on query. Default database: ''. Query: 'ROLLBACK'
LAST_ERROR_TIMESTAMP: 2017-07-08 13:22:50
1 rows in set (0.00 sec)
第一感覺很奇怪窗看,為什么會rollback失敗呢蹲坷?于是根據(jù)gtid去對應(yīng)的主庫binlog去看了下绿聘,并沒有任何rollback語句:
# at 379611076
#170707 20:28:38 server id 61481 end_log_pos 379611141 CRC32 0x65240fde GTID last_committed=512318 sequence_number=512319
SET @@SESSION.GTID_NEXT= 'd199707a-5287-11e7-8f35-246e96729f10:1215544883'/*!*/;
# at 379611141
#170707 20:28:38 server id 61481 end_log_pos 379611269 CRC32 0x19e0ec66 Query thread_id=758257 exec_time=59329 error_code=0
SET TIMESTAMP=1499430518/*!*/;
SET @@session.sql_mode=524288/*!*/;
XA START X'3230375f313533353431363835313835373138305f',X'353034',39187
/*!*/;
# at 379611269
#170707 20:28:38 server id 61481 end_log_pos 379611392 CRC32 0xd37354d4 Table_map: `XXXX`.`XXXX` mapped to number 123
# at 379611392
#170707 20:28:38 server id 61481 end_log_pos 379611910 CRC32 0xec28e0a5 Update_rows: table id 123 flags: STMT_END_F
### UPDATE XXXX.XXXX
### WHERE
### @1=473756624 /* LONGINT meta=0 nullable=0 is_null=0 */
### @2='挽情1111喜歡的音樂' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */
### @3=338811606 /* LONGINT meta=0 nullable=0 is_null=0 */
### @4=1475019621016 /* LONGINT meta=0 nullable=1 is_null=0 */
### @5=1499148021048 /* LONGINT meta=0 nullable=1 is_null=0 */
### @6=0 /* LONGINT meta=0 nullable=1 is_null=0 */
### @7=114 /* INT meta=0 nullable=1 is_null=0 */
### SET
### @1=473756624 /* LONGINT meta=0 nullable=0 is_null=0 */
### @2='挽情1111喜歡的音樂' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */```
### @3=338811606 /* LONGINT meta=0 nullable=0 is_null=0 */
### @4=1475019621016 /* LONGINT meta=0 nullable=1 is_null=0 */
### @5=1499430518207 /* LONGINT meta=0 nullable=1 is_null=0 */
### @6=0 /* LONGINT meta=0 nullable=1 is_null=0 */
### @7=115 /* INT meta=0 nullable=1 is_null=0 */
# at 379611910```
#170707 20:28:38 server id 61481 end_log_pos 379612036 CRC32 0x8f8d4a35 Query thread_id=758257 exec_time=59329 error_code=0
SET TIMESTAMP=1499430518/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
XA END X'3230375f313533353431363835313835373138305f',X'353034',39187
/*!*/;
# at 379612036
#170707 20:28:38 server id 61481 end_log_pos 379612096 CRC32 0xd4966095 XA PREPARE X'3230375f313533353431363835313835373138305f',X'353034',39187
XA PREPARE X'3230375f313533353431363835313835373138305f',X'353034',39187
/*!*/;
# at 379614541
#170707 20:28:38 server id 61481 end_log_pos 379614687 CRC32 0x5273d888 Query thread_id=758257 exec_time=59329 error_code=0
SET TIMESTAMP=1499430518/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
XA COMMIT X'3230375f313533353431363835313835373138305f',X'353034',39187
/*!*/;
看下本地的relay log爬凑,找到這個事務(wù)的gtid
# at 379611289
#170707 20:28:38 server id 61481 end_log_pos 379611141 CRC32 0x65240fde GTID last_committed=512318 sequence_number=512319
SET @@SESSION.GTID_NEXT= 'd199707a-5287-11e7-8f35-246e96729f10:1215544883'/*!*/;
# at 379611354
#170707 20:28:38 server id 61481 end_log_pos 379611269 CRC32 0x19e0ec66 Query thread_id=758257 exec_time=59329 error_code=0
SET TIMESTAMP=1499430518/*!*/;
SET @@session.sql_mode=524288/*!*/;
XA START X'3230375f313533353431363835313835373138305f',X'353034',39187
/*!*/;
# at 379611482
#170707 20:28:38 server id 61481 end_log_pos 379611392 CRC32 0xd37354d4 Table_map: XXXX.XXXX mapped to number 123
# at 379611605
#170707 20:28:38 server id 61481 end_log_pos 379611910 CRC32 0xec28e0a5 Update_rows: table id 123 flags: STMT_END_F
BINLOG '
dn5fWRMp8AAAewAAAABpoBYAAHsAAAAAAAEADGRiX3BsYXlsaXN0MQAOTXVzaWNfUGxheWxpc3QA
KQgPCAgICAMDAwgPAwMIAw8IAQMPCAEPAQMIAQEDCAEBDwMIAQQDAwMIDf0C/QL9AgAMdwGAAQT6
////vwHUVHPT
dn5fWRgp8AAABgIAAAZroBYAAHsAAAAAAAEAKf///////////////wCECAAA/tDzPBwAAAAAGQDm
jL3mg4UxMTEx5Zac5qyi55qE6Z+z5LmQ1toxFAAAAACYQgVuVwEAADhtLwxdAQAAAAAAAAAAAABy
AAAAAAAAAAAAAADhyREAAABCAAAAAAAFAAAA3gv6G10BAAABAAAAgQEAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAHAAAAAAAAAAAAAAAAAAAAAAADhtLwxdAQAAAAAgAGIyYTcxOWE4NjdmZjNkNmJlZWI5
YTRlZjI1NGVjNDliAAAAAAAAAAAAAAAAAGYGrEQAAAAAAAAAAAAAAAAAAAAAAAAAAACECAAA/tDz
PBwAAAAAGQDmjL3mg4UxMTEx5Zac5qyi55qE6Z+z5LmQ1toxFAAAAACYQgVuVwEAAL/9BR1dAQAA
AAAAAAAAAABzAAAAAAAAAAAAAADhyREAAABCAAAAAAAFAAAAv/0FHV0BAAABAAAAgQEAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAHEAAAAAAAAAAAAAAAAAAAAAAL/9BR1dAQAAAAAgAGIyYTcxOWE4Njdm
ZjNkNmJlZWI5YTRlZjI1NGVjNDliAAAAAAAAAAAAAAAAAGYGrEQAAAAAAAAAAAAAAAAAAAAAAAAA
AKXgKOw=
'/*!*/;
### UPDATE XXXX.XXXX
### WHERE
### @1=473756624 /* LONGINT meta=0 nullable=0 is_null=0 */
### @2='挽情1111喜歡的音樂' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */
### @3=338811606 /* LONGINT meta=0 nullable=0 is_null=0 */
### @4=1475019621016 /* LONGINT meta=0 nullable=1 is_null=0 */
### @5=1499148021048 /* LONGINT meta=0 nullable=1 is_null=0 */
### @6=0 /* LONGINT meta=0 nullable=1 is_null=0 */
### @7=114 /* INT meta=0 nullable=1 is_null=0 */
### SET
### @1=473756624 /* LONGINT meta=0 nullable=0 is_null=0 */
### @2='挽情1111喜歡的音樂' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */
### @3=338811606 /* LONGINT meta=0 nullable=0 is_null=0 */
### @4=1475019621016 /* LONGINT meta=0 nullable=1 is_null=0 */
### @5=1499430518207 /* LONGINT meta=0 nullable=1 is_null=0 */
### @6=0 /* LONGINT meta=0 nullable=1 is_null=0 */
### @7=115 /* INT meta=0 nullable=1 is_null=0 */
# at 379612123
#170707 20:28:38 server id 61481 end_log_pos 379612036 CRC32 0x8f8d4a35 Query thread_id=758257 exec_time=59329 error_code=0
SET TIMESTAMP=1499430518/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
XA END X'3230375f313533353431363835313835373138305f',X'353034',39187
/*!*/;
# at 379612249
#170708 13:18:53 server id 254331 end_log_pos 379612306 CRC32 0xf2c69619 Rotate to db-music1-relay-bin.000003 pos: 4
ROLLBACK /* added by mysqlbinlog */ /*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
到這里,這個relay log日志文件結(jié)束了冒黑。很顯然問題也找到了田绑,就是執(zhí)行
ROLLBACK /* added by mysqlbinlog */ /*!*/;
出錯了。
首先 我們看到 這個rollback是MySQL自己加上去的抡爹,那么為什么要加呢掩驱?
mysql為了保證一個事務(wù)只在一個binlog里,所以當Binlog或者relay log發(fā)生截斷時冬竟,最后一個事務(wù)要么commit欧穴,要么rollback,如果rollback泵殴,那么下一個binlog或者relay log會把這個事務(wù)重做一遍涮帘,保證這個事務(wù)不會丟。
由于xa事務(wù)無法直接rollback笑诅,而需要xa rollback ‘XXX’调缨,所以復制就停了。
怎么修復吆你?是不是直接跳過這個rollback就行了弦叶?
我們來試一下,跳過這個rollback:
STOP SLAVE;
SET GTID_NEXT="d199707a-5287-11e7-8f35-246e96729f10:1215544883";
BEGIN; COMMIT;
SET GTID_NEXT="AUTOMATIC";
START SLAVE;
這里GTID_NEXT值不能用show slave status的里executed值妇多,得用具體報錯的停止的gtid
但是湾蔓,show slave status看到,還是有報錯:
Error 'XAER_NOTA: Unknown XID' on query. Default database: 'XXXX'. Query: 'XA COMMIT X'3230375f313533353431363835313835373138305f',X'353034',39187'
為什么又報這個事務(wù)commit找不到XID呢砌梆?
之前說過默责,在relaylog截斷的時候,如果事務(wù)沒有commit,會自動在最后加rollback咸包,在下一個relay log開始的時候重新做一次這個事務(wù)桃序,按理說我們跳過這個rollback,在下個relaylog會被重做烂瘫,為啥會在commit的時候找不到xid呢媒熊?
我們看到我們跳過的gtid原來就是重做這個事務(wù)到PREPARE階段的gtid,原來坟比,rollback是沒有g(shù)tid的芦鳍,所以我們實際上就是把這個事務(wù)到PREPARE階段的gtid給跳過了,commit的時候肯定會找不到xid葛账,接著怎么修復柠衅?
- 手動把這個事務(wù)執(zhí)行一遍,手動commit籍琳。
- 指定 ‘/!/;’為結(jié)束符菲宴,每個binlog開頭都有
BINLOG ' 8HFvWQ+8fQQAdwAAAHsAAAAAAAQANS43LjE4LTE1LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA AUvdRu4= '/*!*/;
- 設(shè)置gtid_next模式
SET @@SESSION.GTID_NEXT= 'AUTOMATIC’ ;
- 開始事務(wù)
XA START X'3230375f313533353431363835313835373138305f',X'353034',39187 /*!*/;
- 事務(wù)內(nèi)容
BINLOG ' dn5fWRMp8AAAewAAAABpoBYAAHsAAAAAAAEADGRiX3BsYXlsaXN0MQAOTXVzaWNfUGxheWxpc3QA KQgPCAgICAMDAwgPAwMIAw8IAQMPCAEPAQMIAQEDCAEBDwMIAQQDAwMIDf0C/QL9AgAMdwGAAQT6 ////vwHUVHPT dn5fWRgp8AAABgIAAAZroBYAAHsAAAAAAAEAKf///////////////wCECAAA/tDzPBwAAAAAGQDm jL3mg4UxMTEx5Zac5qyi55qE6Z+z5LmQ1toxFAAAAACYQgVuVwEAADhtLwxdAQAAAAAAAAAAAABy AAAAAAAAAAAAAADhyREAAABCAAAAAAAFAAAA3gv6G10BAAABAAAAgQEAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAHAAAAAAAAAAAAAAAAAAAAAAADhtLwxdAQAAAAAgAGIyYTcxOWE4NjdmZjNkNmJlZWI5 YTRlZjI1NGVjNDliAAAAAAAAAAAAAAAAAGYGrEQAAAAAAAAAAAAAAAAAAAAAAAAAAACECAAA/tDz PBwAAAAAGQDmjL3mg4UxMTEx5Zac5qyi55qE6Z+z5LmQ1toxFAAAAACYQgVuVwEAAL/9BR1dAQAA AAAAAAAAAABzAAAAAAAAAAAAAADhyREAAABCAAAAAAAFAAAAv/0FHV0BAAABAAAAgQEAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAHEAAAAAAAAAAAAAAAAAAAAAAL/9BR1dAQAAAAAgAGIyYTcxOWE4Njdm ZjNkNmJlZWI5YTRlZjI1NGVjNDliAAAAAAAAAAAAAAAAAGYGrEQAAAAAAAAAAAAAAAAAAAAAAAAA AKXgKOw= '/*!*/;
- 結(jié)束 & prepare事務(wù)
XA END X'3230375f313533353431363835313835373138305f',X'353034',39187 /*!*/; XA PREPARE X'3230375f313533353431363835313835373138305f',X'353034',39187 /*!*/;
- 提交事務(wù)
XA COMMIT X'3230375f313533353431363835313835373138305f',X'353034',39187 /*!*/;
- 指定 ‘/!/;’為結(jié)束符菲宴,每個binlog開頭都有
- 跳過commit這個gtid
- start slave
這樣贷祈,才算是完整的修復了這個主從復制,并且數(shù)據(jù)保持一致喝峦。
為什么這是5.7的一個bug呢势誊?5.7之前的版本因為relaylog被截斷并不會出現(xiàn)這個bug。
5.7對xa事務(wù)的binlog記錄方式做了修改谣蠢,把 xa start,xa end,xa prepare放到一個event里粟耻,xa commit又是另外一個event。而在之前的MySQL版本中眉踱,整個xa事務(wù)從start到commit都是在一個event中挤忙,所以其他版本并沒有問題。
最后一個問題:5.7為啥要把xa事務(wù)拆成兩個event勋锤?簡單的講是為了數(shù)據(jù)安全性。
5.5或者5.6假設(shè)下面一個場景:
- 主庫執(zhí)行 xa事務(wù)直到prepare
- kill主庫侥祭。
- 拉起主庫叁执,再xa commit停機前的事務(wù)
主庫提交成功,修改的數(shù)據(jù)也是沒問題的矮冬,再去從庫谈宛,會發(fā)現(xiàn),從庫的數(shù)據(jù)并沒有變化胎署。WHY吆录?
MySQL在某個分布式事務(wù)prepare成功后宕機,宕機前操作該事務(wù)的連接并沒有斷開(如果在宕機前斷開連接琼牧,事務(wù)會被MySQL自動回滾)恢筝,這個時候已經(jīng)prepare的事務(wù)并不會被回滾,所以在MySQL重新啟動后巨坊,引擎層通過recover機制能恢復該事務(wù)撬槽。當然該事務(wù)的Binlog已經(jīng)在宕機過程中被丟失,這個時候趾撵,如果去提交侄柔,則會造成主從數(shù)據(jù)的不一致, 即提交沒有記錄Binlog占调,從上丟失該條數(shù)據(jù)暂题。
正因為5.7之前版本的xa事務(wù)存在這個bug,5.7后做了修復究珊。從XA START到XA PREPARE之間的操作都被記錄到了Master的Binlog中薪者,然后通過復制關(guān)系傳到了Slave上。也就是說5.7開始剿涮,MySQL對于XA事務(wù)啸胧,在prepare的時候就完成了寫B(tài)inlog的操作赶站,通過新增一種叫 XA_prepare_log_event 的event類型來實現(xiàn)。
其實 MySQL5.7在xa事務(wù)上遠不止這個bug纺念,后面再來慢慢總結(jié)贝椿。