MySQL binlog 組提交與 XA(兩階段提交)[轉(zhuǎn)載]

轉(zhuǎn)載:https://www.cnblogs.com/DataArt/p/10083617.html
1. XA-2PC (two phase commit, 兩階段提交 )

XA是由X/Open組織提出的分布式事務(wù)的規(guī)范(X代表transaction; A代表accordant?)罗侯。XA規(guī)范主要定義了(全局)事務(wù)管理器(TM: Transaction Manager)和(局部)資源管理器(RM: Resource Manager)之間的接口钞馁。XA為了實(shí)現(xiàn)分布式事務(wù)携龟,將事務(wù)的提交分成了兩個(gè)階段:也就是2PC (tow phase commit),XA協(xié)議就是通過將事務(wù)的提交分為兩個(gè)階段來實(shí)現(xiàn)分布式事務(wù)穿仪。

1.1 prepare 階段:

第一階段,事務(wù)管理器向所有涉及到的數(shù)據(jù)庫服務(wù)器發(fā)出prepare"準(zhǔn)備提交"請(qǐng)求,數(shù)據(jù)庫收到請(qǐng)求后執(zhí)行數(shù)據(jù)修改和日志記錄等處理筐付,處理完成后只是把事務(wù)的狀態(tài)改成"可以提交",然后把結(jié)果返回給事務(wù)管理器。

1.2 commit 階段:

事務(wù)管理器收到回應(yīng)后進(jìn)入第二階段颓哮,如果在第一階段內(nèi)有任何一個(gè)數(shù)據(jù)庫的操作發(fā)生了錯(cuò)誤家妆,或者事務(wù)管理器收不到某個(gè)數(shù)據(jù)庫的回應(yīng),則認(rèn)為事務(wù)失敗冕茅,回撤所有數(shù)據(jù)庫的事務(wù)伤极。數(shù)據(jù)庫服務(wù)器收不到第二階段的確認(rèn)提交請(qǐng)求,也會(huì)把"可以提交"的事務(wù)回撤姨伤。如果第一階段中所有數(shù)據(jù)庫都提交成功哨坪,那么事務(wù)管理器向數(shù)據(jù)庫服務(wù)器發(fā)出"確認(rèn)提交"請(qǐng)求,數(shù)據(jù)庫服務(wù)器把事務(wù)的"可以提交"狀態(tài)改為"提交完成"狀態(tài)乍楚,然后返回應(yīng)答当编。

2. MySQL 中的XA實(shí)現(xiàn)

Support for XA transactions is available for the InnoDB storage engine. The MySQL XA implementation is based on the X/Open CAE document Distributed Transaction Processing: The XA Specification.

Currently, among the MySQL Connectors, MySQL Connector/J 5.0.0 and higher supports XA directly, by means of a class interface that handles the XA SQL statement interface for you.

XA supports distributed transactions, that is, the ability to permit multiple separate transactional resources to participate in a global transaction. Transactional resources often are RDBMSs but may be other kinds of resources.

A global transaction involves several actions that are transactional in themselves, but that all must either complete successfully as a group, or all be rolled back as a group. In essence, this extends ACID properties “up a level” so that multiple ACID transactions can be executed in concert as components of a global operation that also has ACID properties. (However, for a distributed transaction, you must use the SERIALIZABLE isolation level to achieve ACID properties. It is enough to use REPEATABLE READ for a nondistributed transaction, but not for a distributed transaction.)

最重要的一點(diǎn):使用MySQL中的XA實(shí)現(xiàn)分布式事務(wù)時(shí)必須使用serializable隔離級(jí)別。

The MySQL implementation of XA MySQL enables a MySQL server to act as a Resource Manager that handles XA transactions within a global transaction. A client program that connects to the MySQL server acts as the Transaction Manager.

The process for executing a global transaction uses two-phase commit (2PC). This takes place after the actions performed by the branches of the global transaction have been executed.

  1. In the first phase, all branches are prepared. That is, they are told by the TM to get ready to commit. Typically, this means each RM that manages a branch records the actions for the branch in stable storage. The branches indicate whether they are able to do this, and these results are used for the second phase.

  2. In the second phase, the TM tells the RMs whether to commit or roll back. If all branches indicated when they were prepared that they will be able to commit, all branches are told to commit. If any branch indicated when it was prepared that it will not be able to commit, all branches are told to roll back.

第一階段:為prepare階段徒溪,TM向RM發(fā)出prepare指令忿偷,RM進(jìn)行操作金顿,然后返回成功與否的信息給TM;

第二階段:為事務(wù)提交或者回滾階段鲤桥,如果TM收到所有RM的成功消息揍拆,則TM向RM發(fā)出提交指令;不然則發(fā)出回滾指令茶凳;

XA transaction support is limited to the InnoDB storage engine.(只有innodb支持XA分布式事務(wù))

For "external XA" a MySQL server acts as a Resource Manager and client programs act as Transaction Managers. For "Internal XA", storage engines within a MySQL server act as RMs, and the server itself acts as a TM. Internal XA support is limited by the capabilities of individual storage engines. Internal XA is required for handling XA transactions that involve more than one storage engine. The implementation of internal XA requires that a storage engine support two-phase commit at the table handler level, and currently this is true only for InnoDB.

MySQL中的XA實(shí)現(xiàn)分為:外部XA和內(nèi)部XA嫂拴;前者是指我們通常意義上的分布式事務(wù)實(shí)現(xiàn);后者是指單臺(tái)MySQL服務(wù)器中贮喧,Server層作為TM(事務(wù)協(xié)調(diào)者)筒狠,而服務(wù)器中的多個(gè)數(shù)據(jù)庫實(shí)例作為RM,而進(jìn)行的一種分布式事務(wù)箱沦,也就是MySQL跨庫事務(wù)辩恼;也就是一個(gè)事務(wù)涉及到同一條MySQL服務(wù)器中的兩個(gè)innodb數(shù)據(jù)庫(因?yàn)槠渌娌恢С諼A)。

3. 內(nèi)部XA的額外功能

XA 將事務(wù)的提交分為兩個(gè)階段谓形,而這種實(shí)現(xiàn)运挫,解決了 binlog 和 redo log的一致性問題,這就是MySQL內(nèi)部XA的第三種功能套耕。

MySQL為了兼容其它非事物引擎的復(fù)制谁帕,在server層面引入了 binlog, 它可以記錄所有引擎中的修改操作,因而可以對(duì)所有的引擎使用復(fù)制功能冯袍;MySQL在4.x 的時(shí)候放棄redo的復(fù)制策略而引入binlog的復(fù)制(淘寶丁奇)匈挖。

但是引入了binlog,會(huì)導(dǎo)致一個(gè)問題——binlog和redo log的一致性問題:一個(gè)事務(wù)的提交必須寫redo log和binlog康愤,那么二者如何協(xié)調(diào)一致呢儡循?事務(wù)的提交以哪一個(gè)log為標(biāo)準(zhǔn)?如何判斷事務(wù)提交征冷?事務(wù)崩潰恢復(fù)如何進(jìn)行择膝?

MySQL通過兩階段提交(內(nèi)部XA的兩階段提交)很好地解決了這一問題:

第一階段:InnoDB prepare,持有prepare_commit_mutex检激,并且write/sync redo log肴捉; 將回滾段設(shè)置為Prepared狀態(tài),binlog不作任何操作叔收;

第二階段:包含兩步齿穗,1> write/sync Binlog; 2> InnoDB commit (寫入COMMIT標(biāo)記后釋放prepare_commit_mutex)饺律;

以 binlog 的寫入與否作為事務(wù)提交成功與否的標(biāo)志窃页,innodb commit標(biāo)志并不是事務(wù)成功與否的標(biāo)志。因?yàn)榇藭r(shí)的事務(wù)崩潰恢復(fù)過程如下:

1> 崩潰恢復(fù)時(shí),掃描最后一個(gè)Binlog文件脖卖,提取其中的xid乒省;
2> InnoDB維持了狀態(tài)為Prepare的事務(wù)鏈表,將這些事務(wù)的xid和Binlog中記錄的xid做比較畦木,如果在Binlog中存在作儿,則提交,否則回滾事務(wù)馋劈。

通過這種方式,可以讓InnoDB和Binlog中的事務(wù)狀態(tài)保持一致晾嘶。如果在寫入innodb commit標(biāo)志時(shí)崩潰妓雾,則恢復(fù)時(shí),會(huì)重新對(duì)commit標(biāo)志進(jìn)行寫入垒迂;

在prepare階段崩潰械姻,則會(huì)回滾,在write/sync binlog階段崩潰机断,也會(huì)回滾楷拳。這種事務(wù)提交的實(shí)現(xiàn)是MySQL5.6之前的實(shí)現(xiàn)。

4. binlog 組提交

上面的事務(wù)的兩階段提交過程是5.6之前版本中的實(shí)現(xiàn)吏奸,有嚴(yán)重的缺陷欢揖。當(dāng)sync_binlog=1時(shí),很明顯上述的第二階段中的 write/sync binlog會(huì)成為瓶頸奋蔚,而且還是持有全局大鎖(prepare_commit_mutex: prepare 和 commit共用一把鎖)她混,這會(huì)導(dǎo)致性能急劇下降。解決辦法就是MySQL5.6中的 binlog組提交泊碑。

4.1 MySQL5.6中的binlog group commit:

Binlog Group Commit的過程拆分成了三個(gè)階段:

1> flush stage 將各個(gè)線程的binlog從cache寫到文件中;

2> sync stage 對(duì)binlog做fsync操作(如果需要的話坤按;最重要的就是這一步,對(duì)多個(gè)線程的binlog合并寫入磁盤)馒过;

3> commit stage**** 為各個(gè)線程做引擎層的事務(wù)commit(這里不用寫redo log臭脓,在prepare階段已寫)。每個(gè)stage同時(shí)只有一個(gè)線程在操作腹忽。(分成三個(gè)階段来累,每個(gè)階段的任務(wù)分配給一個(gè)專門的線程,這是典型的并發(fā)優(yōu)化**)

這種實(shí)現(xiàn)的優(yōu)勢(shì)在于三個(gè)階段可以并發(fā)執(zhí)行窘奏,從而提升效率佃扼。注意prepare階段沒有變,還是write/sync redo log.

(另外:5.7中引入了MTS:多線程slave復(fù)制蔼夜,也是通過binlog組提交實(shí)現(xiàn)的兼耀,在binlog組提交時(shí),給每一個(gè)組提交打上一個(gè)seqno,然后在slave中就可以按照master中一樣按照seqno的大小順序瘤运,進(jìn)行事務(wù)組提交了窍霞。)

4.2 MySQL5.7中的binlog group commit:

淘寶對(duì)binlog group commit進(jìn)行了進(jìn)一步的優(yōu)化,其原理如下:

從XA恢復(fù)的邏輯我們可以知道拯坟,只要保證InnoDB Prepare的redo日志在寫B(tài)inlog前完成write/sync即可但金。因此我們對(duì)Group Commit的第一個(gè)stage的邏輯做了些許修改,大概描述如下:

Step1. InnoDB Prepare郁季,記錄當(dāng)前的LSN到thd中冷溃;
Step2. 進(jìn)入Group Commit的flush stage;Leader搜集隊(duì)列梦裂,同時(shí)算出隊(duì)列中最大的LSN似枕。
Step3. 將InnoDB的redo log write/fsync到指定的LSN (注:這一步就是redo log的組寫入。因?yàn)樾∮诘扔贚SN的redo log被一次性寫入到ib_logfile[0|1])
Step4. 寫B(tài)inlog并進(jìn)行隨后的工作(sync Binlog, InnoDB commit , etc)

也就是將 redo log的write/sync延遲到了 binlog group commit的 flush stage 之后年柠,sync binlog之前凿歼。

通過延遲寫redo log的方式,顯式的為redo log做了一次組寫入(redo log group write)冗恨,并減少了(redo log) log_sys->mutex的競(jìng)爭(zhēng)答憔。

也就是將 binlog group commit 對(duì)應(yīng)的redo log也進(jìn)行了 group write. 這樣binlog 和 redo log都進(jìn)行了優(yōu)化。

官方MySQL在5.7.6的代碼中引入了淘寶的優(yōu)化掀抹,對(duì)應(yīng)的Release Note如下:

When using InnoDB with binary logging enabled, concurrent transactions written in the InnoDB redo log are now grouped together before synchronizing to disk when innodb_flush_log_at_trx_commit is set to 1, which reduces the amount of synchronization operations. This can lead to improved performance.

5. XA參數(shù) innodb_support_xa

http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_support_xa

| Command-Line Format | --innodb_support_xa |
| System Variable | Name | [innodb_support_xa](http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_support_xa) |
| Variable Scope | Global, Session |
| Dynamic Variable | Yes |
| Permitted Values | Type | boolean |
| Default | TRUE |

Enables InnoDB support for two-phase commit(2PC) in XA transactions, causing an extra disk flush for transaction preparation. This setting is the default. The XA mechanism is used internally and is essential for any server that has its binary log turned on and is accepting changes to its data from more than one thread. If you turn it off, transactions can be written to the binary log in a different order from the one in which the live database is committing them. This can produce different data when the binary log is replayed in disaster recovery or on a replication slave. Do not turn it off on a replication master server unless you have an unusual setup where only one thread is able to change data.

For a server that is accepting data changes from only one thread, it is safe and recommended to turn off this option to improve performance forInnoDB tables. For example, you can turn it off on replication slaves where only the replication SQL thread is changing data.

You can also turn off this option if you do not need it for safe binary logging or replication, and you also do not use an external XA transaction manager.

參數(shù)innodb_support_xa默認(rèn)為true虐拓,表示啟用XA,雖然它會(huì)導(dǎo)致一次額外的磁盤flush(prepare階段flush redo log). 但是我們必須啟用傲武,而不能關(guān)閉它侯嘀。因?yàn)殛P(guān)閉會(huì)導(dǎo)致binlog寫入的順序和實(shí)際的事務(wù)提交順序不一致,會(huì)導(dǎo)致崩潰恢復(fù)和slave復(fù)制時(shí)發(fā)生數(shù)據(jù)錯(cuò)誤谱轨。如果啟用了log-bin參數(shù)戒幔,并且不止一個(gè)線程對(duì)數(shù)據(jù)庫進(jìn)行修改,那么就必須啟用innodb_support_xa參數(shù)土童。

掃描redo诗茎,對(duì)于checkpoint位置點(diǎn)之前的,這部分?jǐn)?shù)據(jù)已經(jīng)落盤了献汗,就不用管了敢订,就看處于prepare狀態(tài)的事物,拿到這些事物的Xid罢吃,如果都有{Xid--binlog filename position的對(duì)應(yīng)關(guān)系}就說明這部分事物是成功的楚午,如果只有xid,沒有binlog filename尿招,position矾柜,就去掃描最后一個(gè)binlog文件阱驾,看這個(gè)xid在這個(gè)文件里面有沒有,有的話怪蔑,就說明這個(gè)事物寫binlog是寫成功的里覆,只是還沒來得及回寫binlog filename以及position到redo log里面,數(shù)據(jù)庫就掛了缆瓣,所以會(huì)重新將其提交喧枷,如果沒有xid,就說明該事物寫binlog沒寫成功弓坞,再利用undo log將其回滾隧甚。

參考:

1. http://www.csdn.net/article/2015-01-16/2823591 (淘寶丁奇:怎么跳出MySQL的10個(gè)大坑)

  1. http://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_support_xa

3. http://dev.mysql.com/doc/refman/5.7/en/xa.html

4. http://dev.mysql.com/doc/refman/5.7/en/xa-restrictions.html

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個(gè)濱河市渡冻,隨后出現(xiàn)的幾起案子戚扳,更是在濱河造成了極大的恐慌,老刑警劉巖菩帝,帶你破解...
    沈念sama閱讀 207,113評(píng)論 6 481
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場(chǎng)離奇詭異茬腿,居然都是意外死亡呼奢,警方通過查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 88,644評(píng)論 2 381
  • 文/潘曉璐 我一進(jìn)店門切平,熙熙樓的掌柜王于貴愁眉苦臉地迎上來握础,“玉大人,你說我怎么就攤上這事悴品≠髯郏” “怎么了?”我有些...
    開封第一講書人閱讀 153,340評(píng)論 0 344
  • 文/不壞的土叔 我叫張陵苔严,是天一觀的道長定枷。 經(jīng)常有香客問我,道長届氢,這世上最難降的妖魔是什么欠窒? 我笑而不...
    開封第一講書人閱讀 55,449評(píng)論 1 279
  • 正文 為了忘掉前任,我火速辦了婚禮退子,結(jié)果婚禮上岖妄,老公的妹妹穿的比我還像新娘。我一直安慰自己寂祥,他們只是感情好荐虐,可當(dāng)我...
    茶點(diǎn)故事閱讀 64,445評(píng)論 5 374
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著丸凭,像睡著了一般福扬。 火紅的嫁衣襯著肌膚如雪腕铸。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 49,166評(píng)論 1 284
  • 那天忧换,我揣著相機(jī)與錄音恬惯,去河邊找鬼。 笑死亚茬,一個(gè)胖子當(dāng)著我的面吹牛酪耳,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播刹缝,決...
    沈念sama閱讀 38,442評(píng)論 3 401
  • 文/蒼蘭香墨 我猛地睜開眼碗暗,長吁一口氣:“原來是場(chǎng)噩夢(mèng)啊……” “哼!你這毒婦竟也來了梢夯?” 一聲冷哼從身側(cè)響起言疗,我...
    開封第一講書人閱讀 37,105評(píng)論 0 261
  • 序言:老撾萬榮一對(duì)情侶失蹤,失蹤者是張志新(化名)和其女友劉穎颂砸,沒想到半個(gè)月后噪奄,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 43,601評(píng)論 1 300
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡人乓,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 36,066評(píng)論 2 325
  • 正文 我和宋清朗相戀三年勤篮,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片色罚。...
    茶點(diǎn)故事閱讀 38,161評(píng)論 1 334
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡碰缔,死狀恐怖,靈堂內(nèi)的尸體忽然破棺而出戳护,到底是詐尸還是另有隱情金抡,我是刑警寧澤,帶...
    沈念sama閱讀 33,792評(píng)論 4 323
  • 正文 年R本政府宣布腌且,位于F島的核電站梗肝,受9級(jí)特大地震影響,放射性物質(zhì)發(fā)生泄漏铺董。R本人自食惡果不足惜统捶,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 39,351評(píng)論 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望柄粹。 院中可真熱鬧喘鸟,春花似錦、人聲如沸驻右。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,352評(píng)論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽堪夭。三九已至愕把,卻和暖如春拣凹,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背恨豁。 一陣腳步聲響...
    開封第一講書人閱讀 31,584評(píng)論 1 261
  • 我被黑心中介騙來泰國打工嚣镜, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留,地道東北人橘蜜。 一個(gè)月前我還...
    沈念sama閱讀 45,618評(píng)論 2 355
  • 正文 我出身青樓菊匿,卻偏偏與公主長得像,于是被迫代替她去往敵國和親计福。 傳聞我的和親對(duì)象是個(gè)殘疾皇子跌捆,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 42,916評(píng)論 2 344

推薦閱讀更多精彩內(nèi)容