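To guarantee data consistency, a ZooKeeper cluster uses a two-phase commit.
A ZooKeeper cluster has three roles: leader, follower, and observer.
These roles handle read and write requests differently:
Read request: the data is read directly from the current node.
Write request: on the leader, the two-phase commit is run directly; on a non-leader, the request is forwarded to the leader.
So analyzing the two-phase commit means analyzing request handling in cluster mode. In standalone mode, requests are handled by the RequestProcessor chain.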
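The routing rule above can be condensed into a minimal sketch. The class RequestRoutingSketch, its Role enum, and the returned strings are hypothetical names used only for illustration; they are not ZooKeeper APIs.

```java
/** Toy illustration of how a request is routed by server role. Not ZooKeeper code. */
class RequestRoutingSketch {
    enum Role { LEADER, FOLLOWER, OBSERVER }

    static String route(Role role, boolean isWrite) {
        if (!isWrite) {
            // Reads are served from the node the client is connected to.
            return "read locally";
        }
        // Writes are committed by the leader; other roles forward them.
        return role == Role.LEADER ? "run two-phase commit" : "forward to leader";
    }
}
```

For example, RequestRoutingSketch.route(Role.OBSERVER, true) yields "forward to leader", while any read yields "read locally" regardless of role.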
A single ZooKeeper server handles a request in three main steps:
1. Generate a transaction log entry (txn) for the current request
2. Persist the txn
3. Update the Database according to the txn
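The three steps above can be sketched as follows. This is a toy model under stated assumptions: SetDataTxnSketch and StandalonePipelineSketch are hypothetical classes, the in-memory list stands in for the on-disk transaction log, and the map stands in for the Database. None of these are ZooKeeper's real types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a write transaction; not ZooKeeper's real txn type. */
class SetDataTxnSketch {
    final long zxid;
    final String path;
    final String data;

    SetDataTxnSketch(long zxid, String path, String data) {
        this.zxid = zxid;
        this.path = path;
        this.data = data;
    }
}

/** Toy single-server pipeline: generate the txn, persist it, then apply it. */
class StandalonePipelineSketch {
    private long lastZxid = 0;
    final List<SetDataTxnSketch> txnLog = new ArrayList<>(); // stands in for the on-disk log
    final Map<String, String> database = new HashMap<>();    // stands in for the Database

    long setData(String path, String data) {
        // Step 1: generate the txn for this request.
        SetDataTxnSketch txn = new SetDataTxnSketch(++lastZxid, path, data);
        // Step 2: persist the txn before the change becomes visible.
        txnLog.add(txn);
        // Step 3: update the database from the txn.
        database.put(txn.path, txn.data);
        return txn.zxid;
    }
}
```

The ordering is the point of the sketch: the change reaches the database only after its txn has been logged.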
Two-phase commit (2PC) steps:
Building the leader node's request processor chain
Of these, PrepRequestProcessor, SyncRequestProcessor, and CommitProcessor (marked in green in the original diagram) all extend ZooKeeperCriticalThread, so each one runs as a thread.
org.apache.zookeeper.server.quorum.LeaderZooKeeperServer#setupRequestProcessors
org.apache.zookeeper.server.quorum.ProposalRequestProcessor#ProposalRequestProcessor
ProposalRequestProcessor contains a SyncRequestProcessor and an AckRequestProcessor.
LeaderRequestProcessor----->PrepRequestProcessor---->ProposalRequestProcessor(SyncRequestProcessor--->AckRequestProcessor)----->CommitProcessor--->Leader.ToBeAppliedRequestProcessor---->FinalRequestProcessor
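A chain like the one above is typically assembled back to front, with each processor receiving its successor. The sketch below illustrates that pattern only: StageSketch, NamedStageSketch, and LeaderChainSketch are hypothetical names, each stage merely records its name, and the SyncRequestProcessor/AckRequestProcessor side branch inside ProposalRequestProcessor is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical one-method stage; ZooKeeper's real interface is RequestProcessor. */
interface StageSketch {
    void process(String request);
}

/** A stage that records its name and then delegates to its successor. */
class NamedStageSketch implements StageSketch {
    private final String name;
    private final StageSketch next; // null for the last stage
    private final List<String> trace;

    NamedStageSketch(String name, StageSketch next, List<String> trace) {
        this.name = name;
        this.next = next;
        this.trace = trace;
    }

    @Override
    public void process(String request) {
        trace.add(name);
        if (next != null) {
            next.process(request);
        }
    }
}

class LeaderChainSketch {
    /** Builds the toy chain and returns the order in which stages saw the request. */
    static List<String> run(String request) {
        List<String> trace = new ArrayList<>();
        // Built back to front so that every stage can be handed its successor.
        StageSketch fin = new NamedStageSketch("FinalRequestProcessor", null, trace);
        StageSketch toBeApplied = new NamedStageSketch("ToBeAppliedRequestProcessor", fin, trace);
        StageSketch commit = new NamedStageSketch("CommitProcessor", toBeApplied, trace);
        StageSketch proposal = new NamedStageSketch("ProposalRequestProcessor", commit, trace);
        StageSketch prep = new NamedStageSketch("PrepRequestProcessor", proposal, trace);
        StageSketch first = new NamedStageSketch("LeaderRequestProcessor", prep, trace);
        first.process(request);
        return trace;
    }
}
```

In the real server several of these stages are threads with their own queues, so a request does not pass through the chain in a single call stack as it does here.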
1. LeaderRequestProcessor
org.apache.zookeeper.server.quorum.LeaderRequestProcessor#processRequest
① Checks whether the session is a local session; creating an ephemeral node upgrades the session.
org.apache.zookeeper.server.quorum.QuorumZooKeeperServer#checkUpgradeSession
② Hands the request to the next request processor.
2. PrepRequestProcessor
Same role as in standalone mode: it sets the Hdr and Txn of the Request, then hands it to the next request processor.
3. ProposalRequestProcessor (2PC proposal)
If the request is a write (request.getHdr() != null), it is wrapped into a proposal and sent to the followers. After sending, it is handed to SyncRequestProcessor to be persisted.
org.apache.zookeeper.server.quorum.Leader#propose
org.apache.zookeeper.server.quorum.Leader#sendPacket
Sends the packet to all follower nodes in forwardingFollowers.
org.apache.zookeeper.server.quorum.LearnerHandler#queuePacket
Adds the packet to the LearnerHandler's queuedPackets queue.
org.apache.zookeeper.server.quorum.LearnerHandler#sendPackets
3.1 SyncRequestProcessor (2PC persistence)
Puts the request into the queuedRequests blocking queue.
① Persists the request, the same as in standalone mode.
org.apache.zookeeper.server.SyncRequestProcessor#run
② Hands the request to the next processor, AckRequestProcessor.
3.2 AckRequestProcessor (leader-side handling of the two-phase commit)
Sends the leader's own ACK to the leader (2PC: send ACK).
org.apache.zookeeper.server.quorum.Leader#processAck
org.apache.zookeeper.server.quorum.Leader#tryToCommit
org.apache.zookeeper.server.quorum.Leader#commit
Creates a Leader.COMMIT packet and sends it to all follower nodes.
org.apache.zookeeper.server.quorum.Leader#inform
Creates an INFORM packet and sends it to all observer nodes.
org.apache.zookeeper.server.quorum.CommitProcessor#commit
Commits the current request by putting it into committedRequests; the database is eventually updated.
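The ACK counting that leads to a commit can be sketched as below. This is a simplified model, not the code of Leader#processAck or Leader#tryToCommit: QuorumAckSketch and its members are hypothetical, a plain majority of voting members is assumed, and zxid ordering checks, COMMIT/INFORM packets, and reconfiguration are omitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy majority counter for proposals. Not ZooKeeper's Leader class. */
class QuorumAckSketch {
    private final int votingMembers;
    // zxid -> ids of the servers that have acknowledged the proposal so far
    private final Map<Long, Set<Long>> outstandingProposals = new HashMap<>();
    private long lastCommitted = 0;

    QuorumAckSketch(int votingMembers) {
        this.votingMembers = votingMembers;
    }

    void propose(long zxid) {
        outstandingProposals.put(zxid, new HashSet<>());
    }

    /** Records an ACK; returns true if this ACK caused the proposal to commit. */
    boolean processAck(long serverId, long zxid) {
        Set<Long> acks = outstandingProposals.get(zxid);
        if (acks == null) {
            return false; // unknown proposal, or already committed
        }
        acks.add(serverId); // a set, so duplicate ACKs from one server count once
        if (acks.size() * 2 <= votingMembers) {
            return false; // not yet more than half
        }
        outstandingProposals.remove(zxid); // quorum reached: no longer outstanding
        lastCommitted = zxid;
        return true;
    }

    long getLastCommitted() {
        return lastCommitted;
    }
}
```

With three voting members, the leader's own ACK alone is not enough; the proposal commits when the first follower ACK arrives, and a later ACK for the same zxid is ignored.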
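4. CommitProcessor
CommitProcessor fields:
queuedRequests: requests that have been received but have not yet gone through the two-phase commit
queuedWriteRequests: write requests that have been received but have not yet gone through the two-phase commit
committedRequests: requests that can be committed; once the two-phase check has been acknowledged by more than half of the nodes, the request is committed locally and added to this queue
commitIsWaiting: whether there is a request that can be committed (true if committedRequests is non-empty)
pendingRequests: a map holding the pending requests of each client sessionId
Leader fields:
outstandingProposals: records the proposed requests; an entry is removed once it satisfies the majority rule
toBeApplied: records the requests waiting to take effect; an entry is removed by ToBeAppliedRequestProcessor once FinalRequestProcessor has processed the request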
① processRequest
org.apache.zookeeper.server.quorum.CommitProcessor#processRequest
First it decides whether the request needs a two-phase commit. If it does, the request is added to the queuedWriteRequests queue.
org.apache.zookeeper.server.quorum.CommitProcessor#needCommit
Returns true if the request is a modification.
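② CommitProcessor#run
CommitProcessor is a thread, and its main work happens in the run method.
org.apache.zookeeper.server.quorum.CommitProcessor#run
a. Reading commitIsWaiting and requestsToProcess
It first reads commitIsWaiting, which says whether anything is waiting to be committed (true if committedRequests is non-empty), and requestsToProcess, the number of requests waiting to be processed.
b. wait()
If neither queuedRequests nor committedRequests holds any data, the thread calls wait().
c. pendingRequests
If the request needs a commit, it is put straight into the pendingRequests map. If it is a read, the processor checks whether the request's sessionId already has an entry in pendingRequests; if so, the read is appended to that entry as well. If neither case applies, the request is a plain client read and is handed directly to the next processor via sendToNextProcessor(request).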
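d. Next, look at the exit conditions of this while loop.
① The value taken from queuedRequests is null.
② If queuedRequests is not empty, requestsToProcess is greater than 0. The loop then keeps running only while maxReadBatchSize < 0 or readsProcessed <= maxReadBatchSize, and exits as soon as neither holds.
maxReadBatchSize defaults to -1 (that is, < 0), which disables the limit. If the parameter is configured, the loop also exits once the number of consecutive reads, readsProcessed, exceeds it.
③ pendingRequests and committedRequests are both non-empty.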
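e. commitIsWaiting: requests are waiting to be committed
Requests are taken from committedRequests and the write requests are processed in a while loop.
From pendingRequests, the processor gets the waiting queue sessionQueue for this client's sessionId (it may contain both reads and writes).
An entry in pendingRequests may hold data like the following when one client sends a series of commands: sessionQueue = {read, read, write, write}
The first request in the queue is assigned to topPending.
The current request's session is added to queuesToDrain, the request is removed from committedRequests, the commit counter commitsProcessed is incremented by 1, and commitsToProcess (initialized to maxCommitBatchSize, the write batch budget) is decremented by 1 so that the loop while (commitIsWaiting && !stopped && commitsToProcess > 0) can exit. Finally processWrite is called to handle the write request and pass it to the next processor.
f. queuesToDrain
This works together with commitsToProcess: commitIsWaiting says that commits are still pending, but the loop has exited after handling commitsToProcess writes, so some of the reads queued for the sessions in queuesToDrain are processed first.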
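The per-session ordering described in steps c, e, and f can be illustrated with a single-threaded toy. It is a sketch under heavy assumptions, not the CommitProcessor implementation: PendingReqSketch and CommitOrderingSketch are hypothetical, commits are assumed to arrive in the same order as the session's pending writes, and the batch limits (maxReadBatchSize, maxCommitBatchSize), the wait/notify handling, and the worker threads are all left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical request: a session id, a label, and a read/write flag. */
class PendingReqSketch {
    final long sessionId;
    final String name;
    final boolean write;

    PendingReqSketch(long sessionId, String name, boolean write) {
        this.sessionId = sessionId;
        this.name = name;
        this.write = write;
    }
}

/** Toy model of per-session ordering between reads and committed writes. */
class CommitOrderingSketch {
    final Deque<PendingReqSketch> queuedRequests = new ArrayDeque<>();
    final Deque<PendingReqSketch> committedRequests = new ArrayDeque<>();
    final Map<Long, Deque<PendingReqSketch>> pendingRequests = new HashMap<>();
    final List<String> forwarded = new ArrayList<>(); // what reached the next processor

    void submit(PendingReqSketch r) { queuedRequests.add(r); }

    void commit(PendingReqSketch r) { committedRequests.add(r); }

    /** One simplified pass: intake of queued requests, then handling of commits. */
    void runOnce() {
        PendingReqSketch r;
        while ((r = queuedRequests.poll()) != null) {
            Deque<PendingReqSketch> q = pendingRequests.get(r.sessionId);
            if (r.write || q != null) {
                // Writes wait for their commit; reads behind a pending write wait too.
                if (q == null) {
                    q = new ArrayDeque<>();
                    pendingRequests.put(r.sessionId, q);
                }
                q.add(r);
            } else {
                forwarded.add(r.name); // plain read: pass it straight on
            }
        }
        while ((r = committedRequests.poll()) != null) {
            Deque<PendingReqSketch> q = pendingRequests.get(r.sessionId);
            forwarded.add(r.name); // the committed write goes to the next processor
            if (q == null) {
                continue; // the write came from a client of another server
            }
            q.poll(); // assumption: the head of the queue is this committed write
            // Drain the reads that were blocked behind the write.
            while (q.peek() != null && !q.peek().write) {
                forwarded.add(q.poll().name);
            }
            if (q.isEmpty()) {
                pendingRequests.remove(r.sessionId);
            }
        }
    }
}
```

The effect to notice: a read from a session with a pending write is held back until that write commits, while a read from an unrelated session passes through immediately.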
5. ToBeAppliedRequestProcessor
org.apache.zookeeper.server.quorum.Leader.ToBeAppliedRequestProcessor#processRequest
Removes the request from toBeApplied.
Building the follower node's request processor chain
Of these, FollowerRequestProcessor, CommitProcessor, and SyncRequestProcessor (marked in green in the original diagram) all extend ZooKeeperCriticalThread, so each one runs as a thread.
org.apache.zookeeper.server.quorum.FollowerZooKeeperServer#setupRequestProcessors
Two chains are set up:
FollowerRequestProcessor(firstProcessor)---->CommitProcessor----->FinalRequestProcessor
SyncRequestProcessor---->SendAckRequestProcessor
1. FollowerRequestProcessor
org.apache.zookeeper.server.quorum.FollowerRequestProcessor#processRequest
Adds the request to the queuedRequests queue.
FollowerRequestProcessor is a thread that takes requests from queuedRequests.
org.apache.zookeeper.server.quorum.FollowerRequestProcessor#run
org.apache.zookeeper.server.quorum.Learner#request
Builds a request packet and forwards it to the leader node for processing.
createSession and closeSession requests are also forwarded to the leader node.
2. SendAckRequestProcessor
org.apache.zookeeper.server.quorum.SendAckRequestProcessor#processRequest
Before SendAckRequestProcessor runs, SyncRequestProcessor is called first to persist the request. Since that step is the same as in standalone mode and on the leader, it is not covered separately here.
Sends an ACK packet to the leader.
org.apache.zookeeper.server.quorum.Learner#writePacket
org.apache.zookeeper.server.quorum.Learner#writePacketNow
LearnerHandler forwards the request
After FollowerRequestProcessor has handled the request, the leader side receives a Request.
org.apache.zookeeper.server.quorum.LearnerHandler#run
org.apache.zookeeper.server.quorum.Leader#submitLearnerRequest
org.apache.zookeeper.server.quorum.LeaderZooKeeperServer#submitLearnerRequest
Forwards the request to the leader's prepRequestProcessor.
A modification request sent by a client connected to a follower is forwarded to the leader's prepRequestProcessor for processing.
Two-phase commit: follower-side handling
1. run
org.apache.zookeeper.server.quorum.QuorumPeer#run
2. followLeader
org.apache.zookeeper.server.quorum.Follower#followLeader
Continuously reads packets from the leader.
① The follower receives a PROPOSAL packet
org.apache.zookeeper.server.quorum.Follower#processPacket
org.apache.zookeeper.server.quorum.FollowerZooKeeperServer#logRequest
Invokes SyncRequestProcessor; once SyncRequestProcessor finishes, the request is handed to SendAckRequestProcessor, which sends the ACK packet.
② The follower receives a COMMIT packet
org.apache.zookeeper.server.quorum.FollowerZooKeeperServer#commit
Invokes commitProcessor, which adds the request to the committedRequests queue. After that the request is handed to FinalRequestProcessor, so a modification sent by a client connected to the follower also gets a response.
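The follower's two reactions, log-and-ACK on PROPOSAL and apply on COMMIT, can be sketched as below. FollowerPacketSketch and its fields are hypothetical; the lists stand in for the transaction log, the ACKs sent to the leader, and the database, and throwing on an out-of-order COMMIT is a choice of this sketch rather than a description of what the real server does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy follower that reacts to PROPOSAL and COMMIT packets. Not ZooKeeper code. */
class FollowerPacketSketch {
    enum PacketType { PROPOSAL, COMMIT }

    final Deque<Long> pendingTxns = new ArrayDeque<>(); // logged but not yet committed
    final List<Long> txnLog = new ArrayList<>();        // stands in for the on-disk log
    final List<Long> acksSent = new ArrayList<>();      // stands in for ACK packets
    final List<Long> applied = new ArrayList<>();       // stands in for the database

    void processPacket(PacketType type, long zxid) {
        switch (type) {
            case PROPOSAL:
                // Phase 1: persist the proposal, then acknowledge it to the leader.
                txnLog.add(zxid);
                pendingTxns.add(zxid);
                acksSent.add(zxid);
                break;
            case COMMIT:
                // Phase 2: apply the oldest pending proposal; it must match the commit.
                Long head = pendingTxns.peek();
                if (head == null || head.longValue() != zxid) {
                    throw new IllegalStateException("commit " + zxid + " does not match pending " + head);
                }
                pendingTxns.remove();
                applied.add(zxid);
                break;
        }
    }
}
```

A proposal is acknowledged as soon as it is logged, but it changes the applied state only when the matching COMMIT arrives.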
Building the observer node's request processor chain
Of these, ObserverRequestProcessor, CommitProcessor, and SyncRequestProcessor (marked in green in the original diagram) all extend ZooKeeperCriticalThread, so each one runs as a thread.
org.apache.zookeeper.server.quorum.ObserverZooKeeperServer#setupRequestProcessors
Two chains are set up here as well:
ObserverRequestProcessor(firstProcessor)---->CommitProcessor----->FinalRequestProcessor
SyncRequestProcessor---->null
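Observer nodes do not take part in the two-phase commit, so no ACK is sent after SyncRequestProcessor. This improves read throughput without affecting write performance. The processors in this chain work the same way as on the leader and follower, so they are not repeated here.
Summary:
The two-phase commit in a ZooKeeper cluster happens on write operations. The overall 2PC logic is implemented in the RequestProcessor chain. A commit takes place only after ACKs have been received from more than half of the nodes. The commit logic itself lives in CommitProcessor, which involves several collections and queues; it helps to understand what these fields mean before reading the CommitProcessor source.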