Raft一致性算法偽碼詳解

State

/* Persistent state on all servers:
(Updated on stable storage before responding to RPCs)
*/
currentTerm                   //當(dāng)前任期   
votedFor                      //當(dāng)前任期的候選者編號(hào)痕囱，無(wú)則為null
log[]                         //日志條目

//Volatile state on all servers,所有服務(wù)器上維護(hù)
commitIndex             //已知的最高的可被提交的日志條目的索引娄柳，初始為0
lastApplied             //當(dāng)前已提交給state machine執(zhí)行的條目的索引，初始為0

//Volatile state on leaders:(Reinitialized after election)凡人，只在leader節(jié)點(diǎn)上維護(hù)
nextIndex[]          //對(duì)于每一臺(tái)服務(wù)器措伐，下一條將要發(fā)給該服務(wù)器的條目的索引线召，初始為leader最后一條條目索引+1
matchIndex[]         //每一個(gè)服務(wù)器已知的最高的已復(fù)制的條目的索引驱入，初始為0

RequestVote RPC

//Invoked by candidates to gather votes (§5.2).
Arguments:
term                      //候選者的term值
candidateId               //候選者的id
lastLogIndex              //候選者最新的日志索引
lastLogTerm               //候選者最新的日志所屬的term

Results:
term 
voteGranted                //true表示投票給該candidate

Receiver implementation:
1. Reply false if term < currentTerm 
//這里投票給候選者的條件是要求候選者的日志至少比自身的要新昆烁，也就是要么lastLogIndex比自身最新的日志條目index要大吊骤。
//要么lastLogIndex和lastLogTerm都和自身最新的日志條目一致
//這里對(duì)選舉的這種限制是為了保證安全性。確保commit的日志一定不會(huì)被重寫静尼。
2. If votedFor is null or candidateId, and candidate’s log is at least as up-to-date as receiver’s log, grant vote

AppendEntries RPC

//Invoked by leader to replicate log entries; also used as heartbeat
Arguments:
term                               //leader當(dāng)前的term值
leaderId                           //follower在收到client request時(shí)白粉，可以用該值轉(zhuǎn)發(fā)給leader
prevLogIndex                       //上一條日志條目的索引
prevLogTerm                        //上一條日志條目的term
entries[]                          //日志條目，對(duì)于心跳包則該值為空鼠渺，日志條目可以為多條
leaderCommit                       //leader服務(wù)器的commitIndex

Results:
term                          //當(dāng)前任期
success                       //具體的判斷如下

Receiver implementation:
//任期值比當(dāng)前任期小鸭巴，則該RPC已失效，或當(dāng)前l(fā)eader已變更
1. Reply false if term < currentTerm 
//不包含匹配prevLogTerm的prevLogIndex所對(duì)應(yīng)的條目拦盹，通常該情況為節(jié)點(diǎn)掛掉一段時(shí)間奕扣，落后leader節(jié)點(diǎn)
//leader會(huì)重新發(fā)包含較早的prevLogTerm及prevLogIndex的RPC給該節(jié)點(diǎn)
2. Reply false if log doesn’t contain an entry at prevLogIndex whose term matches prevLogTerm 
// 以下均返回true
// 若日志條目已有內(nèi)容與entries里的內(nèi)容沖突，則刪除已有及其后的條目
3. If an existing entry conflicts with a new one (same index but different terms), delete the existing entry and all that follow it 
// 將新的日志條目追加到日志中
4. Append any new entries not already in the log
//如果leaderCommit比自身commitIndex大掌敬，則更新自身的commitIndex為min(leaderCommit,當(dāng)前最新日志條目索引)
5. If leaderCommit > commitIndex, set commitIndex = min(leaderCommit, index of last new entry)

InstallSnapshot RPC

//Invoked by leader to send chunks of a snapshot to a follower. Leaders always send chunks in order.
Arguments:
term                               //leader的當(dāng)前term
leaderId                           //leader的id
lastIncludedIndex                  //該snapshot中包含的最大的日志的索引值
lastIncludedTerm                   //該snapshot中包含的最大的日志的所屬的term
offset                             //用來(lái)定位shapshot文件的偏移量惯豆，snapshot文件可能很大，要分幾次傳奔害，每次稱之為一個(gè)chunk
data[]                             //snapshot數(shù)據(jù)楷兽，通常為state machine的當(dāng)前狀態(tài)     
done                               //是否為最后一個(gè)chunk

Results:
term                               //currentTerm

Receiver implementation:
1. Reply immediately if term < currentTerm
//如果是第一個(gè)chunk，則新建snapshot文件
2. Create new snapshot file if first chunk (offset is 0)
//將data的數(shù)據(jù)寫入到snapshot的相應(yīng)位置
3. Write data into snapshot file at given offset
//如果done為false华临，則重復(fù)1-3過(guò)程芯杀，回復(fù)并等待最后一個(gè)chunk
4. Reply and wait for more data chunks if done is false
//保存snapshot文件，丟棄更老的snapshot
5. Save snapshot file, discard any existing or partial snapshot with a smaller index
// 已有的日志處理
6. If existing log entry has same index and term as snapshot’s last included entry, retain log entries following it and reply
// 丟棄老的日志
7. Discard the entire log
// 按照snapshot內(nèi)容重設(shè)state machine
8. Reset state machine using snapshot contents (and load snapshot’s cluster configuration)

Rules for Servers

All Servers:
// commitIndex > lastApplied,證明lastApplied到commitIndex之間的日志條目都可以提交給state machine執(zhí)行
? If commitIndex > lastApplied: increment lastApplied, apply log[lastApplied] to state machine 
// 若有新term雅潭，則更新自己的term值
? If RPC request or response contains term T > currentTerm: set currentTerm = T, convert to follower 

Followers:
//響應(yīng)leaders和candidates發(fā)來(lái)的RPC揭厚，響應(yīng)規(guī)則參照AppendEntries和RequestVote部分
? Respond to RPCs from candidates and leaders
// 一段時(shí)間內(nèi)，沒(méi)有收到AppendEntries或者RequestVote的消息扶供，則轉(zhuǎn)變?yōu)閏andidate
? If election timeout elapses without receiving AppendEntries RPC from current leader or granting vote to candidate: convert to candidate

Candidates :
//開啟選舉筛圆，增加自身term值，投票給自己椿浓，重設(shè)定時(shí)器太援，發(fā)送RequestVote給其他服務(wù)器
 ? On conversion to candidate, start election:
  ? Increment currentTerm
  ? Vote for self
  ? Reset election timer
  ? Send RequestVote RPCs to all other servers
//從多數(shù)成員收到true的回應(yīng)闽晦，則轉(zhuǎn)變?yōu)閘eader
? If votes received from majority of servers: become leader
//收到AppendEntries，證明新的leader已產(chǎn)生提岔，則自身轉(zhuǎn)變?yōu)閒ollower
? If AppendEntries RPC received from new leader: convert to follower
//如果超時(shí)仙蛉，則開啟下一輪選舉
? If election timeout elapses: start new election

Leaders:
//發(fā)送心跳包給所有服務(wù)器，防止其他服務(wù)器超時(shí)開啟新的選舉
? Upon election: send initial empty AppendEntries RPCs (heartbeat) to each server; repeat during idle periods to prevent election timeouts 
//收到客戶端請(qǐng)求碱蒙，則將條目寫入到日志荠瘪，當(dāng)條目提交之后再回復(fù)給客戶端
? If command received from client: append entry to local log, respond after entry applied to state machine 
//如果當(dāng)前的日志條目索引比f(wàn)ollower大（leader自身的last log index 與其nextIndex[]比較）,則發(fā)送AppendEntries給相應(yīng)follower
? If last log index ≥ nextIndex for a follower: send AppendEntries RPC with log entries starting at nextIndex
     //如果成功，則更新nextIndex數(shù)組及matchIndex數(shù)組中的follower對(duì)應(yīng)的項(xiàng)
    ? If successful: update nextIndex and matchIndex for follower 
    //如果因?yàn)槿罩静煌绞∪停瑒t減小該follower對(duì)應(yīng)的nextIndex值然后重試,
   //若相應(yīng)的nextIndex值減小到leader節(jié)點(diǎn)已經(jīng)進(jìn)行了snapshot哀墓，則leader會(huì)發(fā)送InstallSnapshot RPC
    ? If AppendEntries fails because of log inconsistency: decrement nextIndex and retry 
    //如果更新完matchIndex后，判斷下commitIndex是否可以更新坊秸，更新條件為新的值是多數(shù)同意的麸祷，且該條目的term為當(dāng)前term
    ? If there exists an N such that N > commitIndex, a majority of matchIndex[i] ≥ N, and log[N].term == currentTerm: set commitIndex = N.

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者

人面猴
序言：七十年代末，一起剝皮案震驚了整個(gè)濱河市褒搔，隨后出現(xiàn)的幾起案子阶牍，更是在濱河造成了極大的恐慌，老刑警劉巖星瘾，帶你破解...
沈念sama閱讀 218,858評(píng)論 6贊 508
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件走孽，死亡現(xiàn)場(chǎng)離奇詭異，居然都是意外死亡琳状，警方通過(guò)查閱死者的電腦和手機(jī)磕瓷，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 93,372評(píng)論 3贊 395
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門，熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái)念逞，“玉大人困食，你說(shuō)我怎么就攤上這事◆岢校” “怎么了硕盹？”我有些...
開封第一講書人閱讀 165,282評(píng)論 0贊 356
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵，是天一觀的道長(zhǎng)叨咖。經(jīng)常有香客問(wèn)我瘩例，道長(zhǎng)，這世上最難降的妖魔是什么甸各？我笑而不...
開封第一講書人閱讀 58,842評(píng)論 1贊 295
?港島之戀（遺憾婚禮）
正文為了忘掉前任垛贤，我火速辦了婚禮，結(jié)果婚禮上趣倾，老公的妹妹穿的比我還像新娘聘惦。我一直安慰自己，他們只是感情好誊酌，可當(dāng)我...
茶點(diǎn)故事閱讀 67,857評(píng)論 6贊 392
惡毒庶女頂嫁案：這布局不是一般人想出來(lái)的
文/花漫我一把揭開白布部凑。她就那樣靜靜地躺著露乏，像睡著了一般碧浊。火紅的嫁衣襯著肌膚如雪涂邀。梳的紋絲不亂的頭發(fā)上，一...
開封第一講書人閱讀 51,679評(píng)論 1贊 305
城市分裂傳說(shuō)
那天箱锐，我揣著相機(jī)與錄音比勉，去河邊找鬼。笑死驹止，一個(gè)胖子當(dāng)著我的面吹牛浩聋，可吹牛的內(nèi)容都是我干的。我是一名探鬼主播臊恋，決...
沈念sama閱讀 40,406評(píng)論 3贊 418
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼衣洁，長(zhǎng)吁一口氣：“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼！你這毒婦竟也來(lái)了抖仅？” 一聲冷哼從身側(cè)響起坊夫，我...
開封第一講書人閱讀 39,311評(píng)論 0贊 276
萬(wàn)榮殺人案實(shí)錄
序言：老撾萬(wàn)榮一對(duì)情侶失蹤，失蹤者是張志新（化名）和其女友劉穎撤卢，沒(méi)想到半個(gè)月后环凿，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體，經(jīng)...
沈念sama閱讀 45,767評(píng)論 1贊 315
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡放吩，尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 37,945評(píng)論 3贊 336
?白月光啟示錄
正文我和宋清朗相戀三年智听，在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片渡紫。...
茶點(diǎn)故事閱讀 40,090評(píng)論 1贊 350
活死人
序言：一個(gè)原本活蹦亂跳的男人離奇死亡到推，死狀恐怖，靈堂內(nèi)的尸體忽然破棺而出惕澎，到底是詐尸還是另有隱情莉测，我是刑警寧澤，帶...
沈念sama閱讀 35,785評(píng)論 5贊 346
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布集灌，位于F島的核電站悔雹，受9級(jí)特大地震影響，放射性物質(zhì)發(fā)生泄漏欣喧。R本人自食惡果不足惜腌零，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 41,420評(píng)論 3贊 331
男人毒藥：我在死后第九天來(lái)索命
文/蒙蒙一、第九天我趴在偏房一處隱蔽的房頂上張望唆阿。院中可真熱鬧益涧，春花似錦、人聲如沸驯鳖。這莊子的主人今日做“春日...
開封第一講書人閱讀 31,988評(píng)論 0贊 22
一樁弒父案久免，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽(yáng)。三九已至扭弧，卻和暖如春阎姥，著一層夾襖步出監(jiān)牢的瞬間，已是汗流浹背鸽捻。一陣腳步聲響...
開封第一講書人閱讀 33,101評(píng)論 1贊 271
情欲美人皮
我被黑心中介騙來(lái)泰國(guó)打工呼巴，沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留，地道東北人御蒲。一個(gè)月前我還...
沈念sama閱讀 48,298評(píng)論 3贊 372
代替公主和親
正文我出身青樓衣赶，卻偏偏與公主長(zhǎng)得像，于是被迫代替她去往敵國(guó)和親厚满。傳聞我的和親對(duì)象是個(gè)殘疾皇子府瞄，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 45,033評(píng)論 2贊 355

Raft一致性算法偽碼詳解

推薦閱讀更多精彩內(nèi)容