Reading Raft through the etcd source code
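The Raft algorithm consists of two main processes: leader election and log replication. Log replication itself runs in two phases: recording the log entry, then committing the data. Raft tolerates at most (N-1)/2 failed nodes (rounded down), where N is the total number of nodes in the cluster. A 5-node cluster, for example, keeps working with 2 nodes down.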
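The (N-1)/2 bound is easy to check with integer arithmetic. The snippet below is a minimal sketch, not etcd code: `quorum` and `maxFaulty` are hypothetical helper names, and the majority size N/2+1 is the standard Raft quorum rather than something quoted from the source above.

```go
package main

import "fmt"

// quorum returns the size of a majority in a cluster of n nodes.
// (Hypothetical helper for illustration; not part of etcd.)
func quorum(n int) int {
	return n/2 + 1
}

// maxFaulty returns how many nodes may fail while a majority survives:
// (n-1)/2 using integer division, which rounds down.
func maxFaulty(n int) int {
	if n <= 0 {
		return 0
	}
	return (n - 1) / 2
}

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		fmt.Printf("nodes=%d quorum=%d tolerated failures=%d\n",
			n, quorum(n), maxFaulty(n))
	}
}
```

Note that 4 nodes tolerate no more failures than 3 do, which is why clusters are usually sized with an odd number of nodes.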
Three states
The Raft algorithm defines only three states; etcd's implementation adds a fourth, PreCandidate.
- Follower
- Candidate
- Leader
https://github.com/etcd-io/etcd/blob/master/raft/raft.go#L36,L43
// Possible values for StateType.
const (
	StateFollower StateType = iota
	StateCandidate
	StateLeader
	StatePreCandidate
	numStates
)
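To see what the `iota` block above produces, here is a small self-contained sketch. It redeclares `StateType` locally, and the `stateNames` table and `String` method are assumptions added for illustration, not a copy of etcd's code.

```go
package main

import "fmt"

// StateType is redeclared here so the example compiles on its own.
type StateType uint64

const (
	StateFollower StateType = iota // 0
	StateCandidate                 // 1
	StateLeader                    // 2
	StatePreCandidate              // 3, the state etcd adds
	numStates                      // 4, a count of the states above
)

// stateNames is a hypothetical lookup table, sized by numStates so the
// compiler rejects it if it lists more names than there are states.
var stateNames = [numStates]string{
	"StateFollower",
	"StateCandidate",
	"StateLeader",
	"StatePreCandidate",
}

// String makes StateType print as its name instead of a bare number.
func (st StateType) String() string {
	if st >= numStates {
		return "StateUnknown"
	}
	return stateNames[st]
}

func main() {
	for st := StateFollower; st < numStates; st++ {
		fmt.Printf("%d = %s\n", uint64(st), st)
	}
}
```

Because `numStates` is the last constant in the block, it always equals the number of real states, so it stays correct if another state is added before it.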
Two timeout mechanisms
- election timeout
- heartbeat timeout
https://github.com/etcd-io/etcd/blob/master/raft/raft.go#L619,L653
// tickElection is run by followers and candidates after r.electionTimeout.
func (r *raft) tickElection() {
	r.electionElapsed++

	if r.promotable() && r.pastElectionTimeout() {
		r.electionElapsed = 0
		// MsgHup is a local message that starts a new election.
		r.Step(pb.Message{From: r.id, Type: pb.MsgHup})
	}
}

// tickHeartbeat is run by leaders to send a MsgBeat after r.heartbeatTimeout.
func (r *raft) tickHeartbeat() {
	r.heartbeatElapsed++
	r.electionElapsed++

	if r.electionElapsed >= r.electionTimeout {
		r.electionElapsed = 0
		if r.checkQuorum {
			r.Step(pb.Message{From: r.id, Type: pb.MsgCheckQuorum})
		}
		// If current leader cannot transfer leadership in electionTimeout, it becomes leader again.
		if r.state == StateLeader && r.leadTransferee != None {
			r.abortLeaderTransfer()
		}
	}

	// Stepping MsgCheckQuorum above may have changed the state.
	if r.state != StateLeader {
		return
	}

	if r.heartbeatElapsed >= r.heartbeatTimeout {
		r.heartbeatElapsed = 0
		// MsgBeat is a local message that tells the leader to send heartbeats.
		r.Step(pb.Message{From: r.id, Type: pb.MsgBeat})
	}
}
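Terms
Which node becomes leader is decided by a vote. Each leader serves for some time, after which a new leader is elected to take over. This resembles elections in a democratic society, where each new period in office is called a term. Raft works the same way, and the corresponding concept is called a term.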
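In Raft a term is a number that only ever increases, and nodes use it to tell current leaders from stale ones. The sketch below shows that comparison under simplifying assumptions: `peer` and `observeTerm` are hypothetical names, and the rule follows the general Raft description, not etcd's actual `Step` logic, which handles more cases.

```go
package main

import "fmt"

// peer is a hypothetical node that tracks only its term and role.
type peer struct {
	term     uint64
	isLeader bool
}

// observeTerm applies the basic term rule to an incoming message and
// reports whether the message should be processed further.
func (p *peer) observeTerm(msgTerm uint64) bool {
	switch {
	case msgTerm > p.term:
		// A newer term exists: adopt it and give up leadership.
		p.term = msgTerm
		p.isLeader = false
		return true
	case msgTerm < p.term:
		// The sender is behind: ignore its message.
		return false
	default:
		return true
	}
}

func main() {
	p := &peer{term: 3, isLeader: true}

	fmt.Println(p.observeTerm(2), p.term, p.isLeader) // stale message
	fmt.Println(p.observeTerm(3), p.term, p.isLeader) // same term
	fmt.Println(p.observeTerm(5), p.term, p.isLeader) // newer term
}
```

The same comparison lets a leader that was cut off from the cluster discover, once it reconnects, that an election happened without it.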