KafkaController is where the controller leader is managed, and it is invoked when the Kafka controller starts up. Its main job is to register a leader-change listener on ZooKeeper's /controller election path and then call the elect method to start the election.
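The elect method
It first reads the ZooKeeper /controller path to see whether a controller is already registered. If one is, it reads the registered brokerId and compares it with its own: it returns true if they match and false otherwise (the result is the Boolean amILeader).
If nothing is registered yet, no controller has been elected, so it tries to create an ephemeral node in ZooKeeper. If the creation succeeds, this broker has been elected leader and onBecomeLeader is called. If the node already exists, it reads the controller ID again: a value other than -1 means another broker has been elected leader, while -1 means a leader was elected but has already given up the role, so another round of election is needed. If the current broker is elected leader, it runs the following: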
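The election flow above can be sketched in Scala as follows. This is a minimal, self-contained model, not Kafka's code: FakeZk is a hypothetical in-memory stand-in for ZooKeeper whose createEphemeral returns false where the real code catches a node-exists exception, and Elector, createEphemeral, read and delete are illustrative names.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for ZooKeeper, modelling only the
// "create fails if the node already exists" behaviour of an ephemeral node.
final class FakeZk {
  private val nodes = mutable.Map.empty[String, Int]

  // Returns false where real code would catch a node-exists exception.
  def createEphemeral(path: String, brokerId: Int): Boolean = synchronized {
    if (nodes.contains(path)) false
    else {
      nodes(path) = brokerId
      true
    }
  }

  def read(path: String): Option[Int] = synchronized(nodes.get(path))

  // Simulates the ephemeral node vanishing (e.g. the controller's session ending).
  def delete(path: String): Unit = synchronized {
    nodes.remove(path)
    ()
  }
}

// Sketch of elect: -1 means "no controller registered".
final class Elector(zk: FakeZk, brokerId: Int, onBecomeLeader: () => Unit) {
  private val electionPath = "/controller"
  var leaderId: Int = -1

  def amILeader: Boolean = leaderId == brokerId

  def elect(): Boolean = {
    // A controller is already registered, possibly this broker itself.
    leaderId = zk.read(electionPath).getOrElse(-1)
    if (leaderId != -1) return amILeader

    // Try to become controller by creating the ephemeral node.
    if (zk.createEphemeral(electionPath, brokerId)) {
      leaderId = brokerId
      onBecomeLeader()
    } else {
      // Lost the race: re-read. -1 here would mean the winner already resigned,
      // in which case another election round is needed.
      leaderId = zk.read(electionPath).getOrElse(-1)
    }
    amILeader
  }
}
```

Because only one create on /controller can succeed, at most one broker runs onBecomeLeader per election round; the others only record the winner's ID.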
onControllerFailover
// Read the controller epoch from /controller_epoch. The epoch is used to fence controller switchovers; the value read is stored in the epoch and epochZkVersion fields of controllerContext.
readControllerEpochFromZookeeper()
// Increment the epoch in controllerContext by 1 and persist it to /controller_epoch.
incrementControllerEpoch(zkUtils.zkClient) // the epoch is bumped on every election
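A sketch of why both epoch and epochZkVersion are kept, assuming a hypothetical VersionedNode that imitates ZooKeeper's version-checked write (a write succeeds only if the node's version still equals the expected one). incrementEpoch and acceptsRequest are illustrative helpers, not Kafka functions: the first shows how a stale controller's epoch bump fails, the second how a broker could reject requests stamped with an older epoch.

```scala
// Hypothetical versioned node: a value plus a version bumped on every write,
// imitating a conditional setData(path, data, expectedVersion).
final class VersionedNode(initial: Int) {
  private var value = initial
  private var version = 0

  def read: (Int, Int) = synchronized((value, version))

  // Writes only if nobody else wrote since we read; returns the new version.
  def conditionalSet(newValue: Int, expectedVersion: Int): Option[Int] = synchronized {
    if (expectedVersion != version) None
    else {
      value = newValue
      version += 1
      Some(version)
    }
  }
}

// Plays the role of controllerContext.epoch / epochZkVersion (illustrative).
final class EpochState(var epoch: Int, var epochZkVersion: Int)

def incrementEpoch(node: VersionedNode, ctx: EpochState): Boolean =
  node.conditionalSet(ctx.epoch + 1, ctx.epochZkVersion) match {
    case Some(newVersion) =>
      ctx.epoch += 1
      ctx.epochZkVersion = newVersion
      true
    case None =>
      false // another controller bumped the epoch first
  }

// Fencing: a request carrying an epoch older than the latest one seen is rejected.
def acceptsRequest(latestSeenEpoch: Int, requestEpoch: Int): Boolean =
  requestEpoch >= latestSeenEpoch
```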
// Register PartitionsReassignedListener on /admin/reassign_partitions to react to partition reassignment requests. It watches the node's data for changes.
registerReassignedPartitionsListener()
// Register IsrChangeNotificationListener on /isr_change_notification, the node used to announce partition ISR changes. It watches the node's children, since brokers write each ISR change notification as a child node under it.
registerIsrChangeNotificationListener()
// Register PreferredReplicaElectionListener on /admin/preferred_replica_election, which handles preferred-replica leader election requests. It watches the node's data for changes.
registerPreferredReplicaElectionListener()
// Inside partitionStateMachine, register TopicChangeListener on /brokers/topics to watch its children, i.e. topics being created or removed.
// If topic deletion is enabled via delete.topic.enable (default false in this Kafka version), also register DeleteTopicsListener on /admin/delete_topics to watch that node's children.
partitionStateMachine.registerListeners()
// Register BrokerChangeListener on /brokers/ids to watch its children, i.e. brokers joining or leaving.
replicaStateMachine.registerListeners()
// Initialize the controller context.
initializeControllerContext()
// Start the replica state machine and the partition state machine.
replicaStateMachine.startup()
partitionStateMachine.startup()
// For every topic currently in Kafka, register an AddPartitionsListener on /brokers/topics/topicname to watch that topic for changes.
controllerContext.allTopics.foreach(topic => partitionStateMachine.registerPartitionChangeListener(topic))
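The registrations above can be summarized as data. In this sketch Watch, WatchKind and failoverWatches are hypothetical names, and the data-versus-children classification reflects my reading of the Kafka 0.10-era listeners; treat it as an assumption to check against the source. Topic-level AddPartitionsListener watches are generated per topic, mirroring the foreach over controllerContext.allTopics.

```scala
// Which kind of ZooKeeper watch a listener relies on.
sealed trait WatchKind
case object DataChange extends WatchKind  // fires when the node's data changes
case object ChildChange extends WatchKind // fires when the node's children change

final case class Watch(path: String, kind: WatchKind, listener: String)

// Hypothetical summary of the watches registered during failover.
def failoverWatches(deleteTopicEnable: Boolean, topics: Seq[String]): Seq[Watch] = {
  val fixed = Seq(
    Watch("/admin/reassign_partitions", DataChange, "PartitionsReassignedListener"),
    Watch("/isr_change_notification", ChildChange, "IsrChangeNotificationListener"),
    Watch("/admin/preferred_replica_election", DataChange, "PreferredReplicaElectionListener"),
    Watch("/brokers/topics", ChildChange, "TopicChangeListener"),
    Watch("/brokers/ids", ChildChange, "BrokerChangeListener")
  )
  // DeleteTopicsListener is only registered when delete.topic.enable is true.
  val deletion =
    if (deleteTopicEnable) Seq(Watch("/admin/delete_topics", ChildChange, "DeleteTopicsListener"))
    else Seq.empty[Watch]
  // One AddPartitionsListener per existing topic (assumed to be a data watch).
  val perTopic = topics.map(t => Watch(s"/brokers/topics/$t", DataChange, "AddPartitionsListener"))
  fixed ++ deletion ++ perTopic
}
```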
/*
Iterate over partitionsBeingReassigned, the partitions whose replica reassignment has not finished yet, and for each one:
1. Check that every broker in the new replica set is in liveBrokers. If any target broker is missing from liveBrokers (it is not running), throw an exception.
2. Look up the partition in partitionReplicaAssignment, which holds the current replica assignment of every partition. If the partition is not found there, throw an exception.
3. If the new replica set is identical to the partition's current replica set in partitionReplicaAssignment, the reassignment is unnecessary; throw an exception.
4. Otherwise all target brokers are alive and the new set differs from the current one, so:
4.1. Register a ReassignedPartitionsIsrChangeListener on /brokers/topics/tpname/partitions/pid/state to watch for ISR changes.
4.2. Ask deleteTopicManager whether the topic is queued for deletion; if so, mark it ineligible for deletion so it is not deleted while the reassignment runs.
4.3. Perform the replica reassignment via onPartitionReassignment.
*/
maybeTriggerPartitionReassignment()
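The three checks in steps 1 to 3 can be written as a pure validation function. This is a sketch under assumptions: PartitionId, ReassignCheck and validateReassignment are hypothetical names, broker IDs are plain Ints, and the function returns a rejection reason instead of throwing.

```scala
// Local stand-in for a topic/partition pair (hypothetical name).
final case class PartitionId(topic: String, partition: Int)

sealed trait ReassignCheck
case object Proceed extends ReassignCheck
final case class Reject(reason: String) extends ReassignCheck

// Mirrors checks 1-3 above; returns a reason instead of throwing.
def validateReassignment(
    tp: PartitionId,
    newReplicas: Seq[Int],
    liveBrokerIds: Set[Int],
    currentAssignment: Map[PartitionId, Seq[Int]]
): ReassignCheck = {
  val deadTargets = newReplicas.filterNot(liveBrokerIds.contains)
  if (deadTargets.nonEmpty)
    Reject(s"target brokers not running: ${deadTargets.mkString(",")}") // check 1
  else
    currentAssignment.get(tp) match {
      case None =>
        Reject(s"partition $tp has no current assignment") // check 2
      case Some(assigned) if assigned == newReplicas =>
        Reject(s"partition $tp already uses replicas ${newReplicas.mkString(",")}") // check 3
      case Some(_) =>
        Proceed // step 4 would follow: watch the ISR, handle deletion state, move replicas
    }
}
```

Returning a value keeps the checks easy to test in isolation; the flow described above throws an exception instead.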
/*
Run preferred replica election for the partitions that have not finished it yet; these are kept in partitionsUndergoingPreferredReplicaElection.
1. First check whether any of these partitions belong to topics that are queued for deletion; if so, those topics are marked ineligible for deletion while the election runs.
2. Use partitionStateMachine to move all of these partitions to the OnlinePartition state, with preferredReplicaPartitionLeaderSelector choosing each leader. The ISR is read from /brokers/topics/topicname/partitions/pid/state; if that path does not exist yet, the first live replica of the partition becomes leader and the set of live replicas is written to the path. The leader selector then assigns the preferred replica based on what it reads from that path.
3. Update partitionLeadershipInfo with the partition's new leader and ISR, and send a LeaderAndIsrRequest to the affected brokers.
4. Remove the partitions from partitionsUndergoingPreferredReplicaElection and delete the data of the /admin/preferred_replica_election node.
*/
maybeTriggerPreferredReplicaElection()
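A simplified model of the leader choice in step 2, assuming (as in Kafka) that the preferred replica is the first entry of the partition's assigned replica list. LeaderAndIsr and selectPreferredLeader are local, hypothetical definitions; the branch for a missing state path follows the description above, and the real selector may differ in details.

```scala
final case class LeaderAndIsr(leader: Int, isr: List[Int])

// Hypothetical preferred-replica selection; returns an error message on failure.
def selectPreferredLeader(
    assigned: List[Int],
    current: Option[LeaderAndIsr],
    liveBrokerIds: Set[Int]
): Either[String, LeaderAndIsr] = {
  val liveAssigned = assigned.filter(liveBrokerIds.contains)
  current match {
    case None =>
      // No .../partitions/pid/state yet: first live replica leads,
      // and the live replicas form the ISR.
      liveAssigned match {
        case head :: _ => Right(LeaderAndIsr(head, liveAssigned))
        case Nil       => Left("no live replica")
      }
    case Some(state) =>
      assigned.headOption match {
        case None =>
          Left("empty assignment")
        case Some(preferred) if preferred == state.leader =>
          Left(s"preferred replica $preferred is already leader")
        case Some(preferred) if liveBrokerIds.contains(preferred) && state.isr.contains(preferred) =>
          Right(state.copy(leader = preferred)) // the ISR itself is unchanged
        case Some(preferred) =>
          Left(s"preferred replica $preferred is not alive or not in ISR")
      }
  }
}
```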
// Send an UpdateMetadataRequest carrying metadata for all topics to every live or shutting-down broker.
/* send partition leadership info to all live brokers */
sendUpdateMetadataRequest(controllerContext.liveOrShuttingDownBrokerIds.toSeq)
// If auto.leader.rebalance.enable is true (it defaults to true), partition leadership is rebalanced
// every leader.imbalance.check.interval.seconds (default 300 seconds). The scheduled function is
// checkAndTriggerPartitionRebalance, first run 5 seconds after the scheduler starts.
if (config.autoLeaderRebalanceEnable) {
  info("starting the partition rebalance scheduler")
  autoRebalanceScheduler.startup()
  autoRebalanceScheduler.schedule("partition-rebalance-thread", checkAndTriggerPartitionRebalance,
    5, config.leaderImbalanceCheckIntervalSeconds.toLong, TimeUnit.SECONDS)
}
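The scheduling above amounts to a fixed-rate task with an initial delay. Here it is sketched with the JDK's ScheduledExecutorService rather than Kafka's own scheduler class; startRebalanceScheduler and maybeStartRebalance are hypothetical helpers, and the task passed in stands for checkAndTriggerPartitionRebalance.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import scala.util.control.NonFatal

// Hypothetical helper: run `task` after `initialDelay`, then every `period`.
def startRebalanceScheduler(
    task: () => Unit,
    initialDelay: Long,
    period: Long,
    unit: TimeUnit
): ScheduledExecutorService = {
  val executor = Executors.newSingleThreadScheduledExecutor()
  val runnable = new Runnable {
    def run(): Unit =
      try task()
      catch { case NonFatal(e) => System.err.println(s"rebalance check failed: $e") }
  }
  executor.scheduleAtFixedRate(runnable, initialDelay, period, unit)
  executor
}

// Only schedule when auto.leader.rebalance.enable is true: first run after
// 5 seconds, then every leader.imbalance.check.interval.seconds.
def maybeStartRebalance(
    autoLeaderRebalanceEnable: Boolean,
    intervalSeconds: Long,
    task: () => Unit
): Option[ScheduledExecutorService] =
  if (autoLeaderRebalanceEnable) Some(startRebalanceScheduler(task, 5L, intervalSeconds, TimeUnit.SECONDS))
  else None
```

The try/catch matters: with scheduleAtFixedRate, a task that throws suppresses all later runs, so one failed check would otherwise silently stop rebalancing.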
// Start TopicDeletionManager, the component that handles topic deletion. It spawns a DeleteTopicsThread,
// but only when delete.topic.enable is true; otherwise this instance does nothing.
deleteTopicManager.start()
onControllerResignation
When the broker stops being the controller (goes from leader to non-leader), onControllerResignation is called:
deregisterIsrChangeNotificationListener() // remove IsrChangeNotificationListener from /isr_change_notification
deregisterReassignedPartitionsListener() // remove PartitionsReassignedListener from /admin/reassign_partitions
deregisterPreferredReplicaElectionListener() // remove PreferredReplicaElectionListener from /admin/preferred_replica_election
// shutdown delete topic manager
if (deleteTopicManager != null) // stop the topic-deletion manager
  deleteTopicManager.shutdown()
// shutdown leader rebalance scheduler
if (config.autoLeaderRebalanceEnable) // stop the scheduler that automatically rebalances partition leadership
  autoRebalanceScheduler.shutdown()
inLock(controllerContext.controllerLock) {
  // de-register partition ISR listener for on-going partition reassignment task
  deregisterReassignedPartitionsIsrChangeListeners() // remove the ReassignedPartitionsIsrChangeListener instances from the partitions/pid/state nodes
  // shutdown partition state machine
  partitionStateMachine.shutdown()
  // shutdown replica state machine
  replicaStateMachine.shutdown()
  // shutdown controller channel manager
  if (controllerContext.controllerChannelManager != null) {
    controllerContext.controllerChannelManager.shutdown()
    controllerContext.controllerChannelManager = null
  }
  // reset controller context
  controllerContext.epoch = 0
  controllerContext.epochZkVersion = 0
  brokerState.newState(RunningAsBroker) // set this broker's state back to RunningAsBroker
}
In short, resignation undoes everything that was registered and started during failover.
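The symmetry between failover and resignation can be sketched with a toy controller. ControllerSketch is hypothetical: it only tracks watched paths, the epoch and a controller flag, and it reuses the ZooKeeper paths named earlier. Resignation removes exactly the paths that failover added and resets the epoch to 0, as the code above does with controllerContext.epoch.

```scala
import scala.collection.mutable

// Hypothetical toy controller: failover registers watches and bumps the epoch,
// resignation removes the same watches and resets the controller state.
final class ControllerSketch(deleteTopicEnable: Boolean) {
  val watchedPaths = mutable.LinkedHashSet.empty[String]
  var epoch = 0
  var isController = false // stands in for brokerState

  private def failoverPaths: Seq[String] =
    Seq(
      "/admin/reassign_partitions",
      "/isr_change_notification",
      "/admin/preferred_replica_election",
      "/brokers/topics",
      "/brokers/ids"
    ) ++ (if (deleteTopicEnable) Seq("/admin/delete_topics") else Nil)

  def onControllerFailover(epochFromZk: Int): Unit = {
    epoch = epochFromZk + 1 // read, then increment, as in the failover steps
    watchedPaths ++= failoverPaths
    isController = true
  }

  def onControllerResignation(): Unit = {
    watchedPaths --= failoverPaths // undo every registration made on failover
    epoch = 0                      // like controllerContext.epoch = 0
    isController = false           // like brokerState.newState(RunningAsBroker)
  }
}
```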