【起因】22:31:04分被通知集群異常,無法正常處理調度任務行拢;遠程登錄查看晴埂,發(fā)現(xiàn)HDFS NameNode發(fā)生腦裂;
NameNode節(jié)點:name21.hadoop聂喇,name22.hadoop
hadoop版本:2.7.1
【事故報告】
1. 【時間】=> 22:31:04
【HA狀態(tài)】=> name21.hadoop:Active, name22.hadoop:Standyby
【日志信息】=> name21.hadoop:/var/log/hadoop/hdfs/gc.log-xxxxxxxxxx
此時辖源,Active NameNode觸發(fā)一次異常JVM GC(正常情況下GC耗時在2s以下)蔚携,此次耗時32.46s;
2018-02-11T22:31:04.707+0800: 801030.875: [GC2018-02-11T22:31:04.707+0800: 801030.875: [ParNew (promotion failed): 1550355K->1544189K(1710528K), 0.8542820 secs]
2018-02-11T22:31:05.562+0800: 801031.729: [CMS: 12100861K->3636381K(13303808K), 31.6119450 secs] 13650016K->3636381K(15014336K), [CMS Perm : 55251K->55008K(131072K)], 32.4666280 secs] [Times: user=33.04 sys=0.00, **real=32.46 secs**]
2.【時間】=> 22:31:37
【HA狀態(tài)】=> name21.hadoop:Active, name22.hadoop:Standyby
【日志信息】=> name21.hadoop:/var/log/hadoop/hdfs/hadoop-hdfs-namenode-name21.hadoop.log
此時克饶,由于之前GC時間過長酝蜒,導致active NameNode向JournalNodes進行數(shù)據(jù)通信時,超時導致無法響應(超時時間默認值為20s矾湃,GC耗時32s)亡脑;
2018-02-11 22:31:37,201 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [xxx.xx.xx.xx:8485, xxx.xx.xx.xx:8485, xxx.xx.xx.xx:8485], stream=QuorumOutputStream starting at txid 3108262584)) java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
2018-02-11 22:31:37,208 WARN util.JvmPauseMonitor (JvmPauseMonitor.java:run(191)) - Detected pause in JVM or host machine (eg GC): pause of approximately 32158ms GC pool 'ParNew' had collection(s): count=1 time=855ms GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=31612ms
2018-02-11 22:31:37,211 INFO namenode.FSNamesystem (FSNamesystem.java:rollEditLog(5937)) - Roll Edit Log from 172.21.54.22
2018-02-11 22:31:37,211 INFO namenode.FSEditLog (FSEditLog.java:rollEditLog(1202)) - Rolling edit logs 2018-02-11 22:31:37,211 INFO namenode.FSEditLog (FSEditLog.java:endCurrentLogSegment(1258)) - Ending log segment 3108262584
2018-02-11 22:31:37,228 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
至此,Active Namenode節(jié)點進入異常狀態(tài)(進程無法結束邀跃,程序所起8021端口不存在)霉咨;
3.【時間】=> 22:32:22
【HA狀態(tài)】=> name21.hadoop:Active, name22.hadoop:Standyby
【日志信息】=> name21.hadoop:/var/log/hadoop/hdfs/hadoop-hdfs-zkfc-name21.hadoop.log
此時, name21.hadoop上的ZKFailoverController無法獲取到本機Namenode的狀態(tài)[無法將Active狀態(tài)修改為Standby],此后每隔45s反復重試拍屑;同時途戒,與zookeeper的session連接關閉;
2018-02-11 22:32:22,267 INFO ha.HealthMonitor (HealthMonitor.java:enterState(249)) - Entering state SERVICE_NOT_RESPONDING
2018-02-11 22:32:22,267 INFO ha.ZKFailoverController (ZKFailoverController.java:setLastHealthState(852)) - Local service NameNode at name21.hadoop/xxx.xx.xx.xx:8021 entered state: SERVICE_NOT_RESPONDING
2018-02-11 22:32:22,267 INFO ha.ZKFailoverController (ZKFailoverController.java:recheckElectability(768)) - Quitting master election for NameNode at name21.hadoop2/ xxx.xx.xx.xx:8021 and marking that fencing is necessary
2018-02-11 22:32:22,267 INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:quitElection(401)) - Yielding from election
2018-02-11 22:32:22,268 INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:terminateConnection(830)) - Terminating ZK connection for elector id=660690290 appData=0a06636261734e4112036e6e311a0e6e616d6532312e6861646f6f703220d53e28d33e cb=Elector callbacks for NameNode at name21.hadoop2/172.21.54.21:8021
2018-02-11 22:32:22,270 INFO zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x15f5451238a08b3 closed
2018-02-11 22:32:22,271 WARN ha.ActiveStandbyElector (ActiveStandbyElector.java:isStaleClient(1066)) - Ignoring stale result from old client with sessionId 0x15f5451238a08b3
2018-02-11 22:32:22,271 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down
4.【時間】=> 22:32:22
【HA狀態(tài)】=> name21.hadoop:Active, name22.hadoop:Standyby
【日志信息】=> name22.hadoop:/var/log/hadoop/hdfs/hadoop-hdfs-zkfc-name22.hadoop.log
與此同時僵驰, name22.hadoop的namenode與zookeeper建立session連接喷斋, ZKFailoverController 嘗試連接name21.hadoop上的Namenode,將狀態(tài)從Active修改為Standby矢渊, 報錯連接超時继准;
2018-02-11 22:32:22,286 INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:fenceOldActive(931)) - Checking for any old active which needs to be fenced...
2018-02-11 22:32:22,290 INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:fenceOldActive(952)) - Old node exists: 0a06636261734e4112036e6e311a0e6e616d6532312e6861646f6f703220d53e28d33e
2018-02-11 22:32:22,294 INFO ha.ZKFailoverController (ZKFailoverController.java:doFence(513)) - Should fence: NameNode at name21.hadoop/ xxx.xx.xx.xx:8021
2018-02-11 22:32:27,304 WARN ha.FailoverController (FailoverController.java:tryGracefulFence(178)) - Unable to gracefully make NameNode at name21.hadoop/ xxx.xx.xx.xx:8021 standby (unable to connect) java.net.SocketTimeoutException: Call From name22.hadoop/ xxx.xx.xx.xx to name21.hadoop:8021 failed on socket timeout exception: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/ xxx.xx.xx.xx :56511remote=name21.hadoop/ xxx.xx.xx.xx :8021];
......
Caused by: **java.net.SocketTimeoutException: 5000 millis timeout** while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/ xxx.xx.xx.xx :56511 remote=name21.hadoop/ xxx.xx.xx.xx :8021]
......
啟動Fence服務(集群并未設置,直接返回ture)矮男,并將name22.hadoop Namenode的狀態(tài)從Standby變?yōu)锳ctive;
2018-02-11 22:32:27,323 INFO ha.NodeFencer (NodeFencer.java:fence(91)) - ====== Beginning Service Fencing Process... ======
2018-02-11 22:32:27,323 INFO ha.NodeFencer (NodeFencer.java:fence(94)) - Trying method 1/1: org.apache.hadoop.ha.ShellCommandFencer(/bin/true)
2018-02-11 22:32:27,339 INFO ha.ShellCommandFencer (ShellCommandFencer.java:tryFence(99)) - Launched fencing command '/bin/true' with pid 48839
2018-02-11 22:32:27,359 INFO ha.NodeFencer (NodeFencer.java:fence(98)) - ====== Fencing successful by method org.apache.hadoop.ha.ShellCommandFencer(/bin/true) ======
2018-02-11 22:32:27,360 INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:writeBreadCrumbNode(878)) - Writing znode /hadoop-ha/cbasNA/ActiveBreadCrumb to indicate that the local node is the most recent active...
2018-02-11 22:32:27,365 INFO ha.ZKFailoverController (ZKFailoverController.java:becomeActive(380)) - Trying to make NameNode at name22.hadoop/ xxx.xx.xx.xx :8021 active...
2018-02-11 22:32:46,337 INFO ha.ZKFailoverController (ZKFailoverController.java:becomeActive(387)) - Successfully transitioned NameNode at name22.hadoop/ xxx.xx.xx.xx :8021 to active state
5.【時間】=> 22:32:46
【HA狀態(tài)】=> name21.hadoop:Active, name22.hadoop:Active
至此移必,集群的NameNode出現(xiàn)雙Active狀態(tài),發(fā)生腦裂毡鉴;
6.【恢復措施】
后面運維介入時崔泵,嘗試重啟namenode,發(fā)現(xiàn)一直卡在35%進度猪瞬,無法在執(zhí)行下去憎瘸;
原因:運維介入時,首先嘗試重啟name22.hadoop上的namenode陈瘦,此時重啟程序檢測到雙active幌甘,程序會反復獲取狀態(tài),直到active只有一個痊项;
NameNode HA states: active_namenodes = [('nn1', 'name21.hadoop:50070'), ('nn2', 'name22.hadoop:50070')], standby_namenodes = [], unknown_namenodes = []
此時出現(xiàn)的情況是:
name21.hadoop的namenode異常(8021端口連接超時)锅风,無法將自身狀態(tài)從active修改為standby;
name22.hadoop的namenode已經(jīng)被ZKFailoverController選舉為active鞍泉,選舉理由也是name21.hadoop的8021端口連接異常皱埠;
因此,需要重啟name21.hadoop上的namenode(恢復8021端口)咖驮,重啟時會檢測name22.hadoop的namenode狀態(tài)(檢測name22.hadoop的8021端口)边器,將自身從active切換為standby训枢;name22.hadoop無需重啟;
7.【預防措施】
a. 優(yōu)化GC參數(shù)忘巧,添加namenode的GC監(jiān)控恒界,找出GC異常的原因并修復;
b. 提高journalNode的寫入超時時間袋坑,目前默認值是20s仗处,對于生產(chǎn)環(huán)境有必要提高到60s;
c. 啟動Fence服務;
d. 集群namenode節(jié)點的狀態(tài)監(jiān)控枣宫;