1 Overview
This post resolves a problem where the DataNode fails to start when HDFS is started on Hadoop. The error is:
- java.io.IOException: Incompatible clusterIDs in /home/lxh/hadoop/hdfs/data: namenode clusterID = CID-a3938a0b-57b5-458d-841c-d096e2b7a71c; datanode clusterID = CID-200e6206-98b5-44b2-9e48-262871884eeb
2 Problem description
After running start-dfs.sh, the printed output shows that the NameNode and the DataNode were each launched.
- Starting namenodes on [localhost]
- localhost: starting namenode, logging to /home/lxh/hadoop/hadoop-2.4.1/logs/hadoop-lxh-namenode-ubuntu.out
- localhost: starting datanode, logging to /home/lxh/hadoop/hadoop-2.4.1/logs/hadoop-lxh-datanode-ubuntu.out
However, running jps to check the result shows that the DataNode did not actually start.
- 10256 ResourceManager
- 29634 NameNode
- 29939 SecondaryNameNode
- 30054 Jps
- 10399 NodeManager
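As a minimal sketch of the check above, the jps listing can be scanned for the HDFS daemons that should be present. The function name missing_daemons and the default list of expected daemons are assumptions made for illustration; the sample text is the jps output shown above.

```python
def missing_daemons(jps_output, expected=("NameNode", "DataNode", "SecondaryNameNode")):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # jps prints "<pid> <main class name>"; skip blank or pid-only lines.
        if len(parts) >= 2:
            running.add(parts[1])
    return [name for name in expected if name not in running]


# The jps listing from this post, taken after start-dfs.sh.
sample = """\
10256 ResourceManager
29634 NameNode
29939 SecondaryNameNode
30054 Jps
10399 NodeManager
"""

print(missing_daemons(sample))  # prints ['DataNode']
```

Matching on the second column rather than searching the whole text avoids counting SecondaryNameNode as a NameNode.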
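3 Locating the problem
This was puzzling: everything had been running normally a moment earlier, and the wordcount test program had completed. Thinking back over what I had just done, I had run the DFS format commands (hdfs namenode -format and hdfs datanode -format) and then restarted, after which this problem appeared. Could it be related to formatting? I checked the DataNode log: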
- 2014-08-08 00:32:08,787 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000. Exiting.
- java.io.IOException: Incompatible clusterIDs in /home/lxh/hadoop/hdfs/data: namenode clusterID = CID-a3938a0b-57b5-458d-841c-d096e2b7a71c; datanode clusterID = CID-200e6206-98b5-44b2-9e48-262871884eeb
- at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:477)
- at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:226)
- at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:254)
- at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:974)
- at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:945)
- at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:278)
- at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
- at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
- at java.lang.Thread.run(Thread.java:745)
- 2014-08-08 00:32:08,790 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000
- 2014-08-08 00:32:08,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
According to the log, the cause is that the DataNode's clusterID and the NameNode's clusterID do not match.
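The two IDs can also be pulled out of the exception message programmatically. The following is a rough sketch, assuming the message keeps the exact wording seen in the log above; the pattern and the function name parse_cluster_id_error are illustrative and not part of Hadoop.

```python
import re

# Pattern modelled on the message in the log above; other Hadoop versions
# may word the message differently (assumption).
CLUSTER_ID_ERROR = re.compile(
    r"Incompatible clusterIDs in (?P<dir>\S+): "
    r"namenode clusterID = (?P<namenode>\S+); "
    r"datanode clusterID = (?P<datanode>\S+)"
)


def parse_cluster_id_error(line):
    """Return the storage dir and both clusterIDs, or None if the line does not match."""
    match = CLUSTER_ID_ERROR.search(line)
    return match.groupdict() if match else None


log_line = (
    "java.io.IOException: Incompatible clusterIDs in /home/lxh/hadoop/hdfs/data: "
    "namenode clusterID = CID-a3938a0b-57b5-458d-841c-d096e2b7a71c; "
    "datanode clusterID = CID-200e6206-98b5-44b2-9e48-262871884eeb"
)

info = parse_cluster_id_error(log_line)
print(info["dir"])
print(info["namenode"])
print(info["datanode"])
```

The "dir" group is the DataNode storage directory whose current/VERSION file holds the stale ID.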
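With the likely cause identified, the next step is to verify that things really are as the log describes.
Go to the directories configured for the datanode and the namenode in hdfs-site.xml, open the current/VERSION file in each, and compare them.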
${datanode}/current/VERSION:
- storageID=DS-be8dfa2b-17b1-4c9f-bbfe-4898956a39ed
- clusterID=CID-200e6206-98b5-44b2-9e48-262871884eeb
- cTime=0
- datanodeUuid=406b6d6a-0cb1-453d-b689-9ee62433b15d
- storageType=DATA_NODE
- layoutVersion=-55
${namenode}/current/VERSION:
- namespaceID=670379
- clusterID=CID-a3938a0b-57b5-458d-841c-d096e2b7a71c
- cTime=0
- storageType=NAME_NODE
- blockpoolID=BP-325596647-127.0.1.1-1407429078192
- layoutVersion=-56
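The comparison of the two VERSION files can be sketched as follows. This assumes the files are plain key=value text as in the listings above; read_version and cluster_ids_match are hypothetical helper names, and the sample strings are abbreviated copies of the two listings.

```python
def read_version(text):
    """Parse key=value lines from a VERSION file into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "#" comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cluster_ids_match(namenode_text, datanode_text):
    """True only if both files define clusterID and the values are equal."""
    nn_id = read_version(namenode_text).get("clusterID")
    dn_id = read_version(datanode_text).get("clusterID")
    return nn_id is not None and nn_id == dn_id


datanode_version = """\
storageID=DS-be8dfa2b-17b1-4c9f-bbfe-4898956a39ed
clusterID=CID-200e6206-98b5-44b2-9e48-262871884eeb
storageType=DATA_NODE
"""

namenode_version = """\
namespaceID=670379
clusterID=CID-a3938a0b-57b5-458d-841c-d096e2b7a71c
storageType=NAME_NODE
"""

print(cluster_ids_match(namenode_version, datanode_version))  # prints False
```

On a real installation the two strings would be read from ${namenode}/current/VERSION and ${datanode}/current/VERSION.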
Sure enough, the two files differ exactly as the log says. I changed the clusterID in the datanode's VERSION file to match the namenode's, started dfs again (start-dfs.sh), and ran jps to check: all daemons started normally.
- 10256 ResourceManager
- 30614 NameNode
- 30759 DataNode
- 30935 SecondaryNameNode
- 31038 Jps
- 10399 NodeManager
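The manual edit of the datanode's VERSION file could be scripted along these lines. This is only a sketch under a few assumptions: the DataNode is stopped, the file has been backed up, and it contains exactly one clusterID line. The name set_cluster_id is hypothetical.

```python
import re


def set_cluster_id(version_text, new_cluster_id):
    """Return VERSION text with its single clusterID line replaced."""
    updated, count = re.subn(
        r"^clusterID=.*$",
        lambda match: "clusterID=" + new_cluster_id,  # callable avoids escape handling
        version_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("expected exactly one clusterID line, found %d" % count)
    return updated


# Abbreviated copy of the datanode VERSION file from this post.
datanode_version = """\
storageID=DS-be8dfa2b-17b1-4c9f-bbfe-4898956a39ed
clusterID=CID-200e6206-98b5-44b2-9e48-262871884eeb
cTime=0
"""

fixed = set_cluster_id(datanode_version, "CID-a3938a0b-57b5-458d-841c-d096e2b7a71c")
print(fixed)
```

Working on the text rather than the file keeps the sketch side-effect free; writing the result back to ${datanode}/current/VERSION is left to the caller.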
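4 Root cause analysis
Running hdfs namenode -format deletes and regenerates the namenode's current directory, so the clusterID in its VERSION file changes. The clusterID in the datanode's VERSION file stays the same, which leaves the two clusterIDs inconsistent.
To avoid this situation, after formatting the namenode either delete the datanode's current folder (which also discards the blocks stored there), or change the clusterID in the datanode's VERSION file to match the one in the namenode's VERSION file, and then restart dfs.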
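The first option, removing the datanode's current folder, might look like the sketch below. It is destructive: every block stored under that directory is lost, so it only makes sense right after a namenode format, with the DataNode stopped. The helper name clear_datanode_storage is hypothetical, and the demonstration runs against a throwaway temporary directory rather than a real data directory.

```python
import shutil
import tempfile
from pathlib import Path


def clear_datanode_storage(data_dir):
    """Delete <data_dir>/current. Returns True if a directory was removed."""
    current = Path(data_dir) / "current"
    if not current.is_dir():
        return False
    shutil.rmtree(current)  # irreversible: all block data under current/ is lost
    return True


# Demonstration on a temporary directory standing in for the datanode data dir.
with tempfile.TemporaryDirectory() as fake_data_dir:
    current = Path(fake_data_dir) / "current"
    current.mkdir()
    (current / "VERSION").write_text("clusterID=CID-old\n")

    removed_first = clear_datanode_storage(fake_data_dir)
    still_there = current.exists()
    removed_second = clear_datanode_storage(fake_data_dir)

print(removed_first, still_there, removed_second)  # prints True False False
```

When the DataNode next starts, it finds no current directory and initializes the storage directory again using the namenode's clusterID.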