Environment
Installation environment
Component | Version
---|---
Linux | CentOS 6.5
Java | JDK 1.8
Hadoop | Hadoop 2.7.3
Node configuration
Create the hadoop user and group
groupadd hadoop
useradd hadoop -g hadoop
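The SSH and scp steps below log in as the hadoop user with its password, so the new account needs one on every machine; a small addition not spelled out in the original steps (run as root):
passwd hadoop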
JDK installation
The JDK must be installed beforehand; for supported JDK versions, see:
https://wiki.apache.org/hadoop/HadoopJavaVersions
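A quick sanity check that the JDK is installed and visible to the hadoop user:
java -version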
Passwordless SSH login
Generate an SSH key pair on each of the three machines:
ssh-keygen -t rsa
Append the public keys of all three machines to authorized_keys:
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
ssh hadoop@10.xxx.xxx.2 cat ~/.ssh/id_rsa.pub >> authorized_keys
ssh hadoop@10.xxx.xxx.3 cat ~/.ssh/id_rsa.pub >> authorized_keys
These two commands prompt for the hadoop account password on the corresponding machines, since passwordless login is not in place yet.
Copy authorized_keys to the other two machines:
scp authorized_keys hadoop@10.xxx.xxx.2:~/.ssh/
scp authorized_keys hadoop@10.xxx.xxx.3:~/.ssh/
After this you can log into the other two machines without a password.
Note: if a password is still required, it is most likely a permissions problem with authorized_keys; set it to 600:
chmod 600 authorized_keys
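sshd also refuses keys when ~/.ssh itself is group- or world-writable, so it is worth tightening that too, then testing the login (IPs as above):
chmod 700 ~/.ssh
ssh hadoop@10.xxx.xxx.2 hostname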
Hadoop installation
Download the Hadoop release from a mirror:
http://hadoop.apache.org/releases.html
Extract the archive:
tar -zxvf hadoop-2.7.3.tar.gz
Edit the Hadoop configuration files
All of Hadoop's configuration files live under ${HADOOP_HOME}/etc/hadoop/.
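For the commands below it is convenient to set HADOOP_HOME; a minimal sketch, assuming the archive was extracted into the hadoop user's home directory:
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin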
Configure hadoop-env.sh
Point JAVA_HOME at the JDK install directory:
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/opt/xxx/java/jdk1.8.0_111/
Configure the slaves file
List the IP addresses of all slave nodes in the slaves file:
10.xxx.xxx.2
10.xxx.xxx.3
Configure core-site.xml
<configuration>
    <!-- Use HDFS as the default filesystem and point at the NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.xxx.xxx.1:9000</value>
    </property>
    <!-- Read/write buffer size used for sequence files -->
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <!-- Base temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoop/tmp</value>
    </property>
</configuration>
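hadoop.tmp.dir points at a local path; it does no harm to create it up front on every node to avoid permission surprises (path taken from the config above):
mkdir -p /home/hadoop/hadoop/tmp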
Configure hdfs-site.xml
<configuration>
    <!-- SecondaryNameNode HTTP address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>10.xxx.xxx.1:50090</value>
    </property>
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- NameNode metadata directory -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hadoop/hdfs/name</value>
    </property>
    <!-- DataNode data directory -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hadoop/hdfs/data</value>
    </property>
    <!-- HDFS block size: 128 MB -->
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
</configuration>
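The NameNode and DataNode directories referenced above can likewise be created ahead of time on the relevant nodes:
mkdir -p /home/hadoop/hadoop/hdfs/name /home/hadoop/hadoop/hdfs/data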
Configure mapred-site.xml
mapred-site.xml is not shipped directly; only the template mapred-site.xml.template is provided, so copy it first:
cp mapred-site.xml.template mapred-site.xml
<configuration>
    <!-- Run MapReduce on the YARN framework -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- MapReduce JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>10.xxx.xxx.1:10020</value>
    </property>
    <!-- MapReduce JobHistory web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>10.xxx.xxx.1:19888</value>
    </property>
</configuration>
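Note that start-yarn.sh does not start the JobHistory server; for the two addresses above to be reachable it has to be started separately:
sbin/mr-jobhistory-daemon.sh start historyserver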
Configure yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager host and port, used by clients to submit jobs -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>10.xxx.xxx.1:8032</value>
    </property>
    <!-- Scheduler address through which ApplicationMasters obtain resources from the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>10.xxx.xxx.1:8030</value>
    </property>
    <!-- ResourceManager address for NodeManagers -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>10.xxx.xxx.1:8031</value>
    </property>
    <!-- Address for admin commands -->
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>10.xxx.xxx.1:8033</value>
    </property>
    <!-- ResourceManager web UI address -->
    <!--
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>http://10.xxx.xxx.1:8088</value>
    </property>
    -->
</configuration>
Note that the configuration must not contain the following:
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>http://10.xxx.xxx.1:8088</value>
</property>
Otherwise the ResourceManager fails to start. The likely cause is the http:// prefix: YARN address properties expect a plain host:port value, and the scheme makes the address unparseable at startup. Left at its default (or set to 10.xxx.xxx.1:8088) the web UI comes up on port 8088.
Distribute to the other nodes
Copy the configured Hadoop directory to the other nodes:
scp -r hadoop-2.7.3 hadoop@10.xxx.xxx.2:/home/hadoop/
scp -r hadoop-2.7.3 hadoop@10.xxx.xxx.3:/home/hadoop/
Format the NameNode
bin/hdfs namenode -format
....
17/03/27 18:11:29 INFO common.Storage: Storage directory /home/hadoop/hadoop/hdfs/name has been successfully formatted.
......
Start HDFS and YARN
sbin/start-dfs.sh
sbin/start-yarn.sh
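jps is a quick way to confirm the daemons came up; for this layout it should show roughly the processes listed below (PIDs omitted):
jps
# on the master: NameNode, SecondaryNameNode, ResourceManager
# on each slave: DataNode, NodeManager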
View the NameNode web UI
http://10.xxx.xxx.1:50070
View the ResourceManager web UI
http://10.xxx.xxx.1:8088/cluster
Note: if you hit the error below, the slave IP addresses are probably missing from /etc/hosts; add entries for the slaves (see the example after the log) and the DataNodes will register.
2017-05-31 04:08:54,915 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1212305280-172.26.3.61-1496217355761 (Datanode Uuid null) service to /10.5.234.238:9000 beginning handshake with NN
2017-05-31 04:08:54,929 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-1212305280-172.26.3.61-1496217355761 (Datanode Uuid null) service to /10.5.234.238:9000 Datanode denied communication with namenode because hostname cannot be resolved (ip=10.5.237.131, hostname=10.5.237.131): DatanodeRegistration(0.0.0.0:50010, datanodeUuid=62026f56-a10d-4ddc-962c-48eaff24a8a2, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-d7717928-6909-4dc8-bbfd-dea4d6d509db;nsid=1819736668;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:873)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4529)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1286)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:96)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28752)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
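For reference, a minimal /etc/hosts sketch on the NameNode; the hostnames slave1 and slave2 are hypothetical placeholders:
10.xxx.xxx.2    slave1
10.xxx.xxx.3    slave2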
This covers only the basic cluster configuration; follow-up posts will go through further configuration.