This article walks through setting up a distributed Hadoop environment using hadoop-2.6.0-cdh5.7.0. Three machines are available, hadoop000/hadoop001/hadoop002. One of them will take the NameNode and ResourceManager roles and also act as a storage node with the DataNode and NodeManager roles; the other two act purely as storage nodes with the DataNode and NodeManager roles:
- hadoop000:NameNode/DataNode ResourceManager/NodeManager
- hadoop001:DataNode NodeManager
- hadoop002:DataNode NodeManager
Preparation
- hostname setup
On each of the three machines, run sudo vi /etc/sysconfig/network to set the hostname. For example, on the first machine (the other two are analogous):
NETWORKING=yes
HOSTNAME=hadoop000
- Map hostnames to IP addresses: run sudo vi /etc/hosts on all three machines and add:
192.168.199.102 hadoop000
192.168.199.247 hadoop001
192.168.199.138 hadoop002
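Once the mappings are in place, it is worth confirming that every hostname resolves before moving on. Below is a small sanity check; the `check_hosts` function name is my own, not part of the original guide. Run it against the real file with `check_hosts "$(cat /etc/hosts)" hadoop000 hadoop001 hadoop002`:

```shell
# check_hosts: verify that each expected hostname appears in an
# /etc/hosts-style mapping passed as the first argument.
check_hosts() {
  hosts_text=$1; shift
  for h in "$@"; do
    # -w matches whole words, so hadoop000 will not match hadoop0001
    if ! printf '%s\n' "$hosts_text" | grep -qw "$h"; then
      echo "missing mapping for $h"
      return 1
    fi
  done
  echo "all mappings present"
}

# Check against the entries listed above:
check_hosts "192.168.199.102 hadoop000
192.168.199.247 hadoop001
192.168.199.138 hadoop002" hadoop000 hadoop001 hadoop002
```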
Prerequisites
- Passwordless SSH login
Run on each machine: ssh-keygen -t rsa
With hadoop000 as the primary node, copy its public key to every node:
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop000
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop001
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop002
- JDK installation
On hadoop000, extract the JDK tarball and add JAVA_HOME to the environment variables:
tar -zxvf jdk-8u131-linux-x64.tar.gz -C ~/app/
Set the environment variables:
vi ~/.bash_profile
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_131
export PATH=$JAVA_HOME/bin:$PATH
Run source ~/.bash_profile to make the changes take effect.
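A quick way to confirm the environment took effect is to check that $JAVA_HOME/bin now leads the PATH. This is a sketch of a self-check, not part of the original guide; the JAVA_HOME path is the one from the install step above:

```shell
# Reproduce the environment setup from ~/.bash_profile:
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_131
export PATH="$JAVA_HOME/bin:$PATH"

# The first PATH entry should now be $JAVA_HOME/bin, so its java wins lookup:
if [ "${PATH%%:*}" = "$JAVA_HOME/bin" ]; then
  echo "PATH ok"
else
  echo "PATH not updated"
fi
```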
Cluster installation
- Hadoop installation
On hadoop000, extract the Hadoop tarball, add HADOOP_HOME to the environment variables, and edit the following configuration files:
- hadoop-env.sh
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_131
- core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop000:8020</value>
</property>
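Note that fs.default.name is the legacy key: in Hadoop 2.x it still works, but is logged as deprecated in favor of fs.defaultFS. The equivalent setting with the newer key:

```xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop000:8020</value>
</property>
```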
- hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/app/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/app/tmp/dfs/data</value>
</property>
- yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop000</value>
</property>
- mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
- slaves
hadoop000
hadoop001
hadoop002
Distribute the installation packages and configuration files to hadoop001 and hadoop002:
scp -r ~/app hadoop@hadoop001:~/
scp -r ~/app hadoop@hadoop002:~/
scp ~/.bash_profile hadoop@hadoop001:~/
scp ~/.bash_profile hadoop@hadoop002:~/
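The four scp commands above can also be generated in a loop when the worker list grows. This is a sketch; the `dist_cmds` helper is my own, and it prints the commands rather than running them (pipe the output to sh to execute):

```shell
# dist_cmds: emit the distribution commands for each worker node given
# as an argument, one command per line.
dist_cmds() {
  for host in "$@"; do
    echo "scp -r ~/app hadoop@${host}:~/"
    echo "scp ~/.bash_profile hadoop@${host}:~/"
  done
}

# Print the commands for the two worker nodes in this guide:
dist_cmds hadoop001 hadoop002
```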
On hadoop001 and hadoop002, run source ~/.bash_profile to make the environment take effect.
Format the NameNode (run on hadoop000 only):
bin/hdfs namenode -format
Start the cluster (run on hadoop000 only):
sbin/start-all.sh
- Verification
Check the running processes with jps:
- hadoop000:
SecondaryNameNode
DataNode
NodeManager
NameNode
ResourceManager
- hadoop001:
NodeManager
DataNode
- hadoop002:
NodeManager
DataNode
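Checking these lists by eye gets tedious across nodes, so a scripted check can help. This is a sketch; the `expect_daemons` function name is my own. On a node you would run, for example, `jps | expect_daemons NameNode DataNode`:

```shell
# expect_daemons: read jps output on stdin and verify that every daemon
# named in the arguments appears in it.
expect_daemons() {
  out=$(cat)
  for d in "$@"; do
    if ! printf '%s\n' "$out" | grep -qw "$d"; then
      echo "missing: $d"
      return 1
    fi
  done
  echo "all expected daemons running"
}

# Example against a captured hadoop000 jps listing (PIDs are illustrative):
printf '2241 NameNode\n2384 DataNode\n2556 SecondaryNameNode\n2705 ResourceManager\n2811 NodeManager\n' \
  | expect_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager
```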
Web UI access: hadoop000:50070 (HDFS), hadoop000:8088 (YARN)
Stopping the cluster: stop-all.sh
Running a Hadoop project on the cluster
1) Upload the data to the data directory on hadoop000
2) Upload the developed jar to the lib directory on hadoop000
3) Put the data into HDFS
4) Run the program on the distributed cluster
For example, run the official Pi estimation example that ships with Hadoop:
hadoop jar /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3