Ordinarily, the HDFS NameNode handles every client request by itself. This creates two problems: (1) when client traffic is heavy, the NameNode comes under excessive pressure and HDFS performance degrades; (2) a single NameNode keeps only a limited amount of metadata cached in memory (about 1000 MB by default; this can be raised, but usually is not). When the data volume is large, the metadata for a request often cannot be found in this cache and must be looked up in the fsimage file on disk, which also degrades HDFS performance.
Is there a way to solve both problems at once? Yes: deploy multiple NameNodes and distribute client requests among them according to some routing policy. This is NameNode federation, often described as load balancing (Load Balance, LB) for HDFS. This section walks through building an HDFS LB cluster.
環(huán)境說明:
bigdata130 192.168.126.130
bigdata131 192.168.126.131
bigdata132 192.168.126.132
bigdata133 192.168.126.133
bigdata134 192.168.126.134
安裝介質(zhì)下載:
1.HDFS負(fù)載均衡的原理
通過上圖的分析撤逢,搭建一個(gè)最小規(guī)模的HDFS負(fù)載均衡集群至少需要5臺機(jī)器:
bigdata130 viewFS
bigdata131 NameNode1 ResourceManager
bigdata132 NameNode2
bigdata133 DataNode1
bigdata134 DataNode2
常見的路由規(guī)則:
- IP-Hash:如果每個(gè)IP的訪問量差不多,可以根據(jù)IP的Hash值分流粮坞;比如:來自111.111.111.111的請求分給NameNode1處理蚊荣,來自222.222.222.222的請求分給NameNode2處理。
- 請求ID:如果請求沒什么規(guī)律莫杈,可以根據(jù)請求ID的Hash值分流互例;比如:將奇數(shù)ID的請求分給NameNode1處理,將偶數(shù)ID的請求分給NameNode2處理筝闹。
- 操作分流:如果兩個(gè)操作的次數(shù)相當(dāng)媳叨,可以根據(jù)操作分流;比如:將put操作分給NameNode1處理丁存,將get操作分給NameNode2處理肩杈。
- 目錄分流:如果兩個(gè)目錄的訪問量相當(dāng),可以根據(jù)訪問目錄分流解寝;比如上圖:將訪問/dir1的請求分給NameNode1處理扩然,將訪問/dir2的請求分給NameNode2處理。
- 將多種規(guī)則組合使用聋伦。
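The hash-based rules above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical names (`NAMENODES`, `route_by_ip`, `route_by_request_id`); in the deployment built below, routing is actually done declaratively through ViewFS mount tables, not client code:

```python
import hashlib

# Hypothetical sketch of the routing rules, not part of Hadoop.
NAMENODES = ["NameNode1", "NameNode2"]

def route_by_ip(ip):
    """IP-hash rule: the same client IP always maps to the same NameNode."""
    digest = hashlib.md5(ip.encode("utf-8")).hexdigest()
    return NAMENODES[int(digest, 16) % len(NAMENODES)]

def route_by_request_id(request_id):
    """Request-ID rule: odd IDs go to NameNode1, even IDs to NameNode2."""
    return NAMENODES[0] if request_id % 2 == 1 else NAMENODES[1]
```

Whatever rule is chosen, the key property is determinism: the same request attribute must always map to the same NameNode, so that a file's metadata lives on exactly one of them.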
2. Building the HDFS load-balanced cluster
Perform the following steps on the bigdata130 node:
2.1 Upload the Hadoop installation package
Use WinSCP to upload the Hadoop installation package hadoop-2.7.3.tar.gz to the /root/tools/ directory on bigdata130 (this directory was created beforehand).
# ls /root/tools/
hadoop-2.7.3.tar.gz
2.2 Unpack the Hadoop installation package
Change into /root/tools/ and unpack the archive into /root/trainings/ (also created beforehand):
# cd /root/tools/
# tar -zxvf hadoop-2.7.3.tar.gz -C /root/trainings/
2.3 Configure the Hadoop environment variables (on all 5 hosts)
# vim /root/.bash_profile
HADOOP_HOME=/root/trainings/hadoop-2.7.3
export HADOOP_HOME
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH
# source /root/.bash_profile
2.4 Configure the Hadoop parameters
Change into the Hadoop configuration directory:
# cd /root/trainings/hadoop-2.7.3/etc/hadoop/
(1) Edit the hadoop-env.sh file:
# echo $JAVA_HOME
/root/trainings/jdk1.8.0_144
# vim hadoop-env.sh
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/root/trainings/jdk1.8.0_144
(2) Edit the hdfs-site.xml file (define the nameservices and NameNodes):
# vim hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>ns1,ns2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1</name>
<value>bigdata131:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1</name>
<value>bigdata131:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address.ns1</name>
<value>bigdata131:50090</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns2</name>
<value>bigdata132:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns2</name>
<value>bigdata132:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address.ns2</name>
<value>bigdata132:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
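A quick way to sanity-check a federation config like the one above is to read the nameservice-to-address mapping back out of the XML. The snippet below is a hypothetical helper (`rpc_addresses` is not part of Hadoop), shown here with an inline copy of the relevant properties:

```python
import xml.etree.ElementTree as ET

# Hypothetical sanity check, not part of Hadoop: parse an hdfs-site.xml
# fragment and list the RPC address configured for each nameservice.
HDFS_SITE = """<configuration>
  <property><name>dfs.nameservices</name><value>ns1,ns2</value></property>
  <property><name>dfs.namenode.rpc-address.ns1</name><value>bigdata131:9000</value></property>
  <property><name>dfs.namenode.rpc-address.ns2</name><value>bigdata132:9000</value></property>
</configuration>"""

def rpc_addresses(xml_text):
    root = ET.fromstring(xml_text)
    props = {p.find("name").text: p.find("value").text
             for p in root.findall("property")}
    # Each nameservice listed in dfs.nameservices must have a matching
    # dfs.namenode.rpc-address.<ns> entry.
    services = props["dfs.nameservices"].split(",")
    return {ns: props["dfs.namenode.rpc-address." + ns] for ns in services}

print(rpc_addresses(HDFS_SITE))
# {'ns1': 'bigdata131:9000', 'ns2': 'bigdata132:9000'}
```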
(3) Edit the core-site.xml file:
# mkdir /root/trainings/hadoop-2.7.3/tmp
# vim core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/root/trainings/hadoop-2.7.3/tmp</value>
</property>
</configuration>
(4) Edit the mapred-site.xml file:
Copy the template file mapred-site.xml.template to mapred-site.xml, then edit it:
# cp mapred-site.xml.template mapred-site.xml
# vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
(5) Edit the yarn-site.xml file (run the ResourceManager on bigdata131):
# vim yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>bigdata131</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
(6) Edit the slaves file:
# vim slaves
bigdata133
bigdata134
2.5創(chuàng)建路由規(guī)則
創(chuàng)建mountTable.xml文件逾礁,并同時(shí)放到下面兩個(gè)目錄下:
- $HADOOP_HOME
- $HADOOP_HOME/etc/hadoop
# vim mountTable.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.viewfs.mounttable.xdl1.homedir</name>
<value>/home</value>
</property>
<property>
<name>fs.viewfs.mounttable.xdl1.link./dir1</name>
<value>hdfs://bigdata131:9000/dir1</value>
</property>
<property>
<name>fs.viewfs.mounttable.xdl1.link./dir2</name>
<value>hdfs://bigdata132:9000/dir2</value>
</property>
</configuration>
# cp mountTable.xml $HADOOP_HOME
2.6 Add the routing rules to core-site.xml
Modify the opening <configuration> tag to enable XInclude, pull in mountTable.xml, and add the fs.default.name property:
# vim core-site.xml
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="mountTable.xml"/>
<property>
<name>fs.default.name</name>
<value>viewfs://xdl1</value>
</property>
</configuration>
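Conceptually, ViewFS resolves every client path against this mount table by longest-prefix match and forwards the request to the NameNode that owns the matching mount point. A minimal Python sketch of that lookup (`MOUNT_TABLE` and `resolve` are illustrative names, not Hadoop's API):

```python
# Hypothetical sketch of ViewFS path resolution over the mount table above.
MOUNT_TABLE = {
    "/dir1": "hdfs://bigdata131:9000/dir1",
    "/dir2": "hdfs://bigdata132:9000/dir2",
}

def resolve(path):
    """Map a viewfs:// path to its physical NameNode location
    using longest-prefix matching over the mount points."""
    for mount in sorted(MOUNT_TABLE, key=len, reverse=True):
        if path == mount or path.startswith(mount + "/"):
            return MOUNT_TABLE[mount] + path[len(mount):]
    raise FileNotFoundError("no mount point covers " + path)

print(resolve("/dir1/hadoop-2.7.3.tar.gz"))
# hdfs://bigdata131:9000/dir1/hadoop-2.7.3.tar.gz
```

This is why, later on, each directory has to be created on its own NameNode: /dir1 exists only on bigdata131 and /dir2 only on bigdata132.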
2.7 Copy the configured Hadoop to the other nodes
[root@bigdata130 ~]# scp -r /root/trainings/hadoop-2.7.3/ root@bigdata131:/root/trainings/
[root@bigdata130 ~]# scp -r /root/trainings/hadoop-2.7.3/ root@bigdata132:/root/trainings/
[root@bigdata130 ~]# scp -r /root/trainings/hadoop-2.7.3/ root@bigdata133:/root/trainings/
[root@bigdata130 ~]# scp -r /root/trainings/hadoop-2.7.3/ root@bigdata134:/root/trainings/
2.8 Format both NameNodes (with the same cluster ID)
[root@bigdata131 ~]# hdfs namenode -format -clusterId xdl1
18/12/03 05:08:11 INFO common.Storage: Storage directory /root/trainings/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.
[root@bigdata132 ~]# hdfs namenode -format -clusterId xdl1
18/12/03 05:09:00 INFO common.Storage: Storage directory /root/trainings/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.
2.9 Start the Hadoop LB cluster
Start the Hadoop cluster from bigdata131:
[root@bigdata131 ~]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [bigdata131 bigdata132]
bigdata131: starting namenode, logging to /root/trainings/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata131.out
bigdata132: starting namenode, logging to /root/trainings/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata132.out
bigdata133: starting datanode, logging to /root/trainings/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata133.out
bigdata134: starting datanode, logging to /root/trainings/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata134.out
starting yarn daemons
starting resourcemanager, logging to /root/trainings/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata131.out
bigdata134: starting nodemanager, logging to /root/trainings/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata134.out
bigdata133: starting nodemanager, logging to /root/trainings/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata133.out
[root@bigdata131 ~]# jps
1543 NameNode
1788 ResourceManager
2047 Jps
[root@bigdata132 ~]# jps
1456 Jps
1385 NameNode
[root@bigdata133 ~]# jps
1426 DataNode
1637 Jps
1533 NodeManager
[root@bigdata134 ~]# jps
1409 DataNode
1621 Jps
1516 NodeManager
2.10 Create each directory on its corresponding NameNode
[root@bigdata131 ~]# hadoop fs -mkdir hdfs://bigdata131:9000/dir1
[root@bigdata132 ~]# hadoop fs -mkdir hdfs://bigdata132:9000/dir2
3. Testing the Hadoop LB cluster
(1) Check the status of both NameNodes
As the web UIs show, bigdata131 and bigdata132 are both in the active state. This is the key difference from an HA cluster, where one NameNode stands by.
(2) Upload a file to each of /dir1 and /dir2 and check which NameNode handles the request
[root@bigdata130 ~]# hdfs dfs -put /root/tools/hadoop-2.7.3.tar.gz /dir1
[root@bigdata130 ~]# hdfs dfs -put /root/tools/zookeeper-3.4.10.tar.gz /dir2
As expected, the NameNode on bigdata131 handled the request for /dir1, and the NameNode on bigdata132 handled the request for /dir2.
Our Hadoop LB cluster is therefore distributing load correctly and can serve Hadoop clients with higher throughput.
This concludes the setup of the Hadoop LB cluster environment. Have fun!