Big Data Cluster Setup
This document walks through setting up the following clusters:
- Hadoop cluster
- ZooKeeper cluster
- HBase cluster
- Spark cluster
- Kafka cluster
1. Preparation
1.1 Software Versions
- ubuntu 18.04
- jdk1.8.0_151
- hadoop-3.1.3
- apache-zookeeper-3.6.1-bin
- hbase-2.2.5
- spark-3.0.0-bin-without-hadoop
- kafka_2.12-2.5.0
- kafka-eagle-bin-1.4.8
1.2 Network Plan
The cluster consists of 3 machines; their IPs and hostnames are:
192.168.100.100 master
192.168.100.110 slaver1
192.168.100.120 slaver2
1.3 Copying the Packages
Copy the packages listed above into the /opt directory on the master machine
1.4 Binding IPs and Setting Hostnames
1.4.1 Configure a Static IP
The method below applies to Ubuntu 20.04.
1.4.1.1 Find the name of the network interface with ifconfig
1.4.1.2 Edit the /etc/netplan/01-netcfg.yaml file
# This is the network config written by 'subiquity'
network:
  ethernets:
    eno3:
      addresses: [192.168.100.200/24]
      dhcp4: no
      optional: true
      gateway4: 192.168.100.1
      nameservers:
        addresses: [8.8.8.8,114.114.114.114]
  version: 2
  renderer: networkd
Notes:
eno3: the name of the interface being configured
addresses: the static IP address and netmask
dhcp4: no disables DHCP; write yes to enable it
gateway4: the gateway address
nameservers: addresses: the DNS server addresses; separate multiple servers with commas
renderer: networkd selects the backend, either systemd-networkd or NetworkManager; if omitted, systemd-networkd is used by default
Caveats:
- IP addresses and DNS server addresses must be wrapped in [], but the gateway address must not
- every colon must be followed by a space
- watch the indentation: each level must be indented at least two spaces more than its parent
1.4.1.3 Apply the configuration: sudo netplan apply
1.4.2 Edit /etc/hosts and add the IP bindings
root@master:~# cat /etc/hosts
192.168.100.100 master
192.168.100.110 slaver1
192.168.100.120 slaver2
1.4.3 Edit /etc/hostname to bind the hostname. (The hostname must match the name bound in hosts above.)
root@master:~# cat /etc/hostname
master
root@slaver1:~# cat /etc/hostname
slaver1
root@slaver2:~# cat /etc/hostname
slaver2
A reboot appears to be required for the hostname change to take effect.
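The reboot can usually be avoided on systemd-based Ubuntu; as a hedged sketch, hostnamectl applies the new name immediately (run on each node with that node's own name):

```shell
# set and verify the hostname without rebooting (example for the master node)
hostnamectl set-hostname master
hostnamectl --static
```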
1.4.4 Set the Time Zone to UTC+8
sudo timedatectl set-timezone Asia/Shanghai
1.4.5 Switch the Package Mirror for Faster Updates
1.4.5.1 Back up the old sources
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
1.4.5.2 Replace the contents of /etc/apt/sources.list with the Tsinghua mirror
# Source mirrors are commented out by default to speed up apt update; uncomment them if needed
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse
# Pre-release sources; enabling them is not recommended
# deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-proposed main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-proposed main restricted universe multiverse
1.4.5.3 Update
sudo apt-get update
sudo apt-get upgrade
1.5 SSH Setup
1.5.1 Allow SSH login as root
Set the root password
sudo passwd
Allow SSH login as the root user
# install the SSH server first
sudo apt install openssh-server
# edit sshd_config
sudo sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
Restart the SSH service for the change to take effect
sudo service ssh restart
1.5.2 Passwordless SSH (perform the following on the master host)
Switch to root (note: all subsequent operations are performed as root)
sudo su
Generate an RSA key pair, pressing Enter at every prompt.
ssh-keygen -t rsa
Make a copy of the public key named authorized_keys
cd /root/.ssh
cp id_rsa.pub authorized_keys
Copy the authorized_keys file to slaver1 and slaver2
scp ./authorized_keys root@slaver1:/root/.ssh
scp ./authorized_keys root@slaver2:/root/.ssh
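As a sketch of an alternative (assuming the standard ssh-copy-id helper from the OpenSSH client package is available), ssh-copy-id appends the key to each slaver's authorized_keys instead of overwriting the file, which is safer if the slavers ever hold keys of their own:

```shell
# push master's public key to each slaver; prompts for the root password once per host
for host in slaver1 slaver2; do
  ssh-copy-id -i /root/.ssh/id_rsa.pub root@"$host"
done
# verify that passwordless login works
ssh root@slaver1 hostname
```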
1.6 Installing the JDK (can be done on all three machines in parallel)
Unpack
cd /opt
tar xavf ./jdk-8u151-linux-x64.tar.gz
Create a symlink
cd /usr/local
ln -s /opt/jdk1.8.0_151 jdk
Add the JDK environment variables to /etc/profile (note: if that does not take effect, /root/.bashrc works as well)
export JAVA_HOME=/usr/local/jdk
export PATH=$PATH:$JAVA_HOME/bin
Check that the JDK is configured correctly
source /etc/profile
javac -version
If it prints javac 1.8.0_151, the JDK is installed correctly.
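Since /etc/profile tends to be edited more than once while iterating on a setup, the sketch below appends the two export lines only if they are missing; the add_jdk_env helper and the temporary-file demo are illustrative additions, not part of the original steps:

```shell
#!/bin/bash
# Append the JDK exports to a profile file only if they are not already there,
# so repeated runs do not accumulate duplicate lines.
add_jdk_env() {
  local profile="$1"
  if ! grep -q 'JAVA_HOME=/usr/local/jdk' "$profile" 2>/dev/null; then
    {
      echo 'export JAVA_HOME=/usr/local/jdk'
      echo 'export PATH=$PATH:$JAVA_HOME/bin'
    } >> "$profile"
  fi
}

# demo against a temporary file instead of the real /etc/profile
tmp=$(mktemp)
add_jdk_env "$tmp"
add_jdk_env "$tmp"   # second call adds nothing
grep -c 'export JAVA_HOME' "$tmp"
```

Running it twice leaves exactly one pair of export lines in the file.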
2. Hadoop Cluster Setup
2.1 Hadoop Installation
Unpack, then create tmp, dfs, dfs/name, dfs/node and dfs/data under /home, and create the corresponding symlink under /usr/local.
Note: put these directories on a partition with plenty of free space (e.g. /home) to avoid errors caused by running out of disk
Partition sizes can be checked with
df -h
# unpack
cd /opt
tar xavf ./hadoop-3.1.3.tar.gz
# create the data directories
cd /home
mkdir -p hadoop/tmp hadoop/dfs/name hadoop/dfs/node hadoop/dfs/data
# create the symlink
cd /usr/local/
ln -s /opt/hadoop-3.1.3 hadoop
2.2 Hadoop Configuration
All of the following edits are made under
/usr/local/hadoop/etc/hadoop
2.2.1 hadoop-env.sh
# point JAVA_HOME at the JDK install directory
export JAVA_HOME=/usr/local/jdk
# raise the log level, so long-running clusters do not pile up huge log files
# note: in testing only this setting had any effect; the others did not work
export HADOOP_DAEMON_ROOT_LOGGER=WARN,console
2.2.2 core-site.xml
fs.default.name: the URI clients use to talk to the NameNode's HDFS filesystem, given as host plus port (a deprecated alias of fs.defaultFS)
hadoop.tmp.dir: the directory for temporary files the Hadoop cluster writes while running
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131702</value>
</property>
</configuration>
2.2.3 hdfs-site.xml
dfs.name.dir: where the NameNode stores its data, i.e. the metadata describing every file in HDFS.
dfs.data.dir: where each DataNode stores its data, i.e. the directory holding the actual blocks.
dfs.replication: the HDFS replication factor; after an uploaded file is split into blocks, this many redundant copies of each block are kept. The default is 3.
dfs.namenode.secondary.http-address: where the SecondaryNameNode runs; use a different node from the NameNode
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transactions log persistently.</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/dfs/data</value>
<description>Comma separated list of paths on the local filesystem where the DataNode stores its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>replication num.</description>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>need not permissions.</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
<description>Secondary NameNode address.</description>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
<description>check.</description>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
</property>
</configuration>
2.2.4 mapred-site.xml
cp mapred-site.xml.template mapred-site.xml   # only needed on Hadoop 2.x; Hadoop 3.x ships mapred-site.xml directly
mapreduce.framework.name: run the MapReduce framework on YARN
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>
<!-- used to enable the job history server -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
2.2.5 yarn-site.xml
yarn.resourcemanager.scheduler.class: selects the scheduler implementation (the stock default is the CapacityScheduler; the FairScheduler is used here)
yarn.log-aggregation-enable: with log aggregation enabled, container logs are copied to HDFS and the local copies deleted; the logs can then be viewed from any node in the cluster with yarn logs -applicationId <app ID>
yarn.nodemanager.remote-app-log-dir: where the aggregated logs are stored in HDFS
yarn.nodemanager.resource.memory-mb: the maximum memory the NodeManager manages, set from the machine's actual memory
yarn.nodemanager.resource.cpu-vcores: the maximum number of CPU cores the NodeManager manages, set from the machine's actual core count
yarn.resourcemanager.xxx.address: the addresses below need to be configured explicitly, otherwise YARN applications may fail to start on the cluster
<configuration>
<!-- use the fair scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<!-- enable preemption, otherwise applications can still be left waiting when resources run short -->
<property>
<name>yarn.scheduler.fair.preemption</name>
<value>true</value>
</property>
<!-- auxiliary services: the shuffle service MapReduce applications require -->
<!-- spark_shuffle is added to enable dynamic resource allocation -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- host:port clients use to submit jobs to the ResourceManager -->
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<!-- host:port ApplicationMasters use to talk to the scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<!-- host:port NodeManagers use to reach the ResourceManager -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8035</value>
</property>
<!-- host:port for administrative commands to the ResourceManager -->
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<!-- host:port of the ResourceManager web UI -->
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<!-- enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- where the aggregated logs are stored -->
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<!-- how long aggregated logs are kept, in seconds; expired logs are removed automatically -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
</property>
<!-- address of the log server, used by worker nodes -->
<property>
<name>yarn.log.server.url</name>
<value>http://master:19888/jobhistory/logs</value>
</property>
<!-- give each NodeManager half the machine's physical memory, leaving the other half to HBase -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
</property>
<!-- set each NodeManager's vCores to 2-3x (logical cores - 1) -->
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>21</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>21</value>
</property>
<!-- disable the memory limit checks, so containers are not killed for requesting too much memory at startup -->
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<!-- parameters for Spark dynamic resource allocation -->
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<!-- number of automatic restarts after the AM detects an application failure -->
<property>
<name>yarn.resourcemanager.am.max-attempts</name>
<value>10000</value>
</property>
</configuration>
About yarn.resourcemanager.am.max-attempts:
This is the global limit on ApplicationMaster retries. A single application can also set its own maximum when submitted to YARN (
--conf spark.yarn.maxAppAttempts=4
); if both are set, the smaller value wins. If an application runs for days or weeks without restarts or redeployments on a heavily used cluster, 4 attempts can be exhausted within a few hours. To avoid that, the attempt counter should be reset every hour (--conf spark.yarn.am.attemptFailuresValidityInterval=1h
). To let the AM keep restarting the application indefinitely after failures, set the attempt count to a large value.
2.2.6 fair-scheduler.xml
The configuration file used by the fair scheduler
<?xml version="1.0"?>
<allocations>
<queue name="medium">
<minResources>4096 mb, 8 vcores</minResources>
<weight>2.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
<minSharePreemptionTimeout>60</minSharePreemptionTimeout>
</queue>
<queue name="high">
<minResources>4096 mb, 8 vcores</minResources>
<weight>3.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
<minSharePreemptionTimeout>30</minSharePreemptionTimeout>
</queue>
<queuePlacementPolicy>
<rule name="specified" create="false"/>
<rule name="user" create="false"/>
<rule name="default" create="true"/>
</queuePlacementPolicy>
</allocations>
2.2.7 workers
The list of the cluster's hostnames; a worker (DataNode and NodeManager) is started on each machine listed below
master
slaver1
slaver2
2.2.8 Distribute the configured Hadoop to slaver1 and slaver2
cd /opt
scp -r hadoop-3.1.3 root@slaver1:/opt
scp -r hadoop-3.1.3 root@slaver2:/opt
Note: after distributing, create the corresponding directories and symlinks on each slaver node
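Assuming the root SSH trust set up in section 1.5 is in place, the per-slaver directories and symlink can be created in one loop rather than by hand; this is a sketch, adjust the paths if your layout differs:

```shell
# create the Hadoop data directories and the /usr/local symlink on each slaver
for host in slaver1 slaver2; do
  ssh root@"$host" 'mkdir -p /home/hadoop/tmp /home/hadoop/dfs/name /home/hadoop/dfs/node /home/hadoop/dfs/data && ln -sfn /opt/hadoop-3.1.3 /usr/local/hadoop'
done
```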
2.3 Starting Hadoop
2.3.1 Format a new filesystem
Note: format only once, when the cluster is first set up!! Do not run the format again.
cd /usr/local/hadoop/bin
./hadoop namenode -format
2.3.2 Start HDFS
cd /usr/local/hadoop/sbin
./start-dfs.sh
# stop-dfs.sh
Check with jps whether startup succeeded: master should show NameNode and DataNode, and the slaver nodes should each show a DataNode
root@master:/opt# jps
6306 SecondaryNameNode
6050 NameNode
4987 DataNode
6596 Jps
root@slaver1:~# jps
5307 Jps
4987 DataNode
root@slaver2:~# jps
3156 Jps
3003 DataNode
集群?jiǎn)?dòng)不成功,可能的原因是datanode的clusterID 和 namenode的clusterID 不匹配谓着。解決的辦法將之前創(chuàng)建的tmp泼诱、dfs目錄下內(nèi)容全清掉,然后再重新啟動(dòng)一遍赊锚。
2.3.3 Start YARN
./start-yarn.sh
# stop-yarn.sh
Check with jps whether startup succeeded: master should show a ResourceManager, and the slavers a NodeManager
root@master:~# jps
6353 NameNode
7747 Jps
7124 ResourceManager
7333 NodeManager
6855 SecondaryNameNode
6584 DataNode
root@slaver1:~# jps
5605 Jps
5416 NodeManager
4987 DataNode
root@slaver2:~# jps
3296 NodeManager
3480 Jps
3003 DataNode
Web UI:
http://master:8088 # view Hadoop job status
http://master:50070 # view HDFS information
3. ZooKeeper Cluster Setup
3.1 ZooKeeper Installation
Unpack, create the data and datalog directories, and create a symlink under /usr/local
# unpack
cd /opt
tar xavf ./apache-zookeeper-3.6.1-bin.tar.gz
# create the directories
cd apache-zookeeper-3.6.1-bin
mkdir data
mkdir datalog
# create the symlink
cd /usr/local
ln -s /opt/apache-zookeeper-3.6.1-bin zookeeper
Create a myid file in the data directory and write the node's server id on the first line. The id must be unique among the servers in the cluster and lie between 1 and 255.
cd /usr/local/zookeeper/data
echo "1" > myid
3.2 ZooKeeper Configuration
3.2.1 zoo.cfg
cp zoo_sample.cfg zoo.cfg
Make the following changes:
dataDir: where snapshots of the in-memory data go, enabling fast recovery; by default the transaction log is also written here. Configuring a separate dataLogDir is recommended, because transaction-log write performance directly affects ZooKeeper performance.
server.id=IP/Host:port1:port2
id: identifies each node in the ZK cluster; it should match the value in that node's myid file.
IP/Host: the server's IP address, or a hostname that maps to it
port1: used for data exchange between the Leader and Followers/Observers
port2: used for Leader election.
# note: each node must set this to its own address
clientPortAddress=master
dataDir=/usr/local/zookeeper/data
dataLogDir=/usr/local/zookeeper/datalog
server.1=master:2888:3888
server.2=slaver1:2888:3888
server.3=slaver2:2888:3888
# Enabled by default: to avoid synchronization-delay problems, ZK syncs its state to the on-disk log as soon as it receives data, and only answers the client after the sync completes. Turning this off gives clients faster responses
# Disabling forceSync carries a risk: the log is still flushed (log.flush() runs first), but since the OS buffers writes for efficiency, a machine crash can leave some ZK state not yet on disk, leaving ZK's state inconsistent across the crash
#forceSync=no
# The session timeout a client requests is clamped to the bounds set on the server; values outside the bounds are replaced by the nearest bound.
# So raising only the client timeout does nothing; ZooKeeper's maximum session timeout has to be raised as well
# Commonly used to fix HBase and Kafka disconnections caused by session timeouts under network jitter
maxSessionTimeout=180000
Important: never add trailing spaces after the numbers in this configuration file, otherwise startup fails with baffling configuration errors
3.2.2 zkServer.sh (optional)
Edit bin/zkServer.sh to whitelist the four-letter-word commands, which makes it easy to check status from the command line; this is optional.
# otherwise the envi command fails with: envi is not executed because it is not in the whitelist.
# fix: edit zkServer.sh and add: ZOOMAIN="-Dzookeeper.4lw.commands.whitelist=* ${ZOOMAIN}"
else
# first locate this block
echo "JMX disabled by user request" >&2
ZOOMAIN="org.apache.zookeeper.server.quorum.QuorumPeerMain"
fi
# then add the following line to enable the command whitelist
ZOOMAIN="-Dzookeeper.4lw.commands.whitelist=* ${ZOOMAIN}"
3.2.3 Distribute the configured ZooKeeper to slaver1 and slaver2
cd /opt
scp -r apache-zookeeper-3.6.1-bin root@slaver1:/opt
scp -r apache-zookeeper-3.6.1-bin root@slaver2:/opt
After distributing, SSH to each slaver node and edit its myid value: set slaver1's myid to 2 and slaver2's to 3, and adjust clientPortAddress in its zoo.cfg accordingly. Then create the symlink under /usr/local on each slaver node
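Instead of editing myid by hand on every node, the values can be generated locally and then copied out; the host-to-id mapping below follows the server.N lines in zoo.cfg, and the staging directory is purely illustrative (the scp step is left commented):

```shell
#!/bin/bash
# stage one myid file per host, then (in practice) scp each one to
# /usr/local/zookeeper/data/myid on that host
declare -A ids=( [master]=1 [slaver1]=2 [slaver2]=3 )
stage=$(mktemp -d)
for host in "${!ids[@]}"; do
  mkdir -p "$stage/$host"
  echo "${ids[$host]}" > "$stage/$host/myid"
  # scp "$stage/$host/myid" root@"$host":/usr/local/zookeeper/data/myid
done
ls "$stage"
```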
3.3 Starting ZooKeeper
Run the following on master, slaver1 and slaver2:
cd /usr/local/zookeeper/bin
./zkServer.sh start
# ./zkServer.sh stop
After the nodes start, check with jps; a QuorumPeerMain process should appear
root@master:# jps
6672 ResourceManager
6306 SecondaryNameNode
6050 NameNode
7434 QuorumPeerMain
7470 Jps
Check each node's role with
zkServer.sh status
root@master:/usr/local/zookeeper/bin# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/apache-zookeeper-3.5.6-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: master.
Mode: follower
root@slaver1:~# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: slaver1.
Mode: leader
root@slaver2:~# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: slaver2.
Mode: follower
3.4 ZooKeeper Cluster Scripts
To avoid logging into every machine to start ZooKeeper, add the following scripts under /usr/local/zookeeper/bin on the master node:
- one-shot start script: zk-cluster-start.sh
#!/bin/bash
brokers="master slaver1 slaver2"
ZK_HOME="/usr/local/zookeeper"
ZK_NAME="zookeeper"

echo "INFO: Begin to start Zookeeper cluster ..."
# By default disable strict host key checking
if [ "$ZK_SSH_OPTS" = "" ]; then
  ZK_SSH_OPTS="-o StrictHostKeyChecking=no"
fi
for broker in $brokers
do
  echo "INFO: Start ${ZK_NAME} on ${broker} ..."
  ssh $ZK_SSH_OPTS ${broker} "${ZK_HOME}/bin/zkServer.sh start"
  if [[ $? -eq 0 ]]; then
    echo "INFO: ${ZK_NAME} on ${broker} is up !"
  fi
done
echo "INFO: Zookeeper cluster started !"
- one-shot stop script: zk-cluster-stop.sh
#!/bin/bash
brokers="master slaver1 slaver2"
ZK_HOME="/usr/local/zookeeper"
ZK_NAME="zookeeper"

echo "INFO: Begin to stop Zookeeper cluster ..."
# By default disable strict host key checking
if [ "$ZK_SSH_OPTS" = "" ]; then
  ZK_SSH_OPTS="-o StrictHostKeyChecking=no"
fi
for broker in $brokers
do
  echo "INFO: Shut down ${ZK_NAME} on ${broker} ..."
  ssh $ZK_SSH_OPTS ${broker} "${ZK_HOME}/bin/zkServer.sh stop"
  if [[ $? -eq 0 ]]; then
    echo "INFO: ${ZK_NAME} on ${broker} is down !"
  fi
done
echo "INFO: Zookeeper cluster shutdown completed !"
4. HBase Cluster Setup
4.1 HBase Installation
Unpack and create the symlink
# unpack
cd /opt
tar xavf ./hbase-2.2.5-bin.tar.gz
# create the symlink
cd /usr/local
ln -s /opt/hbase-2.2.5 hbase
4.2 HBase Configuration
The following configuration files are under /usr/local/hbase/conf
4.2.1 hbase-env.sh
# point JAVA_HOME at the JDK install directory
export JAVA_HOME=/usr/local/jdk
# do not use HBase's bundled ZooKeeper
export HBASE_MANAGES_ZK=false
# share Hadoop's built-in compression libraries
export HBASE_LIBRARY_PATH=/usr/local/hadoop/lib/native
# HMaster heap tuning
# -Xmx1g: maximum heap of 1 GB
# -Xms1g: initial heap set equal to the maximum. If Xms is small, the heap grows rapidly under load, climbs to the maximum and falls back again, and resizing the heap like that adds pressure
# -Xmn300m: 300 MB young generation. Too small, and longer-lived objects get promoted to the old generation too early, fragmenting it; too large, and young-generation collections pause for too long and hurt performance.
# -XX:+UseParNewGC: collect the young generation with ParNew. It stops the Java process while clearing the young generation, but since that generation is small the pause is short, a few hundred milliseconds.
# -XX:+UseConcMarkSweepGC: collect the old generation with CMS (Concurrent Mark-Sweep Collector). CMS collects garbage asynchronously without stopping the Java process; it adds CPU load but avoids the pauses of compacting a fragmented old generation. The old generation must not use ParallelGC: it is large, and ParallelGC would pause the process long enough for the RegionServer's ZooKeeper session to time out, so the RegionServer would be presumed crashed and dropped.
# -XX:CMSInitiatingOccupancyFraction=60: start CMS garbage collection when occupancy reaches 60%
# -XX:+UseCMSInitiatingOccupancyOnly: without this flag, the 60% threshold is only honored for the first collection and is auto-tuned afterwards
#export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Xmx1g -Xms1g -Xmn300m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly"
# HRegionServer heap tuning
# -Xmx5g: maximum heap of 5 GB
# -Xmn1g: 1 GB young generation
# -XX:+CMSParallelRemarkEnabled: parallelize the remark phase to shorten the second pause. If remark is still too long, -XX:+CMSScavengeBeforeRemark forces a minor GC before each remark, shortening the remark pause, though another minor GC starts right after remark
# -XX:+UseCMSCompactAtFullCollection: CMS does not compact the heap, so to keep fragmentation from triggering full GCs, compaction during CMS cycles is usually enabled with this option; it costs some performance
# -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log: write the GC log
#export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Xmx5g -Xms5g -Xmn1g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
# G1NewSizePercent: G1 sizes the young generation adaptively, shrinking it when previous young GCs took too long and growing it when they were fast; this parameter is the minimum young-generation percentage
# MaxTenuringThreshold: objects that survive more than this many GCs are moved from the young to the old generation.
# G1HeapRegionSize: the size G1 splits each region into; always include a unit, e.g. 32m.
# G1MixedGCCountTarget: when occupancy exceeds the InitiatingHeapOccupancyPercent threshold, at most this many mixed GCs are used to bring memory back under the threshold
# InitiatingHeapOccupancyPercent: when heap occupancy exceeds this percentage, G1 starts running mixed GCs to clean up old-generation fragmentation.
# UseStringDeduplication: deduplicate strings to improve performance
# ResizePLAB: disabling adaptive PLAB resizing reduces communication between GC threads
# PerfDisableSharedMem: turn off the memory-mapped performance counters; leaving them on can greatly increase GC pauses in some scenarios
export HBASE_OPTS="$HBASE_OPTS \
-Xmx5g -Xms5g \
-XX:+UseG1GC \
-XX:MaxDirectMemorySize=5g \
-XX:MaxGCPauseMillis=90 \
-XX:+UnlockExperimentalVMOptions \
-XX:+ParallelRefProcEnabled \
-XX:ConcGCThreads=4 \
-XX:ParallelGCThreads=16 \
-XX:G1NewSizePercent=2 \
-XX:G1MaxNewSizePercent=20 \
-XX:MaxTenuringThreshold=1 \
-XX:G1HeapRegionSize=32m \
-XX:G1MixedGCCountTarget=16 \
-XX:InitiatingHeapOccupancyPercent=60 \
-XX:G1OldCSetRegionThresholdPercent=5 \
-XX:SurvivorRatio=4 \
-XX:G1HeapWastePercent=10 \
-XX:+UseStringDeduplication \
-XX:-ResizePLAB \
-verbose:gc \
-XX:+PrintGCDetails \
-XX:+PrintGCDateStamps \
-XX:+PrintGCApplicationStoppedTime \
-XX:+PrintTenuringDistribution \
-Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
Check whether Hadoop supports Snappy:
hadoop checknative -a
Test Snappy from HBase:
hbase org.apache.hadoop.hbase.util.CompressionTest file:///home/asin/Temp/test.txt snappy
SUCCESS in the output means it works
4.2.2 hbase-site.xml
hbase.rootdir: the directory where HBase stores its data
hbase.zookeeper.quorum: the nodes running the ZooKeeper service; the count must be odd
<configuration>
<property>
<name>hbase.tmp.dir</name>
<value>./tmp</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<!-- where HBase stores its data on HDFS -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<!-- run HBase in distributed mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- ZooKeeper quorum addresses, comma separated -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>master:2181,slaver1:2181,slaver2:2181</value>
</property>
<!-- attempts to work around regionserver exceptions -->
<property>
<name>hbase.thrift.maxWorkerThreads</name>
<value>10000</value>
</property>
<property>
<name>hbase.thrift.maxQueuedRequests</name>
<value>10000</value>
</property>
<property>
<name>hbase.regionserver.executor.openregion.threads</name>
<value>10000</value>
</property>
<!-- tuning experiments -->
<property>
<name>hbase.hregion.memstore.block.multiplier</name>
<value>5</value>
</property>
<property>
<name>hbase.hregion.memstore.flush.size</name>
<value>268435456</value>
</property>
<property>
<name>hbase.regionserver.global.memstore.size</name>
<value>0.4</value>
</property>
<property>
<name>hbase.regionserver.global.memstore.size.lower.limit</name>
<value>0.625</value>
</property>
<property>
<name>hbase.hregion.compacting.memstore.type</name>
<value>BASIC</value>
</property>
<!-- to keep tests healthy, expired data is deleted on a schedule; this is temporary and very resource-hungry, and would likely be disabled in real production -->
<!-- a major HFile compaction every 6 hours would delete expired data, i.e. rows older than the TTL set at table creation -->
<!-- unit: ms -->
<!-- 0 disables automatic major compaction -->
<property>
<name>hbase.hregion.majorcompaction</name>
<value>0</value>
</property>
</configuration>
4.2.3 regionservers
master
slaver1
slaver2
4.2.4 backup-masters
Configures standby HMaster nodes for high availability
slaver1
4.2.5 Distribute the configured HBase to slaver1 and slaver2
cd /opt
scp -r hbase-2.2.5 root@slaver1:/opt
scp -r hbase-2.2.5 root@slaver2:/opt
Note: after distributing, create the corresponding symlink under /usr/local on each slaver node, otherwise some nodes may fail to come up when HBase starts
4.3 Starting HBase
Note: Hadoop and ZooKeeper must be started before HBase
cd /usr/local/hbase/bin
./start-hbase.sh
# stop-hbase.sh
Check with jps; HMaster and HRegionServer processes should appear
root@master:/usr/local/hbase/bin# jps
7969 QuorumPeerMain
6353 NameNode
8995 HRegionServer
7124 ResourceManager
7333 NodeManager
9285 Jps
6855 SecondaryNameNode
6584 DataNode
8747 HMaster
[root@slaver1 ~]# jps
25636 DataNode
27767 HRegionServer
28264 Jps
25802 NodeManager
26218 QuorumPeerMain
27932 HMaster
root@slaver2:~# jps
3776 QuorumPeerMain
3296 NodeManager
4489 Jps
3003 DataNode
4364 HRegionServer
Startup problems:
- HMaster: Failed to become active master. Fix: delete the directory hbase.rootdir points at, e.g.
hadoop fs -rm -r /hbase
- view the web UI at http://master:16010
4.4 Installing Phoenix
Copy phoenix-5.0.0-HBase-2.0-server.jar and htrace-core-3.1.0-incubating.jar into hbase/lib, then restart HBase
phoenix-5.0.0-HBase-2.0-server.jar: built from source
htrace-core-3.1.0-incubating.jar: downloaded from the web
cp phoenix-5.0.0-HBase-2.0-server.jar htrace-core-3.1.0-incubating.jar /usr/local/hbase/lib
5. Spark Cluster Setup
5.1 Spark Installation
Unpack and create the symlink
# unpack
cd /opt
tar xavf ./spark-3.0.0-bin-without-hadoop.tgz
# create the symlink
cd /usr/local
ln -s /opt/spark-3.0.0-bin-without-hadoop spark
5.2 Spark Configuration
The following configuration files are under /usr/local/spark/conf
5.2.1 spark-env.sh
Copy spark-env.sh.template to spark-env.sh and add the following:
export JAVA_HOME=/usr/local/jdk
export SPARK_MASTER_HOST=master
export SPARK_MASTER_PORT=7077
export SPARK_HOME=/usr/local/spark
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
# 因?yàn)槲覀兿螺d是不帶hadoop依賴(lài)jar的spark版本,所以需要在spark配置中指定hadoop的classpath
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
5.2.2 slaves
Copy slaves.template to slaves and set it as follows
master
slaver1
slaver2
5.2.3 log4j.properties
Copy log4j.properties.template to log4j.properties, then change INFO to WARN; this keeps long-running jobs from growing Hadoop's userlogs too large
# Set everything to be logged to the console
#log4j.rootCategory=INFO, console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN
# Settings to quiet third party logs that are too verbose
log4j.logger.org.sparkproject.jetty=WARN
log4j.logger.org.sparkproject.jetty.util.component.AbstractLifeCycle=ERROR
#log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=WARN
#log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=WARN
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
5.2.4 Distribute the configured Spark to slaver1 and slaver2
cd /opt
scp -r spark-3.0.0-bin-without-hadoop root@slaver1:/opt
scp -r spark-3.0.0-bin-without-hadoop root@slaver2:/opt
After distributing, create the corresponding symlink under /usr/local on each slaver node
5.3 Starting Spark
cd /usr/local/spark/sbin
./start-all.sh
# stop-all.sh
Check with jps; new Master and Worker processes should appear
root@master:/usr/local/spark/sbin# jps
1680 NameNode
1937 SecondaryNameNode
2097 QuorumPeerMain
10258 HMaster
12004 Jps
11879 Worker
11720 Master
The status page is available at http://master:8080
5.4 Spark Job Submission Examples
5.4.1 yarn-client mode
#!/bin/bash
if [ $# -ne 1 ];then
echo "usage $0 configFile"
exit
fi
spark-submit \
--class com.jdsy.BdatApp \
--master yarn \
--deploy-mode client \
--num-executors 9 \
--executor-cores 2 \
--executor-memory 1600M \
--conf spark.default.parallelism=50 \
--conf spark.sql.shuffle.partitions=9 \
--files ./mysql.json \
--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 \
./bdat1.0-1.0-SNAPSHOT-jar-with-dependencies.jar \
$1
5.4.2 yarn-cluster mode
spark-submit \
--class com.jdsy.BdatApp \
--master yarn \
--deploy-mode cluster \
--driver-memory 600M \
--num-executors 9 \
--executor-cores 2 \
--executor-memory 1600M \
--conf spark.default.parallelism=50 \
--conf spark.sql.shuffle.partitions=9 \
--files ./mysql.json,$1#taskflow.xml \
--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 \
./bdat1.0-1.0-SNAPSHOT-jar-with-dependencies.jar
Notes:
- $1 above passes a file from the outside into the Spark program
- yarn logs -applicationId <app ID> shows the application's own log output, crash information, and so on
- yarn application -kill <app ID> kills the application
5.5 Spark on YARN Dynamic Resource Allocation
5.5.1 Configuration
5.5.1.1 Add the following to /usr/local/spark/conf/spark-defaults.conf
# enable the external shuffle service
spark.shuffle.service.enabled true
# enable dynamic resource allocation
spark.dynamicAllocation.enabled true
# minimum number of executors allocated per application
spark.dynamicAllocation.minExecutors 1
# maximum number of concurrently allocated executors per application
spark.dynamicAllocation.maxExecutors 30
# initial number of executors
spark.dynamicAllocation.initialExecutors=3
# when tasks have been pending or waiting for schedulerBacklogTimeout (default 1s), dynamic allocation kicks in
spark.dynamicAllocation.schedulerBacklogTimeout 1s
# after that, more executors are requested every sustainedSchedulerBacklogTimeout until enough have been granted; each request grows exponentially: 1, 2, 4, 8, ...
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5s
5.5.1.2 Add spark-xxx-yarn-shuffle.jar to the classpath on the NodeManager nodes
cp /usr/local/spark/yarn/spark-3.0.0-yarn-shuffle.jar /usr/local/hadoop/share/hadoop/yarn/lib/
5.5.1.3 Add the following to /usr/local/hadoop/etc/hadoop/yarn-site.xml
<!-- add spark_shuffle to enable dynamic resource allocation -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
<!-- parameters for Spark dynamic resource allocation -->
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
5.5.1.4 Distribute the updated Spark and Hadoop configuration files to every node in the cluster
5.5.2 Starting the Cluster
Note: once dynamic resource allocation is configured, always start the Spark cluster before the Hadoop cluster; the other way around, one of Spark's ports gets occupied and Spark fails to start
5.5.3 Submitting Jobs
Using yarn-client mode as the example, the executor count and the per-executor parameters can be omitted
#!/bin/bash
if [ $# -ne 1 ]; then
    echo "usage: $0 configFile"
    exit 1
fi
spark-submit \
--class com.jdsy.BdatApp \
--master yarn \
--deploy-mode client \
--conf spark.default.parallelism=50 \
--conf spark.sql.shuffle.partitions=9 \
--files ./mysql.json \
--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 \
./bdat1.0-1.0-SNAPSHOT-jar-with-dependencies.jar \
$1
Note: taking real-time consumption from Kafka as an example, resources are only released when Kafka has no data
5.6 History servers
5.6.1 Configuration changes
5.6.1.1 mapred-site.xml
Modify /usr/local/hadoop/etc/hadoop/mapred-site.xml and add the following
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
5.6.1.2 yarn-site.xml
Modify /usr/local/hadoop/etc/hadoop/yarn-site.xml and add the following
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Where aggregated logs are stored -->
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<!-- How long aggregated logs are kept, in seconds; older logs are removed automatically -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
</property>
<!-- Address of the log server, used by worker nodes -->
<property>
<name>yarn.log.server.url</name>
<value>http://master:19888/jobhistory/logs</value>
</property>
5.6.1.3 spark-defaults.conf
Add the following to `spark-defaults.conf`:
# History server settings
spark.yarn.historyServer.address=master:18080
spark.yarn.historyServer.allowTracking=true
spark.eventLog.enabled=true
# Enable compression (lz4 by default)
spark.eventLog.compress=true
# Enable rolling event logs so a single log file cannot grow too large (default max file size 128M)
spark.eventLog.rolling.enabled=true
spark.eventLog.dir=hdfs://master:9000/spark/eventlogs
spark.history.fs.logDirectory=hdfs://master:9000/spark/eventlogs
5.6.1.4 Copy the configuration changes above to the corresponding directories on each slaver node
5.6.2 Create the corresponding log directory on HDFS
cd /usr/local/hadoop/bin
./hdfs dfs -mkdir -p /spark/eventlogs
5.6.3 Start the history servers
cd /usr/local/hadoop/sbin
./mr-jobhistory-daemon.sh start historyserver
# ./mr-jobhistory-daemon.sh stop historyserver
cd /usr/local/spark/sbin
./start-history-server.sh
# ./stop-history-server.sh
Running jps now shows two additional processes: HistoryServer (Spark) and JobHistoryServer (Hadoop)
root@master:/usr/local/hadoop/sbin# jps
15200 HistoryServer
22020 JobHistoryServer
After an application finishes, its history logs can be viewed via the application's History link at http://master:8088, or directly at http://master:18080
6. Kafka Cluster Setup
6.1 Kafka installation
Extract the package, create Kafka's data directory under /home, and create a symlink under /usr/local
# Extract
cd /opt
tar xavf ./kafka_2.12-2.5.0.tgz
# Create the data directory
# Note: put it on a partition with plenty of free disk space; otherwise Kafka may crash when the disk fills up
mkdir -p /home/kafka/data
# Create the symlink
cd /usr/local
ln -s /opt/kafka_2.12-2.5.0 kafka
6.2 Kafka configuration
6.2.1 server.properties (under /usr/local/kafka/config)
# Broker id. broker.id can be any value as long as it is unique within the cluster;
# set it to 1 on the second machine, 2 on the third, and so on
broker.id=0
# Endpoint advertised to clients: master:9092 on the master node, slaver1:9092 on slaver1, and so on
listeners=PLAINTEXT://master:9092
# Directory for Kafka's data (not Kafka's own log files)
log.dirs=/home/kafka/data
# Keep data for at most 7 days. A segment of a topic partition is considered expired
# once its largest timestamp is more than 7 days old.
# Note: data is only reclaimed when a new segment is rolled
log.retention.hours=168
# ZooKeeper cluster address
zookeeper.connect=master:2181,slaver1:2181,slaver2:2181
# Maximum number of threads the broker uses to process network requests
num.network.threads=9
# Number of threads the broker uses for disk I/O
num.io.threads=16
# Increased timeout; if it exceeds ZooKeeper's maximum session timeout, adjust the ZooKeeper configuration accordingly
zookeeper.connection.timeout.ms=180000
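The retention rule noted above can be sketched as plain arithmetic: a segment becomes deletable only once the newest timestamp in it falls outside the retention window. This is an illustration of the rule, not Kafka code:

```shell
#!/bin/bash
# Illustration of the retention rule above: a segment is eligible for deletion
# once the largest timestamp in it is older than log.retention.hours.
retention_hours=168                        # 7 days, as configured above
now=$(date +%s)
segment_max_ts=$(( now - 8 * 24 * 3600 ))  # newest record in segment: 8 days old
age_hours=$(( (now - segment_max_ts) / 3600 ))
if [ "$age_hours" -gt "$retention_hours" ]; then
    verdict="expired"
else
    verdict="retained"
fi
echo "segment age ${age_hours}h -> ${verdict}"
```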
The settings above apply to all topics. To override them for a single topic, pass the corresponding parameters when creating it, e.g.
#!/bin/bash
if [ $# -ne 3 ]; then
    echo "usage: $0 topic partitions replication-factor"
    exit 1
fi
kafka-topics.sh --create \
--zookeeper master:2181,slaver1:2181,slaver2:2181 \
--topic $1 \
--config compression.type=lz4 \
--config segment.bytes=4000 \
--config retention.ms=30000 \
--partitions $2 \
--replication-factor $3
6.2.2 Modifying the startup scripts
The files below live under /usr/local/kafka/bin
6.2.2.1 kafka-server-start.sh
Add the following at the very beginning:
export JAVA_HOME=/usr/local/jdk
export JMX_PORT="9999"
6.2.2.2 kafka-server-stop.sh
Add the following at the very beginning:
export JAVA_HOME=/usr/local/jdk
6.2.2.3 kafka-run-class.sh
Add the following at the very beginning:
export JAVA_HOME=/usr/local/jdk
6.2.3 Distribute the configured Kafka to slaver1 and slaver2
cd /opt
scp -r kafka_2.12-2.5.0 root@slaver1:/opt
scp -r kafka_2.12-2.5.0 root@slaver2:/opt
Notes:
- broker.id must be changed on the slaver nodes, e.g. 1 on slaver1 and 2 on slaver2
- listeners must be changed accordingly
- the corresponding symlink must be created under /usr/local on each slaver node
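The per-node edits listed in the notes (broker.id and listeners) can be applied with sed. The sketch below simulates the edit on a temporary file; on a real slaver, point it at /usr/local/kafka/config/server.properties and set host/id accordingly:

```shell
#!/bin/bash
# Sketch: patch broker.id and listeners for one node after distributing the
# package. The edit is simulated on a temp copy of server.properties here.
host="slaver1"
id=1
conf=$(mktemp)
printf 'broker.id=0\nlisteners=PLAINTEXT://master:9092\n' > "$conf"
# Rewrite the two per-node settings in place
sed -i "s/^broker\.id=.*/broker.id=${id}/" "$conf"
sed -i "s#^listeners=.*#listeners=PLAINTEXT://${host}:9092#" "$conf"
result=$(cat "$conf")
rm -f "$conf"
echo "$result"
```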
6.3 Starting Kafka
Note: always start Kafka before kafka-eagle. In the reverse order, eagle occupies port 9999 and Kafka fails to start!
Run the following on master, slaver1, and slaver2
cd /usr/local/kafka
./bin/kafka-server-start.sh -daemon ./config/server.properties
# ./bin/kafka-server-stop.sh
Checking with jps shows an additional Kafka process
root@master:/usr/local/kafka# jps
1680 NameNode
1937 SecondaryNameNode
2097 QuorumPeerMain
10258 HMaster
11879 Worker
11720 Master
12618 Jps
12590 Kafka
root@slaver1:/usr/local/kafka# jps
6833 Jps
5414 HRegionServer
1575 DataNode
1751 QuorumPeerMain
6809 Kafka
6077 Worker
root@slaver2:/usr/local/kafka# jps
1456 DataNode
4768 HRegionServer
1626 QuorumPeerMain
6156 Kafka
5438 Worker
6175 Jps
6.3.1 Cluster start/stop scripts
kafka/bin/kafka-cluster-start.sh
#!/bin/bash
brokers="master slaver1 slaver2"
KA_HOME="/usr/local/kafka"
KA_NAME="kafka"
echo "INFO: Begin to start kafka cluster ..."
# By default disable strict host key checking
if [ "$ZK_SSH_OPTS" = "" ]; then
    ZK_SSH_OPTS="-o StrictHostKeyChecking=no"
fi
for broker in $brokers
do
    echo "INFO: Start ${KA_NAME} on ${broker} ..."
    ssh $ZK_SSH_OPTS ${broker} "${KA_HOME}/bin/kafka-server-start.sh -daemon ${KA_HOME}/config/server.properties"
    if [[ $? -eq 0 ]]; then
        echo "INFO: Start ${KA_NAME} on ${broker} is on !"
    fi
done
echo "INFO: Kafka cluster startup completed !"
kafka/bin/kafka-cluster-stop.sh
#!/bin/bash
brokers="master slaver1 slaver2"
KAFKA_HOME="/usr/local/kafka"
KAFKA_NAME="kafka"
echo "INFO: Begin to stop kafka cluster ..."
# By default disable strict host key checking
if [ "$ZK_SSH_OPTS" = "" ]; then
    ZK_SSH_OPTS="-o StrictHostKeyChecking=no"
fi
for broker in $brokers
do
    echo "INFO: Shut down ${KAFKA_NAME} on ${broker} ..."
    ssh $ZK_SSH_OPTS ${broker} "${KAFKA_HOME}/bin/kafka-server-stop.sh"
    if [[ $? -eq 0 ]]; then
        echo "INFO: Shut down ${KAFKA_NAME} on ${broker} is down !"
    fi
done
echo "INFO: Kafka cluster shutdown completed !"
6.5 Kafka Eagle installation
6.5.1 Extract
# Extract
cd /opt
tar xavf kafka-eagle-bin-1.4.8.tar.gz
# Create the symlink
cd /usr/local
ln -s /opt/kafka-eagle-bin-1.4.8/kafka-eagle-web-1.4.8 kafka-eagle
6.5.2 Setting environment variables
Modify /etc/profile and add the following:
export KE_HOME=/usr/local/kafka-eagle
Run `source /etc/profile` to make the change take effect.
6.5.3 Configuration changes
kafka-eagle-web-1.4.8/conf/system-config.properties
# With a single cluster, just cluster1 is enough
kafka.eagle.zk.cluster.alias=cluster1
cluster1.zk.list=master:2181,slaver1:2181,slaver2:2181
# If Kafka has SASL authentication enabled, point to the SASL configuration file here
kafka.eagle.sasl.enable=false
# The entries below configure the database. SQLite (the default) is used here;
# for larger data volumes MySQL is recommended; see the official documentation referenced at the end
kafka.eagle.driver=org.sqlite.JDBC
kafka.eagle.url=jdbc:sqlite:/usr/local/kafka-eagle/db/ke.db
kafka.eagle.username=root
kafka.eagle.password=111111
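If MySQL is preferred over SQLite (as recommended above for larger data volumes), the same properties change along the following lines. The host, database name `ke`, and credentials are assumptions to adapt; check them against the official kafka-eagle documentation before use:

```properties
# Hypothetical MySQL variant of the settings above (adjust host/db/credentials)
kafka.eagle.driver=com.mysql.cj.jdbc.Driver
kafka.eagle.url=jdbc:mysql://master:3306/ke?useUnicode=true&characterEncoding=UTF-8
kafka.eagle.username=root
kafka.eagle.password=111111
```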
6.5.4 Start kafka-eagle
cd /opt/kafka-eagle-web-1.4.8/bin
./ke.sh start
#./ke.sh stop
Once started, log in at http://master:8048/ke to view the dashboard; the default username/password is admin/123456
7. Cluster Migration
This example copies the data on 192.168.100.100 over to 192.168.100.200
7.1 Preparing the new cluster
See the section "1. Preparation"
7.2 Copy the software packages from the existing cluster to the new cluster
The following steps are performed on 192.168.100.100
7.2.1 Clear each component's logs on the existing cluster
# Go to the installation directory
cd /opt/
# Remove zookeeper logs and data
rm apache-zookeeper-3.6.1-bin/logs/* -rf
rm apache-zookeeper-3.6.1-bin/data/version-2/ -rf
rm apache-zookeeper-3.6.1-bin/datalog/version-2/ -rf
# Remove hadoop logs
rm hadoop-3.1.3/logs/* -rf
# Remove hbase logs
rm hbase-2.2.5/logs/* -rf
# Remove spark logs
rm spark-3.0.0-bin-without-hadoop/logs/* -rf
# Remove kafka logs
rm kafka_2.12-2.5.0/logs/* -rf
# Remove kafka-eagle logs (if present)
rm kafka-eagle-bin-1.4.8/kafka-eagle-web-1.4.8/logs/* -rf
7.2.2 Package the software on the existing cluster
tar cavf total.tar.gz \
jdk1.8.0_151/ \
apache-zookeeper-3.6.1-bin/ \
hadoop-3.1.3/ \
hbase-2.2.5/ \
hbase-operator-tools-1.0.0/ \
spark-3.0.0-bin-without-hadoop/ \
kafka_2.12-2.5.0/ \
kafka-eagle-bin-1.4.8/
7.2.3 Send the archive to the new node
scp total.tar.gz root@192.168.100.200:/opt
7.3 Finish the setup on the new cluster node
The following steps are performed on 192.168.100.200
7.3.1 Extract
cd /opt
tar xavf total.tar.gz
7.3.2 Following each component's setup details, create the corresponding data directories, symlinks, etc. (and set the environment variables where needed) to complete the cluster setup
Notes:
When creating a symlink, do not add a trailing / to the source path
On first startup, HDFS must be formatted before starting hadoop
On first startup, before starting hbase, delete the /hbase directory on HDFS and clear the related information in zookeeper
# Clear the hbase information on hdfs
cd /usr/local/hadoop/bin
./hdfs dfs -rm -r /hbase
# Clear the hbase information in zookeeper
cd /usr/local/hbase/bin
./hbase zkcli rmr /hbase
The local /etc/hosts must also be updated with entries for master, slaver1, and slaver2; otherwise:
- the hadoop YARN application UI cannot be opened
- operating hbase from the local machine fails
- sending data to kafka from the local machine fails
192.168.100.200 master
192.168.100.201 slaver1
192.168.100.202 slaver2