This guide is based on Ubuntu 16.04.
I. Preparation
1.1 Software versions
- Ubuntu 16.04.6 (ubuntu-16.04.6-server-amd64.iso)
- JDK 1.8 (jdk-8u201-linux-x64.tar.gz)
- Hadoop 2.7.7 (hadoop-2.7.7.tar.gz)
- Spark 2.1.0 (spark-2.1.0-bin-hadoop2.7.tgz)
1.2 Network plan
This guide builds a three-machine cluster with the following IP addresses and hostnames. For a single-machine setup, list only one entry.
192.168.241.132 master
192.168.241.133 slave1
192.168.241.134 slave2
1.3 Copy the software packages
Copy the packages above to /opt on all three machines:
- JDK 1.8
- Hadoop 2.7.7
- Spark 2.1.0
1.4 SSH settings
Edit /etc/ssh/sshd_config and set the following three options to yes:
PermitRootLogin yes
PermitEmptyPasswords yes
PasswordAuthentication yes
Restart the SSH service:
service ssh restart
This lets the root user log in directly and prepares for passwordless SSH login later.
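The three sshd_config changes can also be applied with a small script instead of a text editor. This is a sketch assuming bash and GNU sed (the default on Ubuntu); `set_sshd_option` and `enable_root_password_login` are hypothetical helper names, and the file path is a parameter so the script can be tried on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "Key value" in an sshd_config-style file.
# Every existing line for the key (commented out or not) is replaced;
# if the key does not appear at all, the setting is appended.
set_sshd_option() {
  local file="$1" key="$2" value="$3"
  if grep -qE "^[#[:space:]]*${key}[[:space:]]" "$file"; then
    sed -i -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
  else
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}

# Apply the three settings from section 1.4 (defaults to the real config file).
enable_root_password_login() {
  local file="${1:-/etc/ssh/sshd_config}"
  set_sshd_option "$file" PermitRootLogin yes
  set_sshd_option "$file" PermitEmptyPasswords yes
  set_sshd_option "$file" PasswordAuthentication yes
}
```

Run `enable_root_password_login` as root, then restart SSH with `service ssh restart` as above.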
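1.5 Bind IP addresses and set hostnames
1.5.1 Edit /etc/hosts: add the IP bindings and comment out the 127.0.1.1 line (if it is left in, the hostname can resolve to a loopback address, which breaks the Hadoop cluster).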
root@master:/opt# cat /etc/hosts
127.0.0.1 localhost
#127.0.1.1 ubuntu
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.241.132 master
192.168.241.133 slave1
192.168.241.134 slave2
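The same /etc/hosts edits can be scripted so they are safe to repeat on all three machines. A minimal sketch, assuming bash and GNU sed; `CLUSTER_HOSTS` and `configure_hosts` are hypothetical names, and the entries are the plan from 1.2.

```shell
#!/usr/bin/env bash
# Cluster plan from section 1.2: "IP hostname" pairs.
CLUSTER_HOSTS=(
  "192.168.241.132 master"
  "192.168.241.133 slave1"
  "192.168.241.134 slave2"
)

# Comment out an active 127.0.1.1 line and append any missing cluster entries.
# Running it a second time leaves the file unchanged.
configure_hosts() {
  local file="${1:-/etc/hosts}" entry
  # Prefix the "127.0.1.1 ..." line with '#' (already-commented lines do not match).
  sed -i -E 's/^(127\.0\.1\.1[[:space:]])/#\1/' "$file"
  for entry in "${CLUSTER_HOSTS[@]}"; do
    # Append the exact "IP hostname" line only if it is not present yet.
    grep -qxF "$entry" "$file" || echo "$entry" >> "$file"
  done
}
```

Run `configure_hosts` as root on each machine; it defaults to /etc/hosts.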
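1.5.2 Edit /etc/hostname and set it to the machine's name from the plan (master, slave1, or slave2). The hostname must match the name bound in /etc/hosts above; the new name takes effect after a reboot.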
1.6 Passwordless SSH login (SSH must be installed first)
1. Generate an RSA key pair, pressing Enter at every prompt:
ssh-keygen -t rsa
2. Change to the current user's hidden .ssh directory:
cd ~/.ssh
3. Copy the public key to a file named authorized_keys:
cp id_rsa.pub authorized_keys
After this step, running ssh localhost on the current machine logs in without a password.
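If the ssh-copy-id command is available on this machine, run
ssh-copy-id root@<second-machine-hostname>
and enter the password. After that you can log in to the second machine directly with
ssh <second-machine-hostname>
On the first run you are asked to confirm the host key; type yes and then the login password. No further prompts appear after that. Repeat this from master for every node, including master itself, because the Hadoop and Spark start scripts log in to each listed node over SSH.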
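From master, the key distribution can be looped over every node. A sketch assuming bash and that the key pair from step 1 already exists; `NODES`, `distribute_keys`, and the `DRY_RUN` switch are hypothetical, and with `DRY_RUN=1` the function only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hostnames from the cluster plan; master itself is included because the
# start scripts also connect to master over SSH.
NODES=(master slave1 slave2)

# Install the local public key on each node as root.
# With DRY_RUN=1, print the commands instead of running them.
distribute_keys() {
  local node
  for node in "${NODES[@]}"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh-copy-id root@${node}"
    else
      # Prompts once for each node's root password.
      ssh-copy-id "root@${node}"
    fi
  done
}
```

For a non-interactive key pair, `ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa` can replace step 1 (it creates a key with no passphrase); then run `distribute_keys` and answer each password prompt once.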
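1.7 Install the JDK (can be done on all three machines in parallel)
Download jdk-8u201-linux-x64.tar.gz, put it in /opt, and extract it:
tar -zxvf jdk-8u201-linux-x64.tar.gz
1.7.1 Rename the extracted directory: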
mv jdk1.8.0_201 jdk
1.7.2 Add the JDK environment variables to /etc/profile:
export JAVA_HOME=/opt/jdk
export JRE_HOME=/opt/jdk/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
1.7.3 Verify the JDK configuration:
source /etc/profile
java -version
Output like the following means the JDK is installed:
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
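If the JDK variables are added by script on each machine, guarding against duplicate entries keeps /etc/profile clean. A minimal sketch assuming bash; `add_jdk_env` and the marker comment are hypothetical choices, not part of the JDK or Ubuntu.

```shell
#!/usr/bin/env bash
# Append the JDK variables from section 1.7.2 to a profile file, once.
# The marker line makes repeated runs a no-op.
add_jdk_env() {
  local file="${1:-/etc/profile}" marker="# >>> jdk env >>>"
  grep -qxF "$marker" "$file" 2>/dev/null && return 0
  # Quoted 'EOF' keeps $JAVA_HOME etc. literal so they expand at login time.
  cat >> "$file" <<'EOF'
# >>> jdk env >>>
export JAVA_HOME=/opt/jdk
export JRE_HOME=/opt/jdk/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
EOF
}
```

Run `add_jdk_env` as root, then `source /etc/profile` and `java -version` as in 1.7.3.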
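1.8 Other settings
1.8.1 Network configuration
Switch to a static IP in /etc/network/interfaces, using each machine's own address from the plan (the example below is master). On Ubuntu 16.04 the interface may be named ens33 or similar instead of eth0; check with ip addr.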
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
#iface eth0 inet dhcp
iface eth0 inet static
address 192.168.241.132
netmask 255.255.255.0
gateway 192.168.20.1
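The gateway must be on the same subnet as address (192.168.241.0/24 here); the 192.168.20.1 shown above is not, so replace it with your network's actual gateway. Then restart networking: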
service networking restart
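Because the gateway must be directly reachable on the local subnet, it can be worth checking address, gateway, and netmask together before restarting networking. A sketch assuming bash arithmetic; `ip_to_int` and `same_subnet` are hypothetical helpers that compare the network portions (address AND netmask) of two IPv4 addresses.

```shell
#!/usr/bin/env bash
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Exit status 0 if ADDRESS and GATEWAY fall in the same network under NETMASK.
same_subnet() {
  local addr gw mask
  addr=$(ip_to_int "$1")
  gw=$(ip_to_int "$2")
  mask=$(ip_to_int "$3")
  (( (addr & mask) == (gw & mask) ))
}
```

For the values in the example above, `same_subnet 192.168.241.132 192.168.20.1 255.255.255.0 || echo "gateway is not on this subnet"` prints the warning.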
1.8.2 DNS configuration
Method 1: permanent change
Edit /etc/resolvconf/resolv.conf.d/base (empty by default) and add:
nameserver 119.6.6.6
After saving, run:
resolvconf -u
Check /etc/resolv.conf to confirm the setting has been added:
cat /etc/resolv.conf
Restart resolvconf:
/etc/init.d/resolvconf restart
Method 2: temporary change
Edit /etc/resolv.conf directly and add:
nameserver 119.6.6.6
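The change takes effect immediately. Do not restart resolvconf afterwards: regenerating /etc/resolv.conf (which also happens at boot) discards manual edits, which is why this method is only temporary.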
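Method 1 can be scripted as well. A minimal sketch assuming bash; `add_nameserver` is a hypothetical helper that appends the line to the resolvconf base file only if it is not already there, so running it twice does not duplicate the entry.

```shell
#!/usr/bin/env bash
# Add "nameserver IP" to resolvconf's base file unless it is already present.
add_nameserver() {
  local ns="$1" base="${2:-/etc/resolvconf/resolv.conf.d/base}"
  if ! grep -qxF "nameserver ${ns}" "$base" 2>/dev/null; then
    echo "nameserver ${ns}" >> "$base"
  fi
}
```

As root, `add_nameserver 119.6.6.6 && resolvconf -u` then regenerates /etc/resolv.conf.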
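II. Hadoop Deployment
2.1 Install Hadoop (can be done on all three machines in parallel)
- Download Hadoop 2.7.7 (hadoop-2.7.7.tar.gz).
- Extract it with tar -zxvf hadoop-2.7.7.tar.gz, then create tmp, dfs, dfs/name, dfs/node, and dfs/data under the Hadoop home directory: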
cd /opt/hadoop-2.7.7
mkdir tmp
mkdir dfs
mkdir dfs/name
mkdir dfs/node
mkdir dfs/data
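The five mkdir calls can be collapsed into one. A sketch assuming bash (for brace expansion); `make_hadoop_dirs` is a hypothetical wrapper whose argument defaults to /opt/hadoop-2.7.7. `mkdir -p` creates missing parents and does not fail on directories that already exist.

```shell
#!/usr/bin/env bash
# Create tmp and dfs/{name,node,data} under the Hadoop home directory.
make_hadoop_dirs() {
  local home="${1:-/opt/hadoop-2.7.7}"
  # Brace expansion yields dfs/name, dfs/node and dfs/data.
  mkdir -p "$home"/tmp "$home"/dfs/{name,node,data}
}
```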
2.2 Hadoop configuration
All of the following steps are done in hadoop-2.7.7/etc/hadoop.
2.2.1 Edit hadoop-env.sh and set JAVA_HOME to the JDK installation directory:
export JAVA_HOME=/opt/jdk
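The stock hadoop-env.sh in 2.7.x contains `export JAVA_HOME=${JAVA_HOME}`; when daemons are started on other nodes over SSH, that variable may be empty because /etc/profile is not sourced by a non-interactive shell, which is why a literal path is set here. A sketch of the edit with GNU sed; `set_hadoop_java_home` is a hypothetical name and both arguments default to this guide's paths.

```shell
#!/usr/bin/env bash
# Replace the "export JAVA_HOME=..." line in hadoop-env.sh with a fixed path.
set_hadoop_java_home() {
  local file="${1:-/opt/hadoop-2.7.7/etc/hadoop/hadoop-env.sh}"
  local java_home="${2:-/opt/jdk}"
  sed -i -E "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$file"
}
```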
2.2.2 Edit core-site.xml and add the following content.
Here master is the hostname, and /opt/hadoop-2.7.7/tmp is the directory created manually in 2.1.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop-2.7.7/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.spark.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.spark.groups</name>
<value>*</value>
</property>
</configuration>
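If the NameNode host or install path differs from this guide, generating the file from variables avoids hand-editing each value. A sketch assuming bash; `write_core_site` is a hypothetical helper, and the property values mirror the listing above (the description element is omitted).

```shell
#!/usr/bin/env bash
# Write core-site.xml for a given NameNode host and Hadoop home directory.
write_core_site() {
  local out="$1" master="${2:-master}" home="${3:-/opt/hadoop-2.7.7}"
  # Unquoted EOF: ${master} and ${home} are expanded; '*' is written literally.
  cat > "$out" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://${master}:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:${home}/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.groups</name>
    <value>*</value>
  </property>
</configuration>
EOF
}
```

For example, `write_core_site /opt/hadoop-2.7.7/etc/hadoop/core-site.xml master` reproduces this section's file.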
2.2.3 Edit hdfs-site.xml and add the following content.
Here master is the hostname, and file:/opt/hadoop-2.7.7/dfs/name and file:/opt/hadoop-2.7.7/dfs/data point to the directories created manually in 2.1.
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop-2.7.7/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop-2.7.7/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Copy mapred-site.xml.template to mapred-site.xml:
cp mapred-site.xml.template mapred-site.xml
2.2.4 Edit mapred-site.xml and add the following content.
Here master is the hostname.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
2.2.5 Edit yarn-site.xml and add the following content.
Here master is the hostname.
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
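2.2.6 Edit the slaves file to list the cluster nodes, one hostname per line
Add the following (each listed host runs a DataNode and a NodeManager):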
master
slave1
slave2
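Since dfs.replication is 3 in hdfs-site.xml, the slaves file needs at least three DataNode hosts, or blocks stay under-replicated. A quick consistency check, sketched in bash and awk; `get_hadoop_prop` and `check_replication` are hypothetical helpers, and the property lookup assumes `<name>` and `<value>` sit on separate lines as in this guide's files.

```shell
#!/usr/bin/env bash
# Print the <value> that follows <name>KEY</name> in a Hadoop *-site.xml file.
get_hadoop_prop() {
  awk -v key="$1" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-char "<value>" and 8-char "</value>" tags.
      print substr($0, RSTART + 7, RLENGTH - 15); exit
    }' "$2"
}

# Return 1 if dfs.replication exceeds the number of hosts in the slaves file
# (blank lines and # comments are not counted).
check_replication() {
  local site="$1" slaves="$2" repl nodes
  repl=$(get_hadoop_prop dfs.replication "$site")
  nodes=$(grep -cvE '^[[:space:]]*(#|$)' "$slaves")
  # Hadoop's default replication is 3 when the property is absent.
  if [ "${repl:-3}" -gt "$nodes" ]; then
    echo "dfs.replication=${repl:-3} exceeds $nodes DataNode host(s)" >&2
    return 1
  fi
}
```

From hadoop-2.7.7/etc/hadoop, `check_replication hdfs-site.xml slaves` prints a warning and fails if the two files disagree.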
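2.2.7 Hadoop cluster setup
To build the cluster, copy the configuration files in etc/hadoop to the other machines, so steps 2.2.1 to 2.2.6 do not need to be repeated on each one:
cd /opt/hadoop-2.7.7/etc
scp -r hadoop root@<other-machine-hostname>:/opt/hadoop-2.7.7/etc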
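With more than one worker, the copy can be looped from master. A sketch assuming bash and the passwordless SSH from 1.6; `WORKERS`, `sync_hadoop_conf`, and `DRY_RUN` are hypothetical names, and `DRY_RUN=1` prints the scp commands instead of running them.

```shell
#!/usr/bin/env bash
# Workers that should receive master's Hadoop configuration.
WORKERS=(slave1 slave2)

# Copy etc/hadoop to every worker's /opt/hadoop-2.7.7/etc.
sync_hadoop_conf() {
  local src="${1:-/opt/hadoop-2.7.7/etc/hadoop}" node
  local -a cmd
  for node in "${WORKERS[@]}"; do
    cmd=(scp -r "$src" "root@${node}:/opt/hadoop-2.7.7/etc")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

Preview with `DRY_RUN=1 sync_hadoop_conf`, then run `sync_hadoop_conf` after each configuration change on master.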
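2.3 Start Hadoop
1. Format a new file system. On master only, go to hadoop-2.7.7/bin and run the following once (reformatting later gives the NameNode a new cluster ID that the existing DataNodes will reject):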
./hadoop namenode -format
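2. Start Hadoop. On master, go to hadoop-2.7.7/sbin and run the following (start-all.sh is deprecated, as its output notes; running ./start-dfs.sh followed by ./start-yarn.sh is equivalent):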
./start-all.sh
Output like the following means startup succeeded:
root@master:/opt/hadoop-2.7.7/sbin# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-slave2.out
master: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-master.out
slave1: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-resourcemanager-master.out
slave2: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-slave2.out
slave1: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-slave1.out
master: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-master.out
2.4 Check the Hadoop cluster
Method 1: go to hadoop-2.7.7/bin and run:
./hdfs dfsadmin -report
Check the number of live DataNodes. For example, Live datanodes (3) means all three nodes started successfully:
root@master:/opt/hadoop-2.7.7/bin# ./hdfs dfsadmin -report
Configured Capacity: 621051420672 (578.40 GB)
Present Capacity: 577317355520 (537.67 GB)
DFS Remaining: 577317281792 (537.67 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (3):
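The Live datanodes line can also be checked from a script, for example after each restart. A sketch assuming bash and GNU sed; `live_datanodes` and `check_live_datanodes` are hypothetical helpers that read the report on standard input.

```shell
#!/usr/bin/env bash
# Extract N from the "Live datanodes (N):" line of `hdfs dfsadmin -report`.
live_datanodes() {
  sed -n -E 's/^Live datanodes \(([0-9]+)\):.*/\1/p'
}

# Succeed only if at least EXPECTED DataNodes are reported live.
check_live_datanodes() {
  local expected="$1" live
  live=$(live_datanodes)
  [ -n "$live" ] && [ "$live" -ge "$expected" ]
}
```

Usage: `/opt/hadoop-2.7.7/bin/hdfs dfsadmin -report | check_live_datanodes 3 && echo "all DataNodes are live"`.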
Method 2: open the YARN ResourceManager web UI on port 8088:
http://192.168.241.132:8088/cluster/nodes
Method 3: open the HDFS NameNode web UI on port 50070:
http://192.168.241.132:50070/
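III. Spark Deployment
3.1 Install Spark (can be done on all three machines in parallel)
- Download spark-2.1.0-bin-hadoop2.7.tgz, put it in /opt, and extract it.
- Add the Spark environment variables to /etc/profile: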
export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
export PATH=$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH
3.2 Spark configuration
1. Go to spark-2.1.0-bin-hadoop2.7/conf and copy spark-env.sh.template to spark-env.sh:
cp spark-env.sh.template spark-env.sh
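Edit spark-env.sh and add the following (size SPARK_WORKER_MEMORY and SPARK_WORKER_CORES to each machine's actual memory and CPU cores):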
export JAVA_HOME=/opt/jdk
export SPARK_MASTER_IP=192.168.241.132
export SPARK_WORKER_MEMORY=8g
export SPARK_WORKER_CORES=4
export SPARK_EXECUTOR_MEMORY=4g
export HADOOP_HOME=/opt/hadoop-2.7.7/
export HADOOP_CONF_DIR=/opt/hadoop-2.7.7/etc/hadoop
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/jdk/jre/lib/amd64
2. Copy slaves.template to slaves, then edit slaves:
cp slaves.template slaves
Add the following to slaves (one line per machine):
master
slave1
slave2
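3.3 Spark cluster setup
Copy the contents of spark-2.1.0-bin-hadoop2.7/conf to the other machines, so 3.2 does not need to be repeated on each one. From /opt/spark-2.1.0-bin-hadoop2.7, run:
scp -r conf root@<other-machine-hostname>:/opt/spark-2.1.0-bin-hadoop2.7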
3.4 Start Spark
On master, go to spark-2.1.0-bin-hadoop2.7/sbin and run:
./start-all.sh
3.5 Check the Spark cluster
Open the Spark master web UI on the master node: http://192.168.241.132:8080/
==Note: when setting up the Spark cluster, each worker's Spark directory must match the master's (same path and configuration).==
The Hadoop and Spark clusters are now both set up.
(When reposting, please credit the source: https://www.zhangyongli.cc/. If you find an error, please leave a comment. Thanks.)