1. Preparation
Three Docker containers, all running Ubuntu 14.04:
IP | Hostname | Cluster role | Login user |
---|---|---|---|
172.17.192.108 | hadoop1 | master/slave | tank |
172.17.192.123 | hadoop2 | slave | tank |
172.17.192.124 | hadoop3 | slave | tank |
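A minimal sketch of how the three containers could be created, assuming the stock ubuntu:14.04 image and Docker's default bridge network (the actual IPs assigned depend on your Docker setup):
docker run -dit --name hadoop1 -h hadoop1 ubuntu:14.04 /bin/bash
docker run -dit --name hadoop2 -h hadoop2 ubuntu:14.04 /bin/bash
docker run -dit --name hadoop3 -h hadoop3 ubuntu:14.04 /bin/bash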
2. Install the JDK and configure environment variables
1) Extract the archive
tar -zxvf jdk-8u141-linux-x64.tar.gz -C /usr/local/java
2) Configure environment variables
- Open the profile with vi
sudo vi /etc/profile
- Append the following environment variables at the end of the profile
export JAVA_HOME=/usr/local/java/jdk1.8.0_141
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$PATH:${JAVA_HOME}/bin
- Make the changes take effect
source /etc/profile
- Verify the Java configuration
java -version
3. Install and configure Scala
1) Download the Scala package
wget https://downloads.lightbend.com/scala/2.12.7/scala-2.12.7.tgz
2) Extract it
tar -zxvf scala-2.12.7.tgz
3) Move it to /usr
mv scala-2.12.7 /usr
4) Configure environment variables
vi /etc/profile
export SCALA_HOME=/usr/scala-2.12.7
export PATH=$SCALA_HOME/bin:$PATH
5) Save, then reload the configuration
source /etc/profile
6) Verify the configuration
scala -version
4. Configure passwordless SSH login
1) Generate an SSH key pair
ssh-keygen -t rsa
2) Append the public key to authorized_keys to enable passwordless login to the local machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3) Test passwordless login to the local machine
ssh localhost
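If ssh localhost still prompts for a password, overly open permissions on ~/.ssh are the usual cause; tightening them is a common fix:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys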
Note: Docker containers can communicate with each other directly, so no firewall configuration is needed.
5. Install Hadoop
1) Extract the downloaded Hadoop archive
tar -zxvf hadoop-2.7.3.tar.gz -C /usr/local/hadoop/
2) Configure core-site.xml
<configuration>
<!-- RPC address of the HDFS master (NameNode) -->
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop1:9000</value><!-- use localhost on the master node, hadoop1 on the slave nodes -->
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/tank/hadoop/tmp</value>
</property>
</configuration>
3) Configure hdfs-site.xml
<configuration>
<property> <!-- this property is optional -->
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop1:50900</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/tank/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/tank/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>10</value><!-- number of NameNode handler threads; too few causes RPC congestion -->
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>10737418240</value><!-- reserved disk space, 10 GB, in bytes -->
</property>
</configuration>
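The local directories referenced in core-site.xml and hdfs-site.xml must exist on every node before HDFS starts; assuming the paths above, they can be created with:
mkdir -p /home/tank/hadoop/tmp /home/tank/hadoop/hdfs/name /home/tank/hadoop/hdfs/data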
4) Configure mapred-site.xml
<configuration>
<property>
<name>mapred.child.java.opts</name><!-- JVM heap size for map/reduce tasks; should be <= mapreduce.*.memory.mb -->
<value>-Xmx1000m</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name><!-- memory for a map task container, in MB -->
<value>1024</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name><!-- memory for a reduce task container, in MB -->
<value>1024</value>
</property>
<property>
<name>mapreduce.job.reduce.slowstart.completedmaps</name><!-- fraction of maps that must finish before reduces are scheduled -->
<value>0.5</value>
</property>
<property>
<name>mapreduce.jobtracker.taskscheduler</name><!-- task scheduling algorithm, FIFO by default -->
<value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
</property>
<property>
<name>mapreduce.map.maxattempts</name><!-- maximum number of attempts per map task -->
<value>3</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop1:19888</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hadoop1:9001</value>
</property>
</configuration>
5) Configure yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value> <!-- comma-separated list of auxiliary services -->
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value> <!-- total physical memory that can be allocated to containers -->
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value> <!-- minimum memory a container may request from the ResourceManager -->
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value> <!-- maximum memory a container may request from the ResourceManager -->
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
</configuration>
6) Edit hadoop-env.sh and set the JDK path
export JAVA_HOME=/usr/local/java/jdk1.8.0_141
7) Add Hadoop environment variables
sudo vi /etc/profile
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.3
export PATH=$PATH:${HADOOP_HOME}/bin
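To apply the new variables and confirm Hadoop is on the PATH, a quick check is:
source /etc/profile
hadoop version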
8) Repeat the preceding setup steps on every node in the cluster, and configure mutual passwordless SSH login between the nodes
- Edit /etc/hosts on each node and add:
172.17.192.108 hadoop1
172.17.192.123 hadoop2
172.17.192.124 hadoop3
- Copy the master node's id_rsa.pub to every slave node, saving it as master.pub
scp ~/.ssh/id_rsa.pub tank@hadoop2:~/.ssh/master.pub
scp ~/.ssh/id_rsa.pub tank@hadoop3:~/.ssh/master.pub
- Append master.pub to the authorized_keys file on every slave node; the end result is that the master can log in to all slave nodes without a password (see the command below)
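On each slave node, the append can be done with (assuming the ~/.ssh layout used above):
cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys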
9) Configure the cluster slave nodes
Edit the slaves file in $HADOOP_HOME/etc/hadoop and change it to the following, so that all three machines take part in jobs as slave nodes:
hadoop1
hadoop2
hadoop3
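Before the first start, format the NameNode once on the master node; a fresh HDFS will not come up without this step:
hdfs namenode -format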
10) Start the Hadoop cluster
cd $HADOOP_HOME
sbin/start-all.sh
11) Check the cluster status
jps
NodeManager
Jps
NameNode
ResourceManager
SecondaryNameNode
DataNode
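To check HDFS and YARN beyond jps, you can run the report below and open the default Hadoop 2.7 web UIs (NameNode at http://hadoop1:50070, ResourceManager at http://hadoop1:8088):
hdfs dfsadmin -report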
12) Start the JobHistory server
sbin/mr-jobhistory-daemon.sh start historyserver
jps
NodeManager
Jps
NameNode
ResourceManager
JobHistoryServer
SecondaryNameNode
DataNode
// Processes on the slave nodes
Jps
NodeManager
DataNode
6. Set up a fully distributed Spark 2.3.2 environment
All of the following operations are performed on the master node (hadoop1).
1) Download the binary package spark-2.3.2-bin-hadoop2.7.tgz
2) Extract it and move it to the target directory:
tar -zxvf spark-2.3.2-bin-hadoop2.7.tgz
mv spark-2.3.2-bin-hadoop2.7 /opt
3) Edit the configuration files
- /etc/profile
export SPARK_HOME=/opt/spark-2.3.2-bin-hadoop2.7/
export PATH=$PATH:$SPARK_HOME/bin
- Copy spark-env.sh.template to spark-env.sh
cp spark-env.sh.template spark-env.sh
- Edit $SPARK_HOME/conf/spark-env.sh and add the following:
export JAVA_HOME=/usr/local/java/jdk1.8.0_141
export SCALA_HOME=/usr/scala-2.12.7
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.3
export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-2.7.3/etc/hadoop
export SPARK_MASTER_IP=172.17.192.108
export SPARK_MASTER_HOST=172.17.192.108
export SPARK_LOCAL_IP=172.17.192.108
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=2
export SPARK_HOME=/opt/spark-2.3.2-bin-hadoop2.7
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/hadoop-2.7.3/bin/hadoop classpath)
- Copy slaves.template to slaves
cp slaves.template slaves
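The slaves file lists the hosts that will run a Spark Worker. Assuming all three containers should act as workers, mirroring the Hadoop slaves file above, it would contain:
hadoop1
hadoop2
hadoop3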
4) Update the configuration on the slave nodes
On the slave nodes (hadoop2 and hadoop3), edit /etc/profile and add the same Spark settings as on the master.
On each slave node, also edit $SPARK_HOME/conf/spark-env.sh and change export SPARK_LOCAL_IP=172.17.192.108 to that node's own IP.
5) Start the cluster on the master node
/opt/spark-2.3.2-bin-hadoop2.7/sbin/start-all.sh
6) Check whether the cluster started successfully
jps
On the master, jps shows the following process in addition to the Hadoop ones:
Master
On each slave, jps shows the following process in addition to the Hadoop ones:
Worker
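To confirm the standalone cluster actually accepts jobs, you can open the master web UI (http://172.17.192.108:8080 by default) and submit the bundled SparkPi example as a smoke test; the jar path below assumes the stock spark-2.3.2-bin-hadoop2.7 layout, so adjust the jar name if yours differs:
/opt/spark-2.3.2-bin-hadoop2.7/bin/spark-submit --master spark://172.17.192.108:7077 --class org.apache.spark.examples.SparkPi /opt/spark-2.3.2-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.2.jar 100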