Spark installations generally come in three flavors: (1) pseudo-distributed mode, which simulates a distributed environment on a single node, with the master and worker sharing that node; it is typically used for developing and testing Spark programs. (2) Fully distributed mode, a true cluster in which the master and workers run on different nodes; it usually requires at least 3 nodes (1 master and 2 workers) and is the mode used in real deployments. (3) HA cluster mode, a highly available cluster that usually requires at least 4 machines (1 active master, 1 standby master, and 2 workers); its advantage is that when the active master goes down, the standby master immediately takes over the master's duties, keeping the cluster running reliably, which is why it is the mode most often adopted in production. This section walks through installing and configuring Spark in fully distributed mode.
1. Preparing the Linux environment and a fully distributed Hadoop cluster
For the steps to set up Hadoop in fully distributed mode, see the earlier article:
Linux environment and Hadoop environment setup
2. Installing Scala
Scala itself is just an application; strictly speaking it only needs to be installed on the master node, but here we also distribute it to the slave nodes.
1. Upload the Scala package: scala-2.11.8.tgz
2. Extract it: tar -zxvf scala-2.11.8.tgz
3. Configure the environment variables (do this on all three machines):
]# vim /root/.bash_profile
SCALA_HOME=/usr/local/src/scala-2.11.8
export SCALA_HOME
PATH=$SCALA_HOME/bin:$PATH
export PATH
4. Apply the environment variables: [root@master scala-2.11.8]# source /root/.bash_profile
5. Verify that Scala installed successfully:
Run the scala command; if it drops you into the Scala REPL as shown below, the installation succeeded:
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144).
Type in expressions for evaluation. Or try :help.
scala>
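At the prompt you can evaluate a simple expression as a quick sanity check (a minimal, illustrative snippet; any expression works):
scala> (1 to 10).sum
res0: Int = 55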
6. Distribute Scala to the slave nodes:
scp -rp /usr/local/src/scala-2.11.8 slave1:/usr/local/src
scp -rp /usr/local/src/scala-2.11.8 slave2:/usr/local/src
3. Installing Spark
1. Upload the Spark package spark-2.1.0-bin-hadoop2.7.tgz to /usr/local/src on the master node.
2. Extract it and check the installation directory:
]# tar -zxvf spark-2.1.0-bin-hadoop2.7.tgz
[root@master spark-2.1.0-bin-hadoop2.7]# pwd
/usr/local/src/spark-2.1.0-bin-hadoop2.7
3. Configure the Spark environment variables (do this on all three machines):
[root@master spark-2.1.0-bin-hadoop2.7]# vim /root/.bash_profile
SPARK_HOME=/usr/local/src/spark-2.1.0-bin-hadoop2.7
export SPARK_HOME
PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
export PATH
Apply the environment variables: ]# source /root/.bash_profile
4. Configure the Spark parameters:
4.1 Configure the spark-env.sh file (under $SPARK_HOME/conf):
]# cp spark-env.sh.template spark-env.sh
]# vim spark-env.sh
export JAVA_HOME=/usr/local/src/jdk1.8.0_162
export HADOOP_HOME=/usr/local/src/hadoop-2.7.3
export HADOOP_CONF_DIR=/usr/local/src/hadoop-2.7.3/etc/hadoop
export SCALA_HOME=/usr/local/src/scala-2.11.8
# hostname and RPC port the Spark master binds to
export SPARK_MASTER_HOST=master
export SPARK_MASTER_PORT=7077
# CPU cores and memory each worker offers to applications
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1G
4.2 Configure the slaves file (one worker hostname per line):
]# cp slaves.template slaves
]# vim slaves
slave1
slave2
4.3 Distribute the installation to the slave nodes:
Copy the configured Spark installation directory on master to the two slave nodes, slave1 and slave2, and verify that the copies succeeded.
scp -rp spark-2.1.0-bin-hadoop2.7/ slave1:/usr/local/src/
scp -rp spark-2.1.0-bin-hadoop2.7/ slave2:/usr/local/src/
4.4 Start Spark in fully distributed mode on the master node (Hadoop has already been started). Note that Hadoop ships its own start-all.sh, so with both sbin directories on the PATH it is safer to invoke the script by its full path, $SPARK_HOME/sbin/start-all.sh:
[root@master conf]# start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/src/spark-2.1.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/src/spark-2.1.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/src/spark-2.1.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
5. Verifying Spark
5.1 Check the running processes with jps:
[root@master conf]# jps
2709 SecondaryNameNode
4839 Jps
2522 NameNode
4763 Master
2863 ResourceManager
[root@slave1 ~]# jps
2288 NodeManager
2180 DataNode
2905 Jps
2846 Worker
[root@slave2 ~]# jps
2262 NodeManager
2153 DataNode
3467 Worker
3531 Jps
5.2 Monitor Spark's status in a browser:
Browse to master:8080
5.3 Enter the spark-shell
Use the spark-shell command to open a Scala REPL with a SparkContext already created:
[root@master ~]# spark-shell
Setting default log level to "WARN".
……
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
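With the REPL up, you can run a small distributed job to confirm the cluster works end to end; a minimal illustrative example using the predefined SparkContext sc (the numbers are arbitrary):
scala> sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
res0: Int = 1001000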
Once spark-shell is running, its web console is available on port 4040. (Note: if several spark-shells, i.e., several SparkContexts, run on one machine, the port increments automatically: 4041, 4042, 4043, and so on.)
5.4 Running simple examples
- Local mode (with run-example, submit options such as --master go before the example class name):
]# ./bin/run-example --master local[2] SparkPi 10
- Cluster mode (Spark standalone):
]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://master:7077 examples/jars/spark-examples_2.11-2.1.0.jar 100
- Cluster mode (Spark on YARN, cluster deploy mode; in Spark 2.x, --master yarn --deploy-mode cluster replaces the deprecated yarn-cluster form):
]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples_2.11-2.1.0.jar 10
Browse to master:8088, locate the finished application, and click through to its logs to see the result. A sketch of the computation these examples perform follows below.
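For reference, SparkPi estimates π by Monte Carlo sampling: it throws random points into the square [-1, 1] × [-1, 1] and counts how many land inside the unit circle; that fraction approximates π/4. A minimal sketch of the same idea that you can paste into spark-shell (n, the sample count, is arbitrary):
val n = 100000
val inside = sc.parallelize(1 to n).map { _ =>
  val x = math.random * 2 - 1            // random point in [-1, 1] x [-1, 1]
  val y = math.random * 2 - 1
  if (x * x + y * y <= 1) 1 else 0       // 1 if the point falls inside the unit circle
}.reduce(_ + _)
println(s"Pi is roughly ${4.0 * inside / n}")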
5.5 Stop the fully distributed Spark cluster (as with startup, prefer $SPARK_HOME/sbin/stop-all.sh to avoid any clash with Hadoop's script of the same name):
[root@master ~]# stop-all.sh
slave2: stopping org.apache.spark.deploy.worker.Worker
slave1: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master
A summary of commonly used Spark ports:
- Master RPC port (the one spark:// submissions connect to): 7077
- Master web UI (cluster status): 8080
- spark-shell / application web UI: 4040
With that, the fully distributed Spark environment is up and running!