This section covers:
Spark environment deployment
Spark offers the same advantages as Hadoop MapReduce, but unlike MapReduce it can keep a job's intermediate output in memory instead of writing it to and reading it back from HDFS, which makes it much better suited to iterative workloads such as data mining and machine learning algorithms.
The five Spark packages:
spark-core: the Spark core package
spark-worker: scripts used by the Spark worker
spark-master: scripts used by the Spark master
spark-python: the Python client for Spark
spark-history-server: the job history service
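Before installing, you can check that yum sees all five packages (a quick sanity check, assuming the CDH 5 yum repository set up in the earlier parts of this series is still configured):
# yum list spark-core spark-master spark-worker spark-python spark-history-server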
1. System environment:
OS: CentOS Linux release 7.5.1804 (Core)
CPU: 2 cores
Memory: 1 GB
Run-as user: root
JDK version: 1.8.0_252
Hadoop version: CDH 5.16.2
2. Planned roles for each cluster node:
172.26.37.245 node1.hadoop.com ----> namenode, zookeeper, journalnode, hadoop-hdfs-zkfc, resourcemanager, historyserver, hbase, hbase-master, hive, hive-metastore, hive-server2, hive-hbase, sqoop, impala, impala-server, impala-state-store, impala-catalog, pig, spark-core, spark-master, spark-worker, spark-python
172.26.37.246 node2.hadoop.com ----> datanode, zookeeper, journalnode, nodemanager, hadoop-client, mapreduce, hbase-regionserver, impala, impala-server, hive, spark-core, spark-worker, spark-history-server, spark-python
172.26.37.247 node3.hadoop.com ----> datanode, nodemanager, hadoop-client, mapreduce, hive, mysql-server, impala, impala-server
172.26.37.248 node4.hadoop.com ----> namenode, zookeeper, journalnode, hadoop-hdfs-zkfc, hive, hive-server2, impala-shell
3. Environment notes:
The packages added in this deployment:
172.26.37.245 node1.hadoop.com ----> spark-core, spark-master, spark-worker, spark-python
172.26.37.246 node2.hadoop.com ----> spark-core, spark-worker, spark-history-server, spark-python
I. Installation
On node1:
# yum install -y spark-core spark-master spark-worker spark-python
On node2:
# yum install -y spark-core spark-worker spark-history-server spark-python
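To confirm that the expected packages are present on each node, you can list the installed Spark RPMs (optional check):
# rpm -qa | grep spark
node1 should show spark-core, spark-master, spark-worker and spark-python; node2 should show spark-core, spark-worker, spark-history-server and spark-python.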
II. Configuration
On node1 and node2:
# cp -p /etc/spark/conf/spark-env.sh /etc/spark/conf/spark-env.sh.20200705
# vi /etc/spark/conf/spark-env.sh
Add the following line:
export STANDALONE_SPARK_MASTER_HOST='node1.hadoop.com'  # Note: use plain single quotes here.
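To double-check the change took effect (optional):
# grep STANDALONE_SPARK_MASTER_HOST /etc/spark/conf/spark-env.sh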
Create the HDFS directory needed by the Spark History Server, /user/spark/applicationHistory/:
# sudo -u hdfs hadoop fs -mkdir -p /user/spark/applicationHistory
# sudo -u hdfs hadoop fs -chown -R spark:spark /user/spark
# sudo -u hdfs hadoop fs -chmod 1777 /user/spark/applicationHistory
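You can verify the ownership and the sticky bit on the new directory with:
# sudo -u hdfs hadoop fs -ls -d /user/spark/applicationHistory
The entry should be owned by spark:spark with permissions drwxrwxrwt.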
On node1 and node2 (the Spark worker nodes), edit /etc/spark/conf/spark-defaults.conf:
# cp -p /etc/spark/conf/spark-defaults.conf /etc/spark/conf/spark-defaults.conf.20200705
# vi /etc/spark/conf/spark-defaults.conf
Add the following lines:
spark.eventLog.dir=hdfs://cluser1/user/spark/applicationHistory
spark.eventLog.enabled=true
node1幢哨、node2節(jié)點(diǎn)復(fù)制hdfs-site.xml到/etc/spark/conf下
? ? ? ? ? ?# cp /etc/hadoop/conf/hdfs-site.xml /etc/spark/conf/
III. Starting Spark
On node1:
# service spark-master start
# service spark-master status
# service spark-worker start
# service spark-worker status
On node2:
# service spark-worker start
# service spark-worker status
# service spark-history-server start
# service spark-history-server status
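Before opening the web UI, it is worth confirming the daemons are actually running. jps (shipped with the JDK) should show a Master and a Worker process on node1, and a Worker and a HistoryServer process on node2:
# jps | grep -E 'Master|Worker|HistoryServer'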
Open a browser and go to http://172.26.37.245:18080 to see the Spark master web UI.
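If the page does not load, check from the shell that the master web port is listening and answering (18080, as noted above):
# ss -tlnp | grep 18080
# curl -s -o /dev/null -w '%{http_code}\n' http://172.26.37.245:18080
The curl command should print 200.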
Use the spark-shell command to enter the Spark shell:
# sudo -u spark spark-shell
Setting default log level to "WARN".
scala>
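As a final smoke test, you can submit the SparkPi example shipped with the spark-core package to the standalone master, then confirm that an event log shows up in HDFS for the history server (the examples jar path below is the one used by the CDH 5 packages; adjust it if your layout differs):
# sudo -u spark spark-submit --class org.apache.spark.examples.SparkPi --master spark://node1.hadoop.com:7077 /usr/lib/spark/lib/spark-examples.jar 10
# sudo -u hdfs hadoop fs -ls /user/spark/applicationHistory
The job prints a line such as "Pi is roughly 3.14...", and the second command should list one event-log file per completed application.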