With the Hadoop cluster set up and the Spark standalone cluster built, everything starts normally; verify with the SparkPi example:
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop002:7077 \
./examples/jars/spark-examples_2.12-3.0.0.jar \
10
Now configure the history server.
Once spark-shell (or any application) stops, the monitoring UI at hadoop002:4040 no longer shows how past jobs ran, so for development we configure a history server to record application runs.
Rename [spark-defaults.conf.template] to spark-defaults.conf (drop the .template suffix) and add:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop002:8020/directory
# /directory must already exist on HDFS
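A minimal sketch of that change as shell commands (run from the conf/ directory of the Spark install; the here-doc simply appends the two properties):
mv spark-defaults.conf.template spark-defaults.conf
cat >> spark-defaults.conf <<'EOF'
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://hadoop002:8020/directory
EOF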
Edit the [spark-env.sh] file and add the history-server log configuration:
export SPARK_HISTORY_OPTS=" -Dspark.history.ui.port=18080 -Dspark.history.fs.logDirectory=hdfs://hadoop002:8020/directory -Dspark.history.retainedApplications=30"
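For reference: 18080 is the port the history UI will listen on, spark.history.fs.logDirectory must point at the same HDFS path as spark.eventLog.dir above, and spark.history.retainedApplications caps how many applications the server keeps UI data for in its cache (evicted ones are reloaded from the event logs on demand).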
[root] Distribute the configuration: from the standlone directory, run xsync conf/
[childe-h] Start the Hadoop cluster and make sure the /directory path exists on HDFS:
sbin/start-dfs.sh
hadoop fs -mkdir /directory   # skip if it already exists; after the first creation it only needs to be recreated if HDFS is formatted (-format) again
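If you want the directory check to be idempotent, a small sketch using the standard HDFS shell tests:
hadoop fs -test -d /directory || hadoop fs -mkdir /directory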
[childe-h] Start the Spark cluster: from the standlone directory, run
sbin/start-all.sh
sbin/start-history-server.sh   # start the history server
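A quick sanity check after startup (jps ships with the JDK; 18080 matches SPARK_HISTORY_OPTS above):
jps | grep -E 'Master|Worker|HistoryServer'
If everything is up, the history UI is reachable at http://hadoop002:18080.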
Startup fails with the following errors:
21/5/19 16:30:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/5/19 16:30:33 ERROR cluster.StandaloneSchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
21/5/19 16:30:33 ERROR netty.Inbox: Ignoring error
org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Master removed our application: FAILED
at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:459)
at org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.dead(StandaloneSchedulerBackend.scala:139)
at ...
19/11/15 16:28:31 ERROR spark.SparkContext: Error initializing SparkContext.
...
One of the worker logs contains the following:
java.io.IOException: Failed to create directory /soft/spark/work/app-...
at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:450)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
...
So the worker failed to create its working directory. Looking at the directory on disk, the work directory is owned by root, while the other files belong to my own user.
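To confirm the ownership problem, check the directory from the stack trace (path assumed from the worker log above):
ls -ld /soft/spark/work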
There are two ways to fix this.
One is to change the owner of that directory to the user that will run Spark:
chown -R xxx:xxx work   # xxx:xxx = your user and group; work is the worker's working directory
The other is to edit the spark-env.sh file (and distribute it again); in this file I specified the work directory explicitly, as follows:
# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR Where log files are stored. (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS The scheduling priority for daemons. (Default: 0)
# - SPARK_NO_DAEMONIZE Run the proposed command in the foreground. It will not output a PID file.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# IP/hostname of the master node
export SPARK_MASTER_IP=b1
# port the master listens on, used to communicate with the workers
export SPARK_MASTER_PORT=7077
# number of cores each worker process can manage
export SPARK_WORKER_CORES=2
# amount of memory each worker process can manage
export SPARK_WORKER_MEMORY=1G
# the worker's working directory
export SPARK_WORKER_DIR=/usr/local/spark/spark-standlone
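After editing, redistribute the file and restart so the workers pick up the new SPARK_WORKER_DIR (a sketch reusing the xsync helper and the scripts from above, run from the standlone directory; the chosen directory must be writable by the user that starts the workers):
xsync conf/
sbin/stop-all.sh
sbin/start-all.sh
sbin/start-history-server.sh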