Alibaba & Tencent Cloud hadoop+spark Cluster Setup (2)
Linux version: CentOS 7
Hadoop version: 3.1.1
Spark version: 2.3.2
Hadoop was already set up in Part 1; next comes Spark.
For convenience, a shell script handles the download of Spark and Hive (Hive will be set up later; the goal for now is just to get Spark running).
download_file.sh
-------------------------------
#!/bin/bash
TARGET=files
HADOOP_VERSION=3.1.1
HIVE_VERSION=2.3.3
SPARK_VERSION=2.3.2
HADOOP_FILE=hadoop-$HADOOP_VERSION.tar.gz
HIVE_FILE=apache-hive-$HIVE_VERSION-bin.tar.gz
SPARK_FILE=spark-$SPARK_VERSION-bin-hadoop2.7.tgz
if [ ! -f "$HADOOP_FILE" ]; then
    echo "https://www-us.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/$HADOOP_FILE is downloading"
    curl -O https://www-us.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/$HADOOP_FILE
fi
echo "Hadoop is completed!"
if [ ! -f "$HIVE_FILE" ]; then
    echo "https://www-us.apache.org/dist/hive/hive-$HIVE_VERSION/$HIVE_FILE is downloading"
    curl -O https://www-us.apache.org/dist/hive/hive-$HIVE_VERSION/$HIVE_FILE
fi
echo "HIVE is completed!"
if [ ! -f "$SPARK_FILE" ]; then
    echo "https://www-us.apache.org/dist/spark/spark-$SPARK_VERSION/$SPARK_FILE is downloading"
    curl -O https://www-us.apache.org/dist/spark/spark-$SPARK_VERSION/$SPARK_FILE
fi
echo "$SPARK_FILE completed!"
Run the script to download Spark and Hive.
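For example, run it from the directory where the archives should end up:
$ chmod +x download_file.sh
$ ./download_file.sh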
-
Extract the archives under ~/hadoop as the hadoop user (an extraction sketch follows the listing below).
$ cd ~/hadoop
$ ls
hadoop-3.1.1  hive-2.3.3  spark-2.3.2
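A minimal extraction sketch, assuming the archives were downloaded into ~/hadoop and then renamed to the short directory names shown above (the renames are an assumption; keeping the original directory names also works if the paths in the later config are adjusted):
$ cd ~/hadoop
$ tar -xzf spark-2.3.2-bin-hadoop2.7.tgz && mv spark-2.3.2-bin-hadoop2.7 spark-2.3.2
$ tar -xzf apache-hive-2.3.3-bin.tar.gz && mv apache-hive-2.3.3-bin hive-2.3.3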
-
Download Anaconda3
Since I plan to use PySpark and want Python 3, I install Anaconda3.
$ curl -O https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-5.3.0-Linux-x86_64.sh
$ bash Anaconda3-5.3.0-Linux-x86_64.sh
-
Configuration files
- spark-env.sh
Append the following:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_HOME=/home/hadoop/hadoop/hadoop-3.1.1
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_HOST=master
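spark-env.sh does not exist in a fresh unpack; it is usually created from the template in $SPARK_HOME/conf. If PySpark should use the Anaconda Python installed above, you can also point PYSPARK_PYTHON at it (the /home/hadoop/anaconda3 path is an assumption based on the installer's default location):
$ cd ~/hadoop/spark-2.3.2/conf
$ cp spark-env.sh.template spark-env.sh
$ echo 'export PYSPARK_PYTHON=/home/hadoop/anaconda3/bin/python' >> spark-env.sh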
- slaves
slave1
slave2
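The slaves file holds one worker hostname per line and, like the other conf files, normally starts from the bundled template:
$ cd ~/hadoop/spark-2.3.2/conf
$ cp slaves.template slaves
# then edit slaves so it lists only slave1 and slave2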
- log4j.properties
If you only want warning-level messages in the logs, change
log4j.rootCategory=INFO, console
to
log4j.rootCategory=WARN, console
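log4j.properties is likewise created from its template in $SPARK_HOME/conf; for example (the sed call is just one way to flip the level):
$ cd ~/hadoop/spark-2.3.2/conf
$ cp log4j.properties.template log4j.properties
$ sed -i 's/^log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' log4j.properties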
- /etc/profile.d/spark-2.3.2.sh
export SPARK_HOME=/home/hadoop/hadoop/spark-2.3.2
export PATH=$SPARK_HOME/bin:$PATH
$ source /etc/profile
All of the above needs to be done on both the master and the slave machines; the easiest way is to finish the configuration on the master and then copy it over, as sketched below.
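A minimal copy sketch, assuming the passwordless SSH set up in Part 1 and the same /home/hadoop/hadoop layout on the slaves (slave1/slave2 as in the slaves file):
$ scp -r ~/hadoop/spark-2.3.2 slave1:~/hadoop/
$ scp -r ~/hadoop/spark-2.3.2 slave2:~/hadoop/
# /etc/profile.d/spark-2.3.2.sh still has to be created on each slave as root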
-
Run Spark on the master machine
$ cd $SPARK_HOME/sbin
$ ./start-all.sh    <!-- runs start-master.sh first, then start-slaves.sh -->
$ jps
16000 Master
15348 NameNode
15598 SecondaryNameNode
16158 Jps
<!-- only HDFS and Spark have been started here -->
Open http://master:8080 (over the public IP); if the page shows Alive Workers: 2, the setup is done.
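As an optional check that the cluster actually accepts jobs, you can submit the SparkPi example that ships with the distribution (a minimal sketch; the path below assumes the stock spark-2.3.2-bin-hadoop2.7 layout):
$ spark-submit --master spark://master:7077 \
    $SPARK_HOME/examples/src/main/python/pi.py 10
If everything is wired up, the driver output ends with a line of the form "Pi is roughly 3.14...".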