We previously set up the cluster with CDH 5.2; now we want CDH to support spark-sql. For the base setup, see the CDH offline installation guide.
I. Prepare the environment
jdk1.7.0_79
scala2.10.4
maven3.3.9
spark-1.1.0.tgz
配置環(huán)境變量如下,并使其生效:source /etc/profile
export JAVA_HOME=/usr/local/jdk1
export M2_HOME=/usr/local/maven
export SCALA_HOME=/usr/local/scala
export PATH=$JAVA_HOME/bin:$M2_HOME/bin:$SCALA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
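Before building, the exports above can be sanity-checked; this is my own sketch (not from the original guide), and the paths simply mirror the profile exports, so adjust them to your hosts.

```shell
# Sketch: verify that each tool referenced by the profile exports resolves.
# Paths mirror the exports above; adjust them for your hosts.
JAVA_HOME=/usr/local/jdk1
M2_HOME=/usr/local/maven
SCALA_HOME=/usr/local/scala

checked=0
for tool in "$JAVA_HOME/bin/java" "$M2_HOME/bin/mvn" "$SCALA_HOME/bin/scala"; do
  checked=$((checked + 1))
  if [ -x "$tool" ]; then
    echo "OK:      $tool"
  else
    echo "MISSING: $tool"   # fix the install or the export before building
  fi
done
```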
現(xiàn)已有編譯好的spark
II. Compile the Spark source
1. Increase the memory available to Maven, since the build is complex and takes a long time:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
2. Extract the source and build it (Spark 1.1.0 has no hadoop-2.5 profile, so use -Phadoop-2.4 with the CDH artifact version; Scala 2.10 is the default and needs no flag):
nohup mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.2.0 -Phive -Phive-thriftserver -DskipTests clean package > ./spark-mvn-$(date +%Y%m%d%H).log 2>&1 &
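The log name in the command embeds a timestamp via shell command substitution (the backticks were lost in some copies of this post); in isolation the pieces look like this:

```shell
# Build the timestamped log name used by the nohup command above.
STAMP=$(date +%Y%m%d%H)         # year, month, day, hour, e.g. 2015010112
LOG="./spark-mvn-${STAMP}.log"
echo "$LOG"
# Follow the running build with: tail -f "$LOG"
```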
III. Install the Spark assembly
1. Copy the assembly jar
Copy the freshly built assembly jar into the CDH parcel's jars directory:
$ cp spark-assembly-1.1.0-hadoop2.5.0-cdh5.2.0.jar /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/
2. Replace the assembly jar under CDH's spark directory
Update the symlinks:
$ cd /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark/assembly/lib
$ ln -s ../../../../jars/spark-assembly-1.1.0-hadoop2.5.0-cdh5.2.0.jar spark-assembly-1.1.0-hadoop2.5.0-cdh5.2.0.jar
$ ln -s spark-assembly-1.1.0-hadoop2.5.0-cdh5.2.0.jar spark-assembly.jar
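Since the relative ../ depth is easy to get wrong, here is a throwaway sketch of my own that reproduces the CDH parcel layout in a temp directory and confirms the link chain resolves; the jar name assumes the build above.

```shell
# Reproduce the parcel layout (jars/ at the parcel root, the link created
# four levels down in lib/spark/assembly/lib) and check the symlink chain;
# a dangling link means the ../ depth is wrong for your layout.
work=$(mktemp -d)
mkdir -p "$work/jars" "$work/lib/spark/assembly/lib"
: > "$work/jars/spark-assembly-1.1.0-hadoop2.5.0-cdh5.2.0.jar"
cd "$work/lib/spark/assembly/lib"
ln -s ../../../../jars/spark-assembly-1.1.0-hadoop2.5.0-cdh5.2.0.jar \
      spark-assembly-1.1.0-hadoop2.5.0-cdh5.2.0.jar
ln -s spark-assembly-1.1.0-hadoop2.5.0-cdh5.2.0.jar spark-assembly.jar
if [ -e spark-assembly.jar ]; then resolved=yes; else resolved=no; fi
echo "link resolves: $resolved"
```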
3. Copy the spark-sql launcher script
Copy it from the bin directory of the Spark distribution into CDH's spark bin directory:
$ mv /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql.bak
$ cp /root/spark-1.1.0-bin-hadoop2.4/bin/spark-sql /opt/cloudera/parcels/CDH/lib/spark/bin/
4. Configure environment variables
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_CMD=/opt/cloudera/parcels/CDH/bin/hadoop
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export SCALA_HOME=/usr/local/scala-2.10.4
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SCALA_HOME/bin
5. Copy the assembly jar to HDFS
Upload the assembly jar to the /user/spark/share/lib directory on HDFS and change its permissions to 755.
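The upload can be scripted roughly as follows; this is a sketch using the stock hdfs CLI, the jar name assumes the build above, and the guard lets it degrade gracefully on a machine without HDFS.

```shell
# Upload the assembly jar to HDFS and set its permissions to 755 (sketch).
JAR=spark-assembly-1.1.0-hadoop2.5.0-cdh5.2.0.jar
DEST=/user/spark/share/lib
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "$DEST"
  hdfs dfs -put -f "$JAR" "$DEST/"
  hdfs dfs -chmod 755 "$DEST/$JAR"
  echo "uploaded $JAR to $DEST"
else
  echo "hdfs CLI not found; run this on a cluster gateway node"
fi
```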
6. Configure in Cloudera Manager
Log in to CM and set Spark's jar location to the assembly jar's path in HDFS.
(screenshot: updating the service-wide setting)
(screenshot: updating the advanced configuration)
(screenshot: updating the client configuration)
7. Run spark-sql
(screenshot: running SQL in spark-sql)
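A quick smoke test of my own (not from the original post): run a trivial statement through the newly installed CLI. It requires the PATH exports from step 4, so it is guarded for other machines.

```shell
# Smoke-test the installation with a trivial statement (sketch).
if command -v spark-sql >/dev/null 2>&1; then
  spark-sql --master yarn-client -e "SHOW DATABASES;"
  status=ran
else
  status=skipped
  echo "spark-sql not on PATH; source /etc/profile first"
fi
```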
IV. Suppress spark-sql's INFO output
1. Back up log4j.properties
In the $SPARK_HOME/conf directory:
$cp /opt/cloudera/parcels/CDH/lib/spark/conf/log4j.properties /opt/cloudera/parcels/CDH/lib/spark/conf/log4j.properties.bak
2. Edit log4j.properties and change INFO to WARN (on the second line):
(screenshot: the edited file)
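The screenshot of the edited file did not survive extraction; for reference, the relevant line of Spark's stock log4j.properties template after the change would look like this (a sketch — the rest of the file stays as shipped):

```
# Before: log4j.rootCategory=INFO, console
log4j.rootCategory=WARN, console
```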
3. Error: local class incompatible: stream classdesc serialVersionUID = 5017373498943810947, local class serialVersionUID = 18257903091306170
Solution: the client-side class version does not match the server's. Upload the client's jar to HDFS and point the Spark configuration at it.