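Preface: Because production environments and real business requirements are complex, it is sometimes unavoidable to modify the Spark source code, then rebuild and test it before deploying it to production. This article describes how the author rebuilt the spark-2.2.1 source on Linux (CentOS 6.5), along with the pitfalls encountered while setting up the build environment.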
1. Download the source code
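git clone https://github.com/apache/spark.git -b v2.2.1
(The 2.2.1 release is published as the tag v2.2.1, and the maintenance branch is branch-2.2. There is no branch named branch-2.2.1, so -b branch-2.2.1 cannot fetch that version. GitHub also no longer serves the unauthenticated git:// protocol, hence the https URL.)
When the command completes, the Spark source is in a spark directory under the directory where the command was run, for example /home/${user_name}/spark.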
Alternatively, download the source package for a specific release:
wget https://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1.tgz
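The download step can be parameterized by release number. The following is a minimal sketch, not part of the original procedure: the function names spark_src_url and fetch_spark_src are hypothetical, and it assumes that other releases follow the same archive.apache.org path layout as the 2.2.1 URL above.

```shell
# Print the archive URL for a Spark source release.
# Assumption: other releases use the same path layout as 2.2.1.
spark_src_url() {
  printf 'https://archive.apache.org/dist/spark/spark-%s/spark-%s.tgz\n' "$1" "$1"
}

# Download the source tarball into a directory and unpack it there.
# $1: release number, $2: target directory (defaults to the current one).
fetch_spark_src() {
  dest="${2:-.}"
  wget -c -P "$dest" "$(spark_src_url "$1")" &&
    tar -xzf "$dest/spark-$1.tgz" -C "$dest"
}
```

For example, fetch_spark_src 2.2.1 /home/${user_name} would download and unpack the release used in this article.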
2. Build the source code
./build/mvn -Phadoop-2.7 -Pyarn -Dhadoop.version=2.7.3 -Phive -Phive-thriftserver clean package -Dmaven.test.skip=true
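Parameters:
-Phadoop-2.7: selects the Hadoop build profile; if no profile is given, Spark 2.2 builds against Hadoop 2.6.5;
-Dhadoop.version: sets the exact Hadoop version to build against (2.7.3 here); it should match the selected profile;
-Pyarn: enables Hadoop YARN support;
-Phive: enables Hive support in Spark SQL; the default Hive version is 1.2.1;
-Phive-thriftserver: used together with -Phive; additionally builds the JDBC/Thrift server;
-Dmaven.test.skip=true: neither compiles nor runs the test cases.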
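Since only the Hadoop profile and version usually change between builds, the argument list above can be generated from two values. This is an illustrative sketch only; the helper name spark_mvn_args is hypothetical, and it reproduces exactly the flags used in this article.

```shell
# Print the Maven arguments used in this article.
# $1: Hadoop profile suffix (e.g. 2.7), $2: exact Hadoop version (e.g. 2.7.3)
spark_mvn_args() {
  printf '%s\n' "-Phadoop-$1 -Pyarn -Dhadoop.version=$2 -Phive -Phive-thriftserver clean package -Dmaven.test.skip=true"
}
```

From the Spark source root it could be used as ./build/mvn $(spark_mvn_args 2.7 2.7.3); the command substitution is left unquoted on purpose so that the shell splits it into separate arguments.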
[Pitfall 1] SSL connect error
The error output is as follows:
[root@cbas-virt-20 spark]# ./build/mvn -Phadoop-2.7 -Pyarn -Dhadoop.version=2.7.3 -Phive -Phive-thriftserver clean package -Dmaven.test.skip=true
exec: curl --progress-bar -L https://downloads.typesafe.com/zinc/0.3.15/zinc-0.3.15.tgz
curl: (35) SSL connect error
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
exec: curl --progress-bar -L https://downloads.typesafe.com/scala/2.11.8/scala-2.11.8.tgz
curl: (35) SSL connect error
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
./build/mvn: line 119: cd: /root/spark/build/scala-2.11.8/bin/../lib: No such file or directory
./build/mvn: line 120: cd: /root/spark/build/scala-2.11.8/bin/../lib: No such file or directory
./build/mvn: line 143: /root/spark/build/zinc-0.3.15/bin/zinc: No such file or directory
./build/mvn: line 145: /root/spark/build/zinc-0.3.15/bin/zinc: No such file or directory
Using `mvn` from path: /root/spark/build/apache-maven-3.3.9/bin/mvn
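Analysis:
When building through ./build/mvn, the downloads of zinc-0.3.15 and scala-2.11.8 fail because the build server cannot establish an SSL connection to https://downloads.typesafe.com. The later gzip, tar, and "No such file or directory" errors are consequences of the failed downloads. A likely cause, not verified here, is that the curl and wget shipped with CentOS 6.5 are too old for the TLS versions the server accepts.
Verification (downloading with wget):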
[root@cbas-virt-20 ~]# wget https://downloads.typesafe.com/zinc/0.3.15/zinc-0.3.15.tgz
--2018-04-24 13:46:02-- https://downloads.typesafe.com/zinc/0.3.15/zinc-0.3.15.tgz
Resolving downloads.typesafe.com... 54.230.129.25, 54.230.129.53, 54.230.129.138, ...
Connecting to downloads.typesafe.com|54.230.129.25|:443... connected.
Unable to establish SSL connection.
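The number in "curl: (35)" is curl's exit code, which separates a TLS handshake failure from DNS or routing problems. A small sketch for interpreting it follows; the function name explain_curl_exit is hypothetical, and only a handful of common codes are covered.

```shell
# Map a few common curl exit codes to a short description.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect to host" ;;
    28) echo "operation timed out" ;;
    35) echo "SSL/TLS handshake failed" ;;
    60) echo "server certificate could not be verified" ;;
    *)  echo "other curl error, see EXIT CODES in man curl" ;;
  esac
}
```

One way to use it on the build server is curl -sSI https://downloads.typesafe.com/ >/dev/null; explain_curl_exit $? which performs the request and then describes the result.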
Solution:
Edit ./build/mvn and change the corresponding download URLs from https to http, using either of the following options:
Option 1:
curl --progress-bar -L http://downloads.typesafe.com/scala/2.11.8/scala-2.11.8.tgz
curl --progress-bar -L http://downloads.typesafe.com/zinc/0.3.15/zinc-0.3.15.tgz
Option 2:
curl --progress-bar -L http://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz
curl --progress-bar -L http://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz
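The manual edit can also be scripted. The sketch below assumes that the host https://downloads.typesafe.com appears literally in the file being edited, which should be confirmed with grep before relying on it; the function name switch_download_host is hypothetical.

```shell
# Replace the HTTPS typesafe host in a script with another base URL.
# $1: file to edit, $2: new base URL (defaults to Option 2 above).
# sed -i.bak keeps the original file next to it with a .bak suffix.
switch_download_host() {
  sed -i.bak "s#https://downloads.typesafe.com#${2:-http://downloads.lightbend.com}#g" "$1"
}
```

Run from the Spark source root, switch_download_host build/mvn applies Option 2, and switch_download_host build/mvn http://downloads.typesafe.com applies Option 1. Plain HTTP gives no transport-level integrity check on the downloaded archives, so this trades safety for convenience.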
[Pitfall 2] Invalid source release
The error output is as follows:
[INFO] Using zinc server for incremental compilation
[info] 'compiler-interface' not yet compiled for Scala 2.11.8. Compiling...
[info] Compilation completed in 12.407 s
[warn] Pruning sources from previous analysis, due to incompatible CompileSetup.
[info] Compiling 2 Scala sources and 6 Java sources to /root/spark/common/tags/target/scala-2.11/classes...
[error] javac: invalid source release: 1.8
[error] Usage: javac <options> <source files>
[error] use -help for a list of possible options
[error] Compile failed at 2018-4-24 14:48:20 [14.062s]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [03:33 min]
[INFO] Spark Project Tags ................................. FAILURE [ 36.154 s]
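Analysis:
The JDK version is wrong. Building spark-2.2.1 no longer supports JDK 1.7; all compilation and execution must be done on JDK 1.8.
Solution:
Download JDK 1.8 from the official site: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html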
mv jdk-8u171-linux-x64.tar.gz /usr/java/
cd /usr/java
tar -zxvf jdk-8u171-linux-x64.tar.gz
vim /etc/profile => export JAVA_HOME=/usr/java/jdk1.8.0_171
source /etc/profile
java -version
javac -version
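If the JDK on the system has still not switched to JDK 1.8, carry out the following steps as well.
Check whether the newly installed JDK is listed in the system's alternatives menu: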
update-alternatives --config java
update-alternatives --config javac
If it is not listed, add it as follows:
update-alternatives --install /usr/bin/java java /usr/java/jdk1.8.0_171/bin/java 300
update-alternatives --install /usr/bin/javac javac /usr/java/jdk1.8.0_171/bin/javac 300
Then choose the corresponding number to switch the system JDK version (press Enter to confirm):
update-alternatives --config java
update-alternatives --config javac
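The interactive menu can be avoided with update-alternatives --set, which selects an entry by path. The sketch below only prints the commands for review rather than running them; the function name jdk_alternatives_cmds is hypothetical, and the default priority of 300 is the value used above.

```shell
# Print the update-alternatives commands that register a JDK and select it.
# $1: JDK home directory, $2: priority (defaults to 300, as used above).
# Nothing is executed here; the output is meant to be reviewed first.
jdk_alternatives_cmds() {
  for tool in java javac; do
    echo "update-alternatives --install /usr/bin/$tool $tool $1/bin/$tool ${2:-300}"
    echo "update-alternatives --set $tool $1/bin/$tool"
  done
}
```

After checking the output of jdk_alternatives_cmds /usr/java/jdk1.8.0_171, the commands can be applied as root by piping the output to sh.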
Verification:
java -version
javac -version
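To catch a wrong JDK before starting a long build, the version check can be automated. The following sketch parses the text printed by javac -version; the function name javac_feature_version is hypothetical, and it assumes the output has the form "javac 1.8.0_171" (or "javac 11.0.2" on newer JDKs).

```shell
# Extract the Java feature version from javac's version string.
# "javac 1.8.0_171" gives 8, "javac 11.0.2" gives 11.
javac_feature_version() {
  v="${1#javac }"              # drop the leading "javac "
  case "$v" in
    1.*) v="${v#1.}" ;;        # legacy 1.x numbering: 1.8 means Java 8
  esac
  printf '%s\n' "${v%%[._-]*}" # keep only the first numeric component
}
```

JDK 8 and earlier print the version to standard error, so a pre-build check might look like [ "$(javac_feature_version "$(javac -version 2>&1)")" -ge 8 ] || echo "JDK 1.8 is required".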
3. Confirming a successful build
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 4.242 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 3.027 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 9.362 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 3.107 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 6.164 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 10.022 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 3.123 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 5.299 s]
[INFO] Spark Project Core ................................. SUCCESS [01:31 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [02:34 min]
[INFO] Spark Project GraphX ............................... SUCCESS [ 23.556 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 31.368 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [03:23 min]
[INFO] Spark Project SQL .................................. SUCCESS [05:03 min]
[INFO] Spark Project ML Library ........................... SUCCESS [01:35 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 9.692 s]
[INFO] Spark Project Hive ................................. SUCCESS [01:02 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 5.806 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 12.565 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 32.809 s]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 24.966 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 4.969 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:00 min]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [ 15.672 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 25.670 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 5.885 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 20:10 min
[INFO] Finished at: 2018-04-24T19:37:26+08:00
[INFO] Final Memory: 85M/1177M
[INFO] ------------------------------------------------------------------------
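When the build output is saved to a file, the Reactor Summary can be checked mechanically. This is a minimal sketch that assumes the summary lines have the layout shown above; the function names build_succeeded and failed_modules and the log file name are hypothetical.

```shell
# Succeed only if the Maven log contains the BUILD SUCCESS banner.
build_succeeded() {
  grep -q '^\[INFO\] BUILD SUCCESS' "$1"
}

# Print the names of modules marked FAILURE in the Reactor Summary.
failed_modules() {
  sed -n 's/^\[INFO\] \(.*[^ .]\) \.\{2,\} FAILURE.*$/\1/p' "$1"
}
```

For example, after ./build/mvn ... 2>&1 | tee build.log, running build_succeeded build.log || failed_modules build.log lists the failing modules; on the log from Pitfall 2 it would print Spark Project Tags.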