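This error is an old nemesis. The usual cause is that the Spark and Scala versions used at compile time do not match those of the runtime environment. Even so, I keep running into it every now and then, so here is a record of this occurrence.
Error log
While developing and debugging ccp locally, something broke after one deployment: both application submission and interactive operations failed with this error.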
WARN ] 2020-07-24 16:13:09,468(252275) --> [SchedulerFactory4] com.cgws.ccp.interactive.socket.InteractiveServer.onStatusChange(InteractiveServer.java:379): Job section_1595578364911_390576819 is finished, status: ERROR, exception: null, result: %text java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.kafka010.KafkaSourceProvider could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at com.cgws.ccp.spark.util.ExternalTools$.loadJDBCReader(ExternalTools.scala:189)
at com.cgws.ccp.spark.util.ExternalTools$.readJDBCMysql(ExternalTools.scala:205)
at com.cgws.ccp.spark.util.ExternalTools$.readJDBC(ExternalTools.scala:122)
at com.cgws.ccp.spark.util.SparkUtils$.loadJdbcSource(SparkUtils.scala:341)
at com.cgws.ccp.spark.util.SparkUtils$.loadSource(SparkUtils.scala:297)
at com.cgws.ccp.spark.job.SparkScript$$anonfun$createTempViewForExternalSource$1.apply(SparkScript.scala:89)
at com.cgws.ccp.spark.job.SparkScript$$anonfun$createTempViewForExternalSource$1.apply(SparkScript.scala:80)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at com.cgws.ccp.spark.job.SparkScript.createTempViewForExternalSource(SparkScript.scala:79)
at com.cgws.ccp.spark.job.SparkScript$$anonfun$post$2.apply(SparkScript.scala:62)
at com.cgws.ccp.spark.job.SparkScript$$anonfun$post$2.apply(SparkScript.scala:53)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at com.cgws.ccp.spark.job.SparkScript.post(SparkScript.scala:53)
at com.cgws.ccp.spark.interpreter.CCPSparkSqlInterpreter.internalInterpret(CCPSparkSqlInterpreter.scala:72)
at com.cgws.ccp.interpreter.interpreter.AbstractInterpreter.interpret(AbstractInterpreter.java:47)
at com.cgws.ccp.interpreter.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:110)
at com.cgws.ccp.interpreter.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:685)
at com.cgws.ccp.interpreter.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:578)
at com.cgws.ccp.interpreter.scheduler.Job.run(Job.java:172)
at com.cgws.ccp.interpreter.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:130)
at com.cgws.ccp.interpreter.scheduler.ParallelScheduler.lambda$runJobInScheduler$0(ParallelScheduler.java:39)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
at org.apache.spark.sql.kafka010.KafkaSourceProvider.<init>(KafkaSourceProvider.scala:44)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 43 more
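The trace itself hints at which side is on which Scala version. Scala 2.11 compiles a trait's method bodies into a helper class named Trait$class, while Scala 2.12 compiles traits to interfaces and calls a static $init$ on the interface itself. The frames above contain Iterator$class (a 2.11-style runtime) next to a missing Logging.$init$ (a 2.12-style caller). A minimal sketch of that check follows. It is a heuristic only, and the function name trait_encoding_hints is made up for illustration.

```shell
# Heuristic sketch: read a JVM stack trace on stdin and report which
# Scala trait encodings it shows. Not an exhaustive classifier.
trait_encoding_hints() {
  trace=$(cat)
  # Scala 2.11 frames look like scala.collection.Iterator$class.foreach
  if printf '%s\n' "$trace" | grep -q '\$class\.'; then
    echo 'runtime=2.11-style'
  fi
  # Code built for Scala 2.12 calls Trait.$init$(...) on the interface;
  # a NoSuchMethodError on it means the loaded trait was built for 2.11.
  if printf '%s\n' "$trace" | grep -q 'NoSuchMethodError: .*\.\$init\$('; then
    echo 'caller=2.12-style'
  fi
}
```

Piping a saved log through it (for example `trait_encoding_hints < error.log`, where error.log is a hypothetical file name) prints both lines when the two encodings are mixed, as they are in this trace.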
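Root cause analysis
- 1 First, check the Spark and Scala versions in the code. No problem there: 2.4.3 + 2.11, the same as the deployed Spark.
- 2 Some code had changed in the meantime, so I rolled the code back, repackaged and redeployed the whole project, and it ran OK.
- 3 Manually package the spark uber jar; after swapping it into the deployment, the problem reappears.
At this point the uber jar looks very suspicious: it differs from the uber jar produced by the full build.
- 4 Compare their contents with jar -tf
Their Scala versions differ.
Opening the uber jar shows dependencies for both spark*2.11 and 2.12 at the same time, which causes the error described in this post.
This is a common mvn dependency problem. First, manage the mvn deps in the code properly; second, manage the local repo properly.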
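The jar -tf comparison in step 4 can be narrowed down to the Scala binary suffixes of the bundled Spark artifacts. The sketch below assumes the uber jar still carries its META-INF/maven/... entries, which is where artifact names such as spark-tags_2.12 show up in a listing. The function names are hypothetical.

```shell
# Read a `jar -tf` listing on stdin and print the distinct Spark
# artifact names that carry a Scala binary suffix (e.g. spark-tags_2.12).
spark_artifacts() {
  grep -oE 'spark-[a-z0-9-]+_2\.[0-9]+' | sort -u
}

# Read a `jar -tf` listing on stdin and print only the distinct
# Scala binary suffixes; more than one line means a mixed jar.
scala_suffixes() {
  grep -oE 'spark-[a-z0-9-]+_2\.[0-9]+' | sed -E 's/.*_//' | sort -u
}
```

Usage would look like `jar -tf ccp-spark-uber.jar | scala_suffixes`, where the jar name is a placeholder. A healthy jar for this project should print a single line, 2.11.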
- 5 Analyze the mvn dependencies
mvn dependency:tree -Dverbose -Dincludes=org.apache.spark:spark-tags_2.12
mvn dependency:tree -Dverbose -Dincludes=org.apache.spark:spark-tags_2.11
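Rather than running one query per suffix, the full dependency:tree output can be filtered for any artifact whose Scala suffix differs from the expected one. This is a sketch under the assumption that coordinates are printed in the usual groupId:artifactId:type:version form; unexpected_scala_deps is an illustrative name, not part of Maven.

```shell
# Read `mvn dependency:tree` output on stdin and print the distinct
# coordinates whose artifactId ends in a Scala suffix other than $1.
unexpected_scala_deps() {
  expected=$1
  grep -oE '[A-Za-z0-9_.-]+:[A-Za-z0-9.-]+_2\.[0-9]+:[a-z-]+:[A-Za-z0-9_.-]+' \
    | grep -vF "_${expected}:" \
    | sort -u
}
```

For this project the call would be `mvn dependency:tree | unexpected_scala_deps 2.11`; empty output means every suffixed artifact in the tree is on 2.11.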
- 6 Check the spark dependencies of the ccp-spark module, looking for spark***2.12
<!-- Spark dependency start -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_${scala.binary.version}</artifactId>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql-kafka-0-10_${scala.binary.version}</artifactId>
<exclusions>
<exclusion>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- Spark dependency end -->
These dependencies use the parent's scala.binary.version, which we already checked in the code at the very beginning: it is 2.11. Next, check the mvn local repository.
- 7 Check the mvn local repo
vim /Users/apple/.m2/repository/com/cgws/ccp/ccp/1.1.0/ccp-1.1.0.pom
It turns out to differ from the pom in our code; an old problem with mvn builds.
<!-- scala -->
<scala.binary.version>2.12</scala.binary.version>
<scala.version>2.12.8</scala.version>
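The manual check in step 7 can be scripted by locating the parent POM in the local repository and comparing one property against the workspace copy. The sketch assumes the standard local repository layout (groupId dots become directories) and a property written on a single line, as in the fragment above. Both function names are hypothetical.

```shell
# Print the POM path for groupId $2, artifactId $3, version $4
# inside the local repository rooted at $1.
local_repo_pom() {
  repo=$1 group=$2 artifact=$3 version=$4
  printf '%s/%s/%s/%s/%s-%s.pom\n' "$repo" \
    "$(printf '%s' "$group" | tr . /)" \
    "$artifact" "$version" "$artifact" "$version"
}

# Print the value of the first single-line <$2>...</$2> element in POM $1.
pom_property() {
  sed -n "s|.*<$2>\(.*\)</$2>.*|\1|p" "$1" | head -n 1
}
```

With these, `pom_property "$(local_repo_pom ~/.m2/repository com.cgws.ccp ccp 1.1.0)" scala.binary.version` can be compared with `pom_property pom.xml scala.binary.version` run from the parent project's directory. If they differ, reinstalling only the parent POM with `mvn -N install` from that directory is one possible way to refresh the local copy; that remedy is a suggestion and is not recorded in the steps above.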
Why it happened
I had previously experimented with switching Spark to 3.0.0 with Scala 2.12, and probably ran an install at that time, which installed the parent pom into the local repository.