Caused by: java.lang.NullPointerException
at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:839)
at org.apache.spark.sql.catalyst.ScalaReflection$.localTypeOf(ScalaReflection.scala:39)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:751)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:715)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:714)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:711)
at org.apache.spark.sql.functions$.udf(functions.scala:3398)
...omitted...
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
)
目前懷疑是scala bug所致酪穿,https://github.com/scala/bug/issues/10766
Spark在注冊用戶的UDF的時候會根據(jù)UDF的輸入類型和返回類型通過Scala反射的方式映射到Spark SQL內(nèi)部的類型表示受神,比如輸入類型是Int會得到IntegerType,返回類型是String亦可得到StringType類型抬吟。
這個過程通過 localTypeOf[某個類型]和預(yù)設(shè)的localTypeOf[java.lang.Integer]等的規(guī)則匹配判斷渐行,而這里用的scala的操作符 <:< 由于https://github.com/scala/bug/issues/10766的bug并不是線程安全的,可能觸發(fā)這個詭異問題叠骑。
udf注冊加同步李皇,看看是否能夠規(guī)避這個問題