java.time.Instant
In Spark 3.0, the Java 8 time API is used both in Spark's internal datetime computations and in user-facing APIs. For example, an Instant object is mapped to the Spark SQL type TimestampType. Internally, TimestampType is represented as a Long holding microseconds, as its documentation states:
/**
* The timestamp type represents a time instant in microsecond precision.
* Valid range is [0001-01-01T00:00:00.000000Z, 9999-12-31T23:59:59.999999Z] where
* the left/right-bound is a date and time of the proleptic Gregorian
* calendar in UTC+00:00.
*
* Please use the singleton `DataTypes.TimestampType` to refer the type.
* @since 1.3.0
*/
Instant, by contrast, stores the seconds offset from the epoch in a long field and the nanosecond-of-second in an int field, so its maximum and minimum values far exceed the range that Spark's TimestampType can hold.
scala> Instant.MAX
res22: java.time.Instant = +1000000000-12-31T23:59:59.999999999Z
scala> Instant.MIN
res25: java.time.Instant = -1000000000-01-01T00:00:00Z
scala> Instant.EPOCH
res26: java.time.Instant = 1970-01-01T00:00:00Z
So how large an Instant can Spark actually support?
scala> val t = Instant.ofEpochSecond(9223372036854L, 775807999)
t: java.time.Instant = +294247-01-10T04:00:54.775807999Z
scala> Seq(t).toDF
res20: org.apache.spark.sql.DataFrame = [value: timestamp]
scala> val t = Instant.ofEpochSecond(9223372036854L, 775808000)
t: java.time.Instant = +294247-01-10T04:00:54.775808Z
scala> Seq(t).toDF
java.lang.RuntimeException: Error while encoding: java.lang.ArithmeticException: long overflow
staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, TimestampType, instantToMicros, input[0, java.time.Instant, true], true, false) AS value#67
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215)
at org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:466)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:466)
at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:353)
at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:231)
... 47 elided
Caused by: java.lang.ArithmeticException: long overflow
at java.lang.Math.addExact(Math.java:809)
at org.apache.spark.sql.catalyst.util.DateTimeUtils$.instantToMicros(DateTimeUtils.scala:411)
at org.apache.spark.sql.catalyst.util.DateTimeUtils.instantToMicros(DateTimeUtils.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:211)
... 56 more
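The failure is easy to reproduce outside Spark. The stack trace points at Math.addExact inside DateTimeUtils.instantToMicros; the following is a sketched re-implementation of that conversion (modeled on the trace, not Spark's literal source) which hits the same long overflow at the same boundary:

```java
import java.time.Instant;

public class InstantToMicrosSketch {
    // Sketch of an Instant -> microseconds conversion using exact arithmetic,
    // modeled on what the stack trace shows (Math.addExact throwing on overflow);
    // Spark's real DateTimeUtils.instantToMicros may differ in detail.
    static long instantToMicros(Instant instant) {
        long us = Math.multiplyExact(instant.getEpochSecond(), 1_000_000L);
        return Math.addExact(us, instant.getNano() / 1_000L);
    }

    public static void main(String[] args) {
        // Exactly Long.MAX_VALUE microseconds: the last value that fits.
        // (The trailing 999 nanoseconds are truncated by the division by 1000.)
        Instant fits = Instant.ofEpochSecond(9223372036854L, 775807999L);
        System.out.println(instantToMicros(fits)); // 9223372036854775807

        // One microsecond more: Math.addExact throws ArithmeticException.
        Instant overflow = Instant.ofEpochSecond(9223372036854L, 775808000L);
        try {
            instantToMicros(overflow);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // long overflow
        }
    }
}
```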
This clearly contradicts the Spark API documentation: the largest timestamp Spark can actually represent is +294247-01-10T04:00:54.775807999Z, far beyond the documented 9999-12-31T23:59:59.999999Z, yet still many orders of magnitude short of Instant's true maximum. The minimum bound behaves analogously and is not spelled out here.
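The negative side can also be probed with the same kind of sketched conversion. One detail worth noting: an Instant's nanosecond field is always non-negative, so under an exact multiply-then-add conversion it is the multiplication of the seconds field that overflows first, making the effective minimum -9223372036854 whole seconds rather than Long.MIN_VALUE microseconds. This is an inference from the arithmetic, not a claim about Spark's exact source:

```java
import java.time.Instant;

public class MinBoundarySketch {
    // Same sketched conversion as on the overflow path above: exact multiply,
    // then exact add. An assumption modeled on the stack trace, not Spark's code.
    static long instantToMicros(Instant instant) {
        long us = Math.multiplyExact(instant.getEpochSecond(), 1_000_000L);
        return Math.addExact(us, instant.getNano() / 1_000L);
    }

    public static void main(String[] args) {
        // getNano() is always in [0, 999999999], so on the negative side the
        // seconds multiplication overflows first: -9223372036854 s still fits,
        // -9223372036855 s does not.
        Instant fits = Instant.ofEpochSecond(-9223372036854L, 0);
        System.out.println(instantToMicros(fits)); // -9223372036854000000

        try {
            instantToMicros(Instant.ofEpochSecond(-9223372036855L, 999_999_999L));
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // long overflow
        }
    }
}
```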