Spark 3.2.3
Hudi 0.11.0
Writing to Hudi from Spark, the commit failed. Under the .hoodie directory there are .commit.requested and .inflight files, but no completed .commit file. In Hudi's timeline an instant moves through requested → inflight → completed, so a missing .commit file means the write never reached the completed state:
-rw-r--r--@ 1 lqq staff 1572 5 23 09:54 20230512145004274.rollback
-rw-r--r--@ 1 lqq staff 0 5 23 09:54 20230512145004274.rollback.inflight
-rw-r--r--@ 1 lqq staff 1384 5 23 09:54 20230512145004274.rollback.requested
-rw-r--r--@ 1 lqq staff 0 5 23 09:54 20230522173618331.commit.requested
-rw-r--r--@ 1 lqq staff 3123 5 23 09:54 20230522173618331.inflight
Checking the driver log, there is an error entry, but the specific error details are not printed:
ERROR HoodieSparkSqlWriter$: UPSERT failed with errors
Digging into the source shows that the detailed per-record errors are only logged at TRACE level:
private def commitAndPerformPostOperations(spark: SparkSession,
                                           schema: StructType,
                                           writeResult: HoodieWriteResult,
                                           parameters: Map[String, String],
                                           client: SparkRDDWriteClient[HoodieRecordPayload[Nothing]],
                                           tableConfig: HoodieTableConfig,
                                           jsc: JavaSparkContext,
                                           tableInstantInfo: TableInstantInfo
                                          ): (Boolean, common.util.Option[java.lang.String], common.util.Option[java.lang.String]) = {
  // ........
  } else {
    log.error(s"${tableInstantInfo.operation} failed with errors")
    if (log.isTraceEnabled) {
      log.trace("Printing out the top 100 errors")
      writeResult.getWriteStatuses.rdd.filter(ws => ws.hasErrors)
        .take(100)
        .foreach(ws => {
          log.trace("Global error :", ws.getGlobalError)
          if (ws.getErrors.size() > 0) {
            ws.getErrors.foreach(kt =>
              log.trace(s"Error for key: ${kt._1}", kt._2))
          }
        })
    }
    (false, common.util.Option.empty(), common.util.Option.empty())
  }
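The trace branch above only fires when TRACE is enabled for this logger. A minimal way to enable it for just the Hudi writer, assuming Spark 3.2's default log4j 1.x setup (conf/log4j.properties); the logger name follows the class shown in the log output and may carry a trailing `$` for the Scala object:

```properties
# conf/log4j.properties — enable TRACE for the Hudi writer only,
# so the rest of the Spark log stays at its default level
log4j.logger.org.apache.hudi.HoodieSparkSqlWriter=TRACE
```

Alternatively, `spark.sparkContext.setLogLevel("TRACE")` raises verbosity globally, which works but floods the log with unrelated TRACE output.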
After lowering the log level (following http://www.reibang.com/u/c2bc3695bc47) and re-running the job, the detailed error is printed:

23/05/25 11:57:45 TRACE HoodieSparkSqlWriter$: Printing out the top 100 errors
org.apache.hudi.exception.SchemaCompatibilityException: Unable to validate the rewritten record {"gender": "male", "id": 708075384135690, "count": null} against schema {"fields":[{"name":"id","type":["null","long"],"default":null},{"name":"gender","type":["null","string"],"default":null},{"name":"count","type":["null","int"],"default":null}]}
Cause: schema incompatibility. The count column was originally written to Hudi as int; the new batch declares it as long, and the table's existing int schema cannot accept long values, so the write fails.
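The asymmetry can be reproduced with Avro's own compatibility checker: int promotes to long, but long does not narrow to int, which is why a batch declaring count as long fails validation against a table that recorded it as int. A small sketch, assuming the org.apache.avro jar bundled with Hudi is on the classpath (the record name `r` is made up for illustration):

```scala
import org.apache.avro.{Schema, SchemaCompatibility}

// Two single-field record schemas differing only in the type of `count`
def record(countType: String): Schema = new Schema.Parser().parse(
  s"""{"type":"record","name":"r","fields":[{"name":"count","type":"$countType"}]}""")

val intSchema  = record("int")
val longSchema = record("long")

// reader=long, writer=int → COMPATIBLE (int promotes to long)
println(SchemaCompatibility.checkReaderWriterCompatibility(longSchema, intSchema).getType)

// reader=int, writer=long → INCOMPATIBLE (long does not narrow to int)
println(SchemaCompatibility.checkReaderWriterCompatibility(intSchema, longSchema).getType)
```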
Fix: either cast the count column back to int in the new batch, or drop the Hudi table and rewrite it from scratch with the new schema.
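A minimal sketch of the first option with the Spark DataFrame API: cast count back to IntegerType before the upsert so the incoming batch matches the schema already recorded in the table. The input path, table name, and base path below are placeholders, not values from the original job:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder().appName("hudi-schema-fix").getOrCreate()

// The new batch, whose `count` column arrived as LongType
val df = spark.read.json("/path/to/new_batch")   // placeholder input

df.withColumn("count", col("count").cast(IntegerType)) // align with the table's int schema
  .write.format("hudi")
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.table.name", "my_table")             // placeholder table name
  .mode("append")
  .save("/path/to/hudi_table")                         // placeholder base path
```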