測試用例參考原文:https://blog.csdn.net/Mathieu66/article/details/110389575
Hudi社區(qū)已經(jīng)正式發(fā)布了hudi-0.7.0版本。hudi on flink 最早在0.6.1本版出現(xiàn)侨赡,當(dāng)時還是在master分支仍稀,最終切換到了0.7.0分支并經(jīng)過0.7.0-rc1和0.7.0-rc2兩個版本的迭代后畢業(yè)拜隧。無疑hudi on flink 的支持烁焙,使得flink cdc+hudi實時數(shù)倉方案得到落實窥浪,為此對hudi on flink的開始使用調(diào)研及遇到的問題做以下分享张足。
一蛙讥、環(huán)境介紹
這里以下環(huán)境版本示例:
1、hadoop-2.7.2
2遇西、kafka_2.11-2.1.0
3馅精、flink-1.11.2
4、hudi-0.7.0
二粱檀、官方源碼編譯
官方地址:https://hudi.apache.org/
gihub地址:https://github.com/apache/hudi
2.1洲敢、 源碼下載
git clone https://github.com/apache/hudi.git && cd hudi
2.2、 源碼編譯
修改hudi/pom.xml
1> 切換release-0.7.0分支茄蚯,修改pom.xml中hadoop對應(yīng)集群版本
2> window環(huán)境下注釋掉<module>hudi-integ-test</module>和<module>packaging/hudi-integ-test-bundle</module>
3> flink-1.11.2對應(yīng)parquet版本為1.11.1压彭,可修改pom.xml到對應(yīng)版本(若flink/lib下存在其他本版以該版本保持一致),本次測試不修改保留1.10.1版本
mvn clean package -DskipTests
三渗常、創(chuàng)建測試用例
3.1哮塞、創(chuàng)建hudi-conf.properties配置文件
hoodie.datasource.write.recordkey.field=uuid
hoodie.datasource.write.partitionpath.field=ts
bootstrap.servers=hadoop01:9092,hadoop02:9092,hadoop03:9092
hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator
hoodie.embed.timeline.server=false
hoodie.deltastreamer.schemaprovider.source.schema.file=hdfs:///tmpdir/hudi/test/config/flink/schema.avsc
hoodie.deltastreamer.schemaprovider.target.schema.file=hdfs:///tmpdir/hudi/test/config/flink/schema.avsc
3.2、創(chuàng)建schema.avsc文件
{
"type":"record",
"name":"stock_ticks",
"fields":[{
"name": "uuid",
"type": "string"
}, {
"name": "ts",
"type": "long"
}, {
"name": "symbol",
"type": "string"
},{
"name": "year",
"type": "int"
},{
"name": "month",
"type": "int"
},{
"name": "high",
"type": "double"
},{
"name": "low",
"type": "double"
},{
"name": "key",
"type": "string"
},{
"name": "close",
"type": "double"
}, {
"name": "open",
"type": "double"
}, {
"name": "day",
"type":"string"
}
]}
3.3凳谦、上傳到hdfs文件系統(tǒng)
sudo -u hdfs hadoop fs -mkdir -p /tmpdir/hudi/test/config/flink
hadoop fs -put schema.avsc /tmpdir/hudi/test/config/flink/
hadoop fs -put hudi-conf.properties /tmpdir/hudi/test/config/flink/
3.4忆畅、創(chuàng)建kafka測試topic
#創(chuàng)建主題
/opt/apps/kafka/bin/kafka-topics.sh --create --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181/kafka --replication-factor 2 --partitions 3 --topic mytest
#查看主題
/opt/apps/kafka/bin/kafka-topics.sh --list --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181/kafka
#生產(chǎn)者(測試)
/opt/apps/kafka/bin/kafka-console-producer.sh --broker-list hadoop01:9092,hadoop02:9092,hadoop03:9092 --topic mytest
測試數(shù)據(jù)
{"close":0.27172637588467297,"day":"2","high":0.4493211149337879,"key":"840ef1","low":0.030714155934507215,"month":11,"open":0.7762668153935262,"symbol":"77c-40d6-8412-6859d4757727","ts":1608361070161,"uuid":"840ef1ff-b77c-40d6-8412-6859d4757727","year":120}
6、提交命令及參數(shù)介紹
/opt/flink-1.11.2/bin/flink run -c org.apache.hudi.HoodieFlinkStreamer -m yarn-cluster -d -yjm 2048 -ytm 3096 -ys 4 -ynm hudi_on_flink_test \
-p 1 -yD env.java.opts=" -XX:+TraceClassPaths -XX:+TraceClassLoading" hudi-flink-bundle_2.11-0.7.0.jar --kafka-topic mytest --kafka-group-id hudi_on_flink \
--kafka-bootstrap-servers hadoop01:9092,hadoop02:9092,hadoop03:9092 --table-type COPY_ON_WRITE --target-base-path hdfs:///tmpdir/hudi/test/data/hudi_on_flink \
--target-table hudi_on_flink --props hdfs:///tmpdir/hudi/test/config/flink/hudi-conf.properties --checkpoint-interval 3000 \
--flink-checkpoint-path hdfs:///flink/hudi/hudi_on_flink_cp
HoodieFlinkStreamer參數(shù)介紹
參數(shù) | 描述 |
---|---|
--kafka-topic | 必選尸执,kafka source主題 |
--kafka-group-id | 必選家凯,kafka消費者組 |
--kafka-bootstrap-servers | 必選,kafka bootstrap.servers 如 node1:9092,node2:9092,node3:9092 |
--flink-checkpoint-path | 可選如失,flink checkpoint 路徑 |
--flink-block-retry-times | 可選绊诲,默認(rèn)10,當(dāng)最近的hoodie instant未完成時重試的次數(shù) |
--flink-block-retry-interval | 可選褪贵,默認(rèn)1掂之,當(dāng)最近的hoodie instant未完成時,兩次嘗試之間的秒數(shù) |
--target-base-path | 必選脆丁,目標(biāo)hoodie表的基本路徑(如果路徑不存在將被創(chuàng)建) |
--target-table | 必選世舰,Hive中目標(biāo)表的名稱 |
--table-type | 必選,表的類型槽卫。COPY_ON_WRITE 或 MERGE_ON_READ |
--props | 可選跟压,屬性配置文件的路徑(local或hdfs)。有hoodie客戶端歼培、schema提供者震蒋、鍵生成器和數(shù)據(jù)源的配置茸塞。 |
--hoodie-conf | 可選,可以在屬性文件中設(shè)置的任何配置(使用參數(shù)"--props")也可以使用此參數(shù)傳遞命令行查剖。 |
--source-ordering-field | 可選钾虐,以決定如何打破輸入數(shù)據(jù)中具有相同鍵的記錄之間的聯(lián)系字段。默認(rèn)值:'ts'保存記錄的unix時間戳 |
--payload-class | 可選笋庄,HoodieRecordPayload的子類禾唁,它在GenericRecord上工作。實現(xiàn)你自己的无切,如果你想做一些事情而不是覆蓋現(xiàn)有的值 |
--op | 可選,接受以下值之一:UPSERT(默認(rèn))丐枉,INSERT(當(dāng)輸入純粹是新數(shù)據(jù)/插入時使用哆键,以提高速度) |
--filter-dupes | 可選,默認(rèn)false瘦锹。是否應(yīng)該在插入/大容量插入之前刪除/過濾源中的重復(fù)記錄 |
--commit-on-errors | 可選籍嘹,默認(rèn)false。即使某些記錄寫入失敗也要提交 |
--checkpoint-interval | 可選弯院,默認(rèn)5000毫秒辱士。Flink checkpoint間隔 |
四、發(fā)現(xiàn)問題
新搭建的測試環(huán)境未做過jar包的兼容性整合听绳。job能正常啟動后颂碘,當(dāng)消費消息write_process時會發(fā)生如下異常:
java.io.IOException: Could not perform checkpoint 6 for operator write_process (1/1).
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:892)
at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:113)
at org.apache.flink.streaming.runtime.io.CheckpointBarrierAligner.processBarrier(CheckpointBarrierAligner.java:137)
at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:93)
at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:158)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:351)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:566)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:536)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 6 for operator write_process (1/1). Failure reason: Checkpoint was declined.
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:215)
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:156)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:314)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:614)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:540)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:507)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:266)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:921)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:911)
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:879)
... 13 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20210125212048
at org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:62)
at org.apache.hudi.table.action.commit.FlinkUpsertCommitActionExecutor.execute(FlinkUpsertCommitActionExecutor.java:47)
at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.upsert(HoodieFlinkCopyOnWriteTable.java:66)
at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.upsert(HoodieFlinkCopyOnWriteTable.java:58)
at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:110)
at org.apache.hudi.operator.KeyedWriteProcessFunction.snapshotState(KeyedWriteProcessFunction.java:121)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:120)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:101)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:90)
at org.apache.hudi.operator.KeyedWriteProcessOperator.snapshotState(KeyedWriteProcessOperator.java:58)
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:186)
... 23 more
Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/hudi/avro/HoodieAvroWriteSupport
at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.lambda$execute$0(BaseFlinkCommitActionExecutor.java:120)
at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684)
at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:118)
at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:68)
at org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:55)
... 33 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/hudi/avro/HoodieAvroWriteSupport
at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:73)
at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:38)
at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
... 39 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/hudi/avro/HoodieAvroWriteSupport
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:69)
... 41 more
Caused by: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/hudi/avro/HoodieAvroWriteSupport
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
... 42 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hudi/avro/HoodieAvroWriteSupport
at org.apache.hudi.io.storage.HoodieFileWriterFactory.newParquetFileWriter(HoodieFileWriterFactory.java:59)
at org.apache.hudi.io.storage.HoodieFileWriterFactory.getFileWriter(HoodieFileWriterFactory.java:47)
at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:85)
at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:66)
at org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:34)
at org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:83)
at org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:40)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
五、定位問題
5.1椅挣、查看編譯的jar包
發(fā)現(xiàn)存在該類头岔,于是猜測與集群中有jar沖突
5.2、是否和集群中有hudi相關(guān)jar沖突(該類屬于hudi-common包)
通過 find /opt -name "*hudi*.jar" 查找鼠证。并未發(fā)現(xiàn)有相關(guān)的jar(曾切換到不同版本的hadoop集群測試結(jié)果一致異常)
結(jié)論:不是jar沖突引起峡竣。
5.3、打開classloadtrace查看類加載情況
添加 -yD env.java.opts="-XX:+TraceClassLoading -XX:+TraceClassPaths"參數(shù)量九,查看HoodieAvroWriteSupport 沒有被加載适掰。于是開始定位異常代碼位置,猜測為靜態(tài)代碼中初始化該類時就已經(jīng)異常導(dǎo)致荠列,將 new AvroSchemaConverter().convert(schema)寫到外面來定位具體的缺失類
結(jié)果:異常信息變?yōu)椋?code>Caused by: java.lang.ClassNotFoundException: org.apache.parquet.schema.Typemac
5.4类浪、通過查看hudi-flink-bundle編譯包,除了parquet-avro,其他parquet相關(guān)的依賴并沒有被編譯進(jìn)來肌似。
到編譯的本地maven依賴倉庫找出以下parquet相關(guān)jar拷貝到flink/lib下嘗試解決
結(jié)果: 異常信息變?yōu)?br>
Caused by: java.lang.NoClassDefFoundError: org/apache/parquet/hadoop/ParquetInputFormat
at org.apache.parquet.HadoopReadOptions$Builder.<init>(HadoopReadOptions.java:87)
所以由parquet相關(guān)包的異常導(dǎo)致NoClassDefFoundError: org/apache/hudi/avro/HoodieAvroWriteSupport
5.5戚宦、根據(jù)HadoopReadOptions類加載信息判斷
查看classloadtrace信息已加載HadoopReadOptions來確認(rèn)parquet-hadoop包已經(jīng)被加載,所以確定ParquetInputFormat類(來自parquet-hadoop)的異常由其他缺失造成锈嫩,于是查看HadoopReadOptions類中引入的類及ParquetInputFormat繼承的父類等受楼。最終發(fā)現(xiàn)ParquetInputFormat繼承的父類org.apache.hadoop.mapreduce.lib.input.FileInputFormat
(所屬hadoop-mapreduce-client-coreye-x.x.x.jar)并未被加載垦搬。
5.6、將hadoop-mapreduce-client-core-x.x.x.jar拷貝到flink/lib下解決
重新提交任務(wù)艳汽。若發(fā)現(xiàn)有如下異常,刪除hdfs表目錄下的文件重新提交(由于之前步驟的異常造成的損壞)猴贰。
java.io.IOException: Could not perform checkpoint 3 for operator instant_generator (1/1).
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:892)
at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:113)
at org.apache.flink.streaming.runtime.io.CheckpointBarrierAligner.processBarrier(CheckpointBarrierAligner.java:137)
at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:93)
at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:158)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:351)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:566)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:536)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException: Last instant costs more than 10 second, stop task now
at org.apache.hudi.operator.InstantGenerateOperator.doCheck(InstantGenerateOperator.java:199)
at org.apache.hudi.operator.InstantGenerateOperator.prepareSnapshotPreBarrier(InstantGenerateOperator.java:119)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.prepareSnapshotPreBarrier(OperatorChain.java:266)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:249)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:921)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:911)
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:879)
... 13 more
5.7 解決問題
問題總結(jié)
上述問題基本是由于集群沒有相關(guān)jar包導(dǎo)致,根據(jù)上面的問題定位方法能解決大部分在遇到NoClassDefFoundError河狐、ClassNotFoundException等異常問題米绕。