1粮宛、HiveSource-xxxx.xxxx's parallelism (200) is higher than the max parallelism (128). Please lower the parallelism or increase the max parallelism.
(1)報錯
這是sql-cli 連接hive,查一張表報的錯
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.runtime.JobException: Vertex Source: HiveSource-xxxx.xxxx's parallelism (200) is higher than the max parallelism (128). Please lower the parallelism or increase the max parallelism.
(2)解決
只需要改動flink包下的/conf包里sql-client-defaults.yaml這個文件里的max-parallelism改為300即可
execution:
max-parallelism: 300
2懈万、flink sql讀取hive表時建議手動配置table.exec.hive.fallback-mapred-reader: true生效
(1)報錯
不管用sql-cli,還是把sql放在代碼里蹄葱,執(zhí)行以下sql都是下面的結(jié)果,同時報錯都是報Caused by: java.lang.IllegalArgumentException杜顺。
而我用Spark Sql跑下面的Sql都是正常的裁僧。
(1)First:
SELECT vid From table_A WHERE datekey = '20210112' AND event = 'XXX' AND vid = 'aaaaaa'; (**OK**)
SELECT vid From table_A WHERE datekey = '20210112' AND vid = 'aaaaaa'; (**Error**)
(2)Second:
SELECT vid From table_B WHERE datekey = '20210112' AND event = 'YYY' AND vid = 'bbbbbb'; (**OK**)
SELECT vid From table_B WHERE datekey = '20210112' AND vid = 'bbbbbb'; (**Error**)
報錯原文:
[ERROR] Could not execute SQL statement. Reason:
java.lang.RuntimeException: SplitFetcher thread 22 received unexpected exception while polling the records
java.lang.RuntimeException: One or more fetchers have encountered exception
at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:199)
at org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:154)
at org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:116)
at org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:273)
at org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:67)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:395)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:609)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:573)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: SplitFetcher thread 22 received unexpected exception while polling the records
at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:146)
at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:101)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.position(Buffer.java:244)
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:153)
at java.nio.ByteBuffer.get(ByteBuffer.java:715)
at org.apache.flink.hive.shaded.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes(Binary.java:422)
at org.apache.flink.hive.shaded.formats.parquet.vector.reader.BytesColumnReader.readBatchFromDictionaryIds(BytesColumnReader.java:79)
at org.apache.flink.hive.shaded.formats.parquet.vector.reader.BytesColumnReader.readBatchFromDictionaryIds(BytesColumnReader.java:33)
at org.apache.flink.hive.shaded.formats.parquet.vector.reader.AbstractColumnReader.readToVector(AbstractColumnReader.java:199)
at org.apache.flink.hive.shaded.formats.parquet.ParquetVectorizedInputFormat$ParquetReader.nextBatch(ParquetVectorizedInputFormat.java:359)
at org.apache.flink.hive.shaded.formats.parquet.ParquetVectorizedInputFormat$ParquetReader.readBatch(ParquetVectorizedInputFormat.java:328)
at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:67)
at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:56)
at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:138)
... 6 more
(2)解決
昨天提交了一個issue:https://issues.apache.org/jira/browse/FLINK-20951个束,云邪大佬幫忙叫Rui Li大佬幫忙看了一下,需要配置table.exec.hive.fallback-mapred-reader: true聊疲。
我昨天翻遍了官網(wǎng)也看到了這個配置茬底,官方文檔說是默認(rèn)開啟的,所以還是建議手動將這個配置配置上获洲。官網(wǎng)的解釋是啟動這個配置是啟用hive表的向量化讀取阱表,當(dāng)Format是ORC 或者 Parquet類型,同時沒有hive的復(fù)雜類型贡珊。
官網(wǎng)鏈接:https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/connectors/hive/hive_read_write.html#vectorized-optimization-upon-read
a最爬、用sql-cli可以配置在flink包下的/conf包里sql-client-defaults.yaml這個文件里
configuration:
table.exec.hive.fallback-mapred-reader: true
b、如果是在代碼里提交flink sql门岔,像下面這樣配置Configuration就好:
Configuration configuration = tableEnv.getConfig().getConfiguration();
configuration.setString("table.exec.hive.fallback-mapred-reader", "true");
3爱致、如果你的hive表的分區(qū)非常多,flink的默認(rèn)配置會幫你開啟很多的Taskmanager
(1)報錯
可以看到一下子給你分配1000寒随,當(dāng)時看到時候被嚇到了
(2)解決
翻了一下官網(wǎng)糠悯,原來是Flink將根據(jù)文件數(shù)和每個文件中的塊數(shù)為其Hive讀取器推斷最佳并行度,不過目前看起來并不是很良好妻往。
可以關(guān)閉這個配置然后根據(jù)自己任務(wù)進(jìn)行配置(這個參數(shù)會影響所有的hive作業(yè)互艾,我建議自己啟任務(wù)前啟動一個合適的并行度設(shè)置在代碼里,或者sql-cli可以在sql-client-defaults.yaml配置)
configuration.setString("table.exec.hive.infer-source-parallelism.max", "100");
configuration.setString("table.exec.hive.fallback-mapred-reader", "true");