Overview
In normal use, an HDFS client goes through two kinds of interactions:
- RPC requests to the NameNode
- IO reads and writes against DataNodes
Both paths have retry mechanisms, so an exception in either one usually does not fail the business operation; in practice it is actually quite hard for an operation to fail outright. The RPC interaction between the client and the NN rarely produces errors; most errors occur during IO with the DNs. This article summarizes the common DN IO errors.
Common client IO errors
- During a write, the client may fail to establish the pipeline for various reasons. It then abandons the failing DN, requests a replacement DN, and rebuilds the pipeline. A few typical cases:
- The first DN in the pipeline is down, causing pipeline setup to fail. Because this DN is directly connected to the client, the client can see the concrete cause:
21/02/22 15:34:23 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741830_1006
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:254)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1740)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
21/02/22 15:34:23 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741830_1006
21/02/22 15:34:23 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]
- The first DN in the pipeline is overloaded, causing pipeline setup to fail. The log looks like:
21/02/22 16:03:12 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741842_1019
java.io.EOFException: Unexpected EOF while trying to read response from server
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:461)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1776)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
21/02/22 16:03:12 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741842_1019
21/02/22 16:03:12 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]
- Some other DN in the pipeline has a problem (down or overloaded), causing pipeline setup to fail. Because those DNs are not directly connected to the client, the client usually cannot get the concrete cause, only the IP of the failing DN:
21/02/22 15:51:21 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741835_1012
java.io.IOException: Got error, status=ERROR, status message , ack with firstBadLink as 192.168.202.12:9003
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:121)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1792)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
21/02/22 15:51:21 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741835_1012
21/02/22 15:51:21 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.12:9003,DS-b76f5779-927e-4f8c-b4fe-9db592ecadfa,DISK]
- For some reason (e.g. DN IO concurrency so high that lock contention becomes severe), pipeline setup times out (default 75s with 3 replicas). The log looks like:
21/06/17 15:51:28 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073742830_2006
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.13:56994 remote=/192.168.202.13:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1776)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
21/06/17 15:51:28 WARN hdfs.DataStreamer: Abandoning BP-358940719-192.168.202.11-1623894544733:blk_1073742830_2006
21/06/17 15:51:28 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.13:50010,DS-5bfd7a2e-9963-40b0-9f5d-50ffecde15c1,DISK]
- During a write, a DN in the pipeline suddenly dies; the client then performs one round of error recovery. The log looks like:
21/02/22 15:47:39 WARN hdfs.DataStreamer: Exception for BP-239523849-192.168.202.11-1613727437316:blk_1073741834_1010
java.io.EOFException: Unexpected EOF while trying to read response from server
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:461)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1092)
21/02/22 15:47:39 WARN hdfs.DataStreamer: Error Recovery for BP-239523849-192.168.202.11-1613727437316:blk_1073741834_1010 in pipeline [DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK], DatanodeInfoWithStorage[192.168.202.14:9003,DS-6424283e-fad1-4b9a-aaed-dc6683e55a4d,DISK], DatanodeInfoWithStorage[192.168.202.13:9003,DS-c211a421-b13d-4d46-9c28-e52426509f8a,DISK]]: datanode 0(DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]) is bad.
- During a write, if the ack (the downstream response) for a packet takes more than 30s to arrive, the client prints a slow-ack warning:
[2021-06-17 15:22:58,929] WARN Slow ReadProcessor read fields took 37555ms (threshold=30000ms); ack: seqno: 343 reply: SUCCESS reply: SUCCESS reply: SUCCESS downstreamAckTimeNanos: 16503757088 flag: 0 flag: 0 flag: 0, targets: [DatanodeInfoWithStorage[9.10.146.124:9003,DS-cdab7fb8-c6ec-4f6b-8b6a-2a0c92aed6b6,DISK], DatanodeInfoWithStorage[9.10.146.98:9003,DS-346a7f42-4b12-4bac-8e58-8b33d972eb79,DISK], DatanodeInfoWithStorage[9.180.22.26:9003,DS-ad6cbeb4-9ce8-495b-b978-5c7aac66686f,DISK]]
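The 30s figure is the default of the client-side setting dfs.client.slow.io.warning.threshold.ms. A simplified sketch of the threshold check (the function name and structure are illustrative, not the actual DataStreamer code):

```python
from typing import Optional

# Mirrors the default of dfs.client.slow.io.warning.threshold.ms.
SLOW_IO_WARNING_THRESHOLD_MS = 30_000

def check_ack_latency(elapsed_ms: int) -> Optional[str]:
    """Return a warning string if reading the ack took longer than the threshold."""
    if elapsed_ms > SLOW_IO_WARNING_THRESHOLD_MS:
        return (f"Slow ReadProcessor read fields took {elapsed_ms}ms "
                f"(threshold={SLOW_IO_WARNING_THRESHOLD_MS}ms)")
    return None

# The 37555ms ack in the log above would trigger the warning:
print(check_ack_latency(37555))
```

An ack that arrives within the threshold produces no warning at all, which is why healthy clusters show none of these lines.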
- During a write, if the client still has not received the ack 75s after sending a packet (the threshold is 70s for 2-replica writes), it times out with an error and starts error recovery:
21/02/22 16:09:35 WARN hdfs.DataStreamer: Exception for BP-239523849-192.168.202.11-1613727437316:blk_1073741844_1021
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:44868 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1092)
21/02/22 16:09:35 WARN hdfs.DataStreamer: Error Recovery for BP-239523849-192.168.202.11-1613727437316:blk_1073741844_1021 in pipeline [DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK], DatanodeInfoWithStorage[192.168.202.13:9003,DS-c211a421-b13d-4d46-9c28-e52426509f8a,DISK], DatanodeInfoWithStorage[192.168.202.14:9003,DS-6424283e-fad1-4b9a-aaed-dc6683e55a4d,DISK]]: datanode 0(DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]) is bad.
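The 75s and 70s figures above come from the client extending a base socket timeout by a fixed amount per pipeline node. A minimal sketch of that arithmetic, assuming the default 60s base (dfs.client.socket-timeout) and a 5s per-node extension:

```python
# Sketch of how the client derives its pipeline read timeout.
# Assumes the default 60s base socket timeout (dfs.client.socket-timeout)
# plus a 5s extension per DataNode in the pipeline.
BASE_TIMEOUT_MS = 60_000
PER_NODE_EXTENSION_MS = 5_000

def pipeline_read_timeout_ms(num_nodes: int) -> int:
    return BASE_TIMEOUT_MS + PER_NODE_EXTENSION_MS * num_nodes

print(pipeline_read_timeout_ms(3))  # 75000: the 75s seen with 3 replicas
print(pipeline_read_timeout_ms(2))  # 70000: the 70s threshold for 2-replica writes
```

This is why the same SocketTimeoutException reports 75000 millis in the 3-replica logs throughout this article.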
- When closing a file (or explicitly calling hflush/hsync), the client flushes all not-yet-persisted data to the cluster; if the flush takes more than 30s, it prints a slow warning:
20/12/15 11:22:25 WARN DataStreamer: Slow waitForAckedSeqno took 45747ms (threshold=30000ms). File being written: /stage/interface/TEG/g_teg_common_teg_plan_bigdata/plan/exportBandwidth/origin/company/2020/1215/1059.parquet/_temporary/0/_temporary/attempt_20201215112121_0008_m_000021_514/part-00021-94e67782-be1b-48ae-b736-204624fa498c-c000.snappy.parquet, block: BP-1776336001-100.76.59.150-1482408994930:blk_16194984410_15220615717, Write pipeline datanodes: [DatanodeInfoWithStorage[100.76.29.36:9003,DS-4a301194-a232-46c6-b606-44b15a83ebed,DISK], DatanodeInfoWithStorage[100.76.60.168:9003,DS-24645191-aa52-4643-9c97-213b2a0bb41d,DISK], DatanodeInfoWithStorage[100.76.60.160:9003,DS-27ca6eb7-75b9-47a2-ae9d-de6d720f4d9a,DISK]].
- During a write, if DN incremental block reports are too slow, the client cannot get a new block allocated in time; it logs the exception and retries:
21/02/22 16:16:53 INFO hdfs.DFSOutputStream: Exception while adding a block
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException): Not replicated yet: /a.COPYING
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2572)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:885)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:540)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:806)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2286)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2541)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1520)
at org.apache.hadoop.ipc.Client.call(Client.java:1466)
at org.apache.hadoop.ipc.Client.call(Client.java:1376)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:472)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1074)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1880)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1683)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
21/02/22 16:16:53 WARN hdfs.DFSOutputStream: NotReplicatedYetException sleeping /a.COPYING retries left 4
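The "retries left 4" line reflects a retry-with-backoff loop in the client; the retry budget of 5 matches the default of dfs.client.block.write.locateFollowingBlock.retries. A minimal sketch of the pattern (the starting sleep and doubling backoff are assumptions here, not verified constants):

```python
import time

class NotReplicatedYetException(Exception):
    """Stand-in for the NN-side exception seen in the log above."""

def add_block_with_retry(add_block, retries: int = 5, sleep_ms: int = 400):
    """Retry addBlock while the NN reports NotReplicatedYetException,
    sleeping between attempts with a doubling backoff (values illustrative)."""
    while True:
        try:
            return add_block()
        except NotReplicatedYetException:
            if retries == 0:
                raise
            retries -= 1
            time.sleep(sleep_ms / 1000.0)
            sleep_ms *= 2

# Example: the first two attempts hit NotReplicatedYetException, the third succeeds.
attempts = []
def flaky_add_block():
    attempts.append(1)
    if len(attempts) < 3:
        raise NotReplicatedYetException("Not replicated yet: /a.COPYING")
    return "blk_1073741845"

print(add_block_with_retry(flaky_add_block, sleep_ms=1))  # blk_1073741845
```

Once the DN's incremental report arrives at the NN, the next addBlock attempt succeeds and the write continues transparently.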
- During a write, if DN incremental block reports are too slow, the client cannot close the file in time; it logs a message and retries:
2021-02-22 16:19:23,259 INFO hdfs.DFSClient: Could not complete /a.txt retrying...
- During a read, if the target DN is already down, the client logs a connection exception and then tries another DN:
21/02/22 16:29:33 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3039)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:814)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:739)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.read(DataInputStream.java:100)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:29:33 WARN hdfs.DFSClient: Failed to connect to /192.168.202.11:9003 for file /a.txt for block BP-239523849-192.168.202.11-1613727437316:blk_1073741852_1030, add to deadNodes and continue.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3039)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:814)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:739)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.read(DataInputStream.java:100)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:29:34 INFO hdfs.DFSClient: Successfully connected to /192.168.202.14:9003 for BP-239523849-192.168.202.11-1613727437316:blk_1073741852_1030
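The "add to deadNodes and continue" behavior above can be sketched as follows; a simplified model of the replica-failover loop, not the actual DFSInputStream code:

```python
def read_block(locations, try_connect):
    """Try each replica location in turn; remember failing DNs in dead_nodes
    so they are skipped, mirroring the client's 'add to deadNodes and continue'."""
    dead_nodes = set()
    for dn in locations:
        try:
            return try_connect(dn), dead_nodes
        except ConnectionError:
            dead_nodes.add(dn)  # failed replica: exclude it and move on
    raise IOError("Could not obtain block from any replica")

# Example mirroring the log: the first DN refuses the connection,
# the second one succeeds.
def try_connect(dn):
    if dn == "192.168.202.11:9003":
        raise ConnectionError("Connection refused")
    return f"reader@{dn}"

reader, dead = read_block(["192.168.202.11:9003", "192.168.202.14:9003"], try_connect)
print(reader)  # reader@192.168.202.14:9003
print(dead)    # {'192.168.202.11:9003'}
```

As long as at least one replica is reachable, the read succeeds; the stack traces above are warnings, not failures.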
- During a read, if establishing the TCP connection to the target DN times out, the client logs the error and then tries another DN:
2021-02-25 23:57:11,000 WARN org.apache.hadoop.hdfs.DFSClient: Connection failure: Failed to connect to /9.10.34.27:9003 for file /data/SPARK/part-r-00320.tfr.gz for block BP-1815681714-100.76.60.19-1523177824331:blk_10324215185_9339836899:org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/9.10.34.27:9003]
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/9.10.34.27:9003]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3450)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:777)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:694)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1173)
at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1094)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1449)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1412)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:89)
- During a read, if the DN does not respond while the read channel is being set up (default threshold 60s), the client logs the error and then tries another DN:
21/02/22 16:52:32 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45318 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:407)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:845)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:742)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:52:32 WARN hdfs.DFSClient: Failed to connect to /192.168.202.11:9003 for file /a.txt for block BP-239523849-192.168.202.11-1613727437316:blk_1073741891_1069, add to deadNodes and continue.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45318 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:407)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:845)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:742)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:52:32 INFO hdfs.DFSClient: Successfully connected to /192.168.202.14:9003 for BP-239523849-192.168.202.11-1613727437316:blk_1073741891_1069
- During a read, if data transfer has started but is so slow that it times out (default threshold 60s), the client logs the error and then tries another DN:
21/02/22 16:44:30 WARN hdfs.DFSClient: Exception while reading from BP-239523849-192.168.202.11-1613727437316:blk_1073741889_1067 of /a.txt from DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45254 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:256)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:207)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:183)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:142)
at org.apache.hadoop.hdfs.ByteArrayStrategy.readFromBlock(ReaderStrategy.java:118)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:703)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:764)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
- During a read, if the target block cannot be found on any DN (i.e. a missing block), the error looks like:
2021-02-22 16:57:59,009 WARN hdfs.DFSClient: No live nodes contain block BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 after checking nodes = [], ignoredNodes = null
2021-02-22 16:57:59,009 WARN hdfs.DFSClient: Could not obtain block: BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 file=/a No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
2021-02-22 16:57:59,010 WARN hdfs.DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 file=/a
at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1053)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1036)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1015)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:647)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:926)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:982)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at a.a.TestWrite.main(TestWrite.java:23)
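A BlockMissingException is what the failover loop degenerates to when every replica fails: the client refetches block locations from the NN a few times and finally gives up. A simplified sketch under the assumption of a small fixed refetch budget (the real retry semantics around dfs.client.max.block.acquire.failures are more involved):

```python
class BlockMissingException(IOError):
    """Stand-in for org.apache.hadoop.hdfs.BlockMissingException."""

def choose_datanode(get_locations, is_alive, max_refetches: int = 3):
    """Pick a live replica; refetch locations a few times, then give up
    with a BlockMissingException (simplified model of chooseDataNode)."""
    for _ in range(max_refetches + 1):
        live = [dn for dn in get_locations() if is_alive(dn)]
        if live:
            return live[0]
    raise BlockMissingException(
        "Could not obtain block: no live nodes contain current block")

# Example: every replica is dead, so the read fails for good.
try:
    choose_datanode(lambda: ["dn1", "dn2"], lambda dn: False)
except BlockMissingException as e:
    print("raised:", e)
```

Unlike all the errors above, this one does surface to the application: it is one of the few cases where the client's retries cannot save the read.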