Common HDFS Client Errors

Overview

Using the HDFS client involves two kinds of interaction:

  1. RPC requests to the NameNode
  2. I/O reads and writes to the DataNodes

Whichever phase it occurs in, an exception generally does not cause the job to fail: both phases have retry mechanisms, and in practice it is actually quite hard for a job to fail outright. In real-world usage, the RPC interaction between the client and the NameNode rarely produces errors; most errors appear in the I/O interaction with the DataNodes. This article summarizes the common DataNode I/O errors.

Common Client I/O Errors

  1. During a write, the client may fail to establish the pipeline for various reasons; it then abandons the failing DataNode, requests new DataNodes from the NameNode, and rebuilds the pipeline. Typical cases:

    1. The first DataNode in the pipeline is down, so pipeline setup fails. Because this DataNode connects directly to the client, the client sees the exact cause of the failure:

    21/02/22 15:34:23 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741830_1006
    java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
    at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:254)
    at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1740)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
    21/02/22 15:34:23 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741830_1006
    21/02/22 15:34:23 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]

    2. The first DataNode in the pipeline is overloaded, so pipeline setup fails. The log looks like:

    21/02/22 16:03:12 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741842_1019
    java.io.EOFException: Unexpected EOF while trying to read response from server
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:461)
    at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1776)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
    21/02/22 16:03:12 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741842_1019
    21/02/22 16:03:12 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]

    3. Some other DataNode in the pipeline has a problem (down or overloaded), so pipeline setup fails. Because those DataNodes do not connect directly to the client, the client usually cannot see the exact cause, only the IP of the failing DataNode:

    21/02/22 15:51:21 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741835_1012
    java.io.IOException: Got error, status=ERROR, status message , ack with firstBadLink as 192.168.202.12:9003
    at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:121)
    at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1792)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
    21/02/22 15:51:21 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741835_1012
    21/02/22 15:51:21 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.12:9003,DS-b76f5779-927e-4f8c-b4fe-9db592ecadfa,DISK]

    4. For some reason (e.g. DataNode I/O concurrency so high that lock contention becomes severe), pipeline setup times out (default 75s for 3 replicas). The log looks like:

    21/06/17 15:51:28 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073742830_2006
    java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.13:56994 remote=/192.168.202.13:9003]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
    at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1776)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
    21/06/17 15:51:28 WARN hdfs.DataStreamer: Abandoning BP-358940719-192.168.202.11-1623894544733:blk_1073742830_2006
    21/06/17 15:51:28 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.13:50010,DS-5bfd7a2e-9963-40b0-9f5d-50ffecde15c1,DISK]
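All four failure modes above end the same way: the client abandons the block, adds the failing DataNode to an exclude list, asks the NameNode for a fresh pipeline, and tries again. A minimal sketch of that loop (illustrative only, with hypothetical names — not the real DataStreamer code):

```python
def allocate_pipeline(all_nodes, excluded, replicas=3):
    """Stand-in for the NameNode's addBlock: pick replicas, skipping excluded nodes."""
    candidates = [n for n in all_nodes if n not in excluded]
    if len(candidates) < replicas:
        raise IOError("cannot allocate a full pipeline")
    return candidates[:replicas]

def establish_pipeline(all_nodes, bad_nodes, max_attempts=3):
    """Abandon the block and exclude the failing node on each failed attempt."""
    excluded = set()
    for _ in range(max_attempts):
        pipeline = allocate_pipeline(all_nodes, excluded)
        first_bad = next((n for n in pipeline if n in bad_nodes), None)
        if first_bad is None:
            return pipeline                  # pipeline established
        excluded.add(first_bad)              # "Excluding datanode ..."
    raise IOError("failed to establish pipeline")

nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
print(establish_pipeline(nodes, bad_nodes={"dn1"}))  # ['dn2', 'dn3', 'dn4']
```

This is why a single bad DataNode rarely fails the write: as long as the cluster can still offer enough healthy nodes, the client quietly recovers.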

  2. During a write, a DataNode in the pipeline dies unexpectedly; the client then performs an error recovery. Log:

21/02/22 15:47:39 WARN hdfs.DataStreamer: Exception for BP-239523849-192.168.202.11-1613727437316:blk_1073741834_1010
java.io.EOFException: Unexpected EOF while trying to read response from server
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:461)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1092)
21/02/22 15:47:39 WARN hdfs.DataStreamer: Error Recovery for BP-239523849-192.168.202.11-1613727437316:blk_1073741834_1010 in pipeline [DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK], DatanodeInfoWithStorage[192.168.202.14:9003,DS-6424283e-fad1-4b9a-aaed-dc6683e55a4d,DISK], DatanodeInfoWithStorage[192.168.202.13:9003,DS-c211a421-b13d-4d46-9c28-e52426509f8a,DISK]]: datanode 0(DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]) is bad.

  3. During a write, if more than 30s pass between writing a packet and receiving its ack (the acknowledgement), the client prints a slow-I/O warning:

[2021-06-17 15:22:58,929] WARN Slow ReadProcessor read fields took 37555ms (threshold=30000ms); ack: seqno: 343 reply: SUCCESS reply: SUCCESS reply: SUCCESS downstreamAckTimeNanos: 16503757088 flag: 0 flag: 0 flag: 0, targets: [DatanodeInfoWithStorage[9.10.146.124:9003,DS-cdab7fb8-c6ec-4f6b-8b6a-2a0c92aed6b6,DISK], DatanodeInfoWithStorage[9.10.146.98:9003,DS-346a7f42-4b12-4bac-8e58-8b33d972eb79,DISK], DatanodeInfoWithStorage[9.180.22.26:9003,DS-ad6cbeb4-9ce8-495b-b978-5c7aac66686f,DISK]]
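The 30s here is the client-side slow-I/O warning threshold (`dfs.client.slow.io.warning.threshold.ms`, default 30000 ms). A sketch of the check, under that default:

```python
SLOW_IO_WARNING_THRESHOLD_MS = 30_000  # dfs.client.slow.io.warning.threshold.ms

def slow_ack_warning(elapsed_ms):
    """Return the warning line to log, or None if the ack arrived fast enough."""
    if elapsed_ms > SLOW_IO_WARNING_THRESHOLD_MS:
        return ("Slow ReadProcessor read fields took %dms (threshold=%dms)"
                % (elapsed_ms, SLOW_IO_WARNING_THRESHOLD_MS))
    return None

print(slow_ack_warning(37555))  # matches the warning in the log above
print(slow_ack_warning(1000))   # None: no warning
```

Note this is only a warning; the write itself continues unaffected.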

  4. During a write, if the client has still not received the ack 75s after writing a packet (the threshold is 70s for 2-replica writes), the write times out and error recovery begins. Log:

21/02/22 16:09:35 WARN hdfs.DataStreamer: Exception for BP-239523849-192.168.202.11-1613727437316:blk_1073741844_1021
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:44868 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1092)
21/02/22 16:09:35 WARN hdfs.DataStreamer: Error Recovery for BP-239523849-192.168.202.11-1613727437316:blk_1073741844_1021 in pipeline [DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK], DatanodeInfoWithStorage[192.168.202.13:9003,DS-c211a421-b13d-4d46-9c28-e52426509f8a,DISK], DatanodeInfoWithStorage[192.168.202.14:9003,DS-6424283e-fad1-4b9a-aaed-dc6683e55a4d,DISK]]: datanode 0(DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]) is bad.
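The 75000 ms in this log is not arbitrary: assuming the default configuration, the ack read timeout is `dfs.client.socket-timeout` (60s) plus a fixed 5s extension per DataNode in the pipeline, which yields 75s for 3 replicas and the 70s mentioned above for 2 replicas:

```python
CLIENT_SOCKET_TIMEOUT_MS = 60_000   # dfs.client.socket-timeout (default)
READ_TIMEOUT_EXTENSION_MS = 5_000   # fixed per-DataNode extension

def ack_read_timeout_ms(pipeline_size):
    """Socket read timeout the client uses while waiting for packet acks."""
    return CLIENT_SOCKET_TIMEOUT_MS + READ_TIMEOUT_EXTENSION_MS * pipeline_size

print(ack_read_timeout_ms(3))  # 75000 -> "75000 millis timeout" in the log
print(ack_read_timeout_ms(2))  # 70000 -> the 2-replica threshold
```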

  5. When a file is closed (or hflush/hsync is called manually), the client flushes all data not yet written to the cluster; if the flush takes more than 30s, a slow-I/O warning is printed:

20/12/15 11:22:25 WARN DataStreamer: Slow waitForAckedSeqno took 45747ms (threshold=30000ms). File being written: /stage/interface/TEG/g_teg_common_teg_plan_bigdata/plan/exportBandwidth/origin/company/2020/1215/1059.parquet/_temporary/0/_temporary/attempt_20201215112121_0008_m_000021_514/part-00021-94e67782-be1b-48ae-b736-204624fa498c-c000.snappy.parquet, block: BP-1776336001-100.76.59.150-1482408994930:blk_16194984410_15220615717, Write pipeline datanodes: [DatanodeInfoWithStorage[100.76.29.36:9003,DS-4a301194-a232-46c6-b606-44b15a83ebed,DISK], DatanodeInfoWithStorage[100.76.60.168:9003,DS-24645191-aa52-4643-9c97-213b2a0bb41d,DISK], DatanodeInfoWithStorage[100.76.60.160:9003,DS-27ca6eb7-75b9-47a2-ae9d-de6d720f4d9a,DISK]].

  6. During a write, if DataNode incremental block reports are too slow, the client cannot get a new block allocated in time; it logs the exception and retries:

21/02/22 16:16:53 INFO hdfs.DFSOutputStream: Exception while adding a block
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException): Not replicated yet: /a.COPYING
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2572)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:885)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:540)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:806)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2286)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2541)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1520)
at org.apache.hadoop.ipc.Client.call(Client.java:1466)
at org.apache.hadoop.ipc.Client.call(Client.java:1376)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:472)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1074)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1880)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1683)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
21/02/22 16:16:53 WARN hdfs.DFSOutputStream: NotReplicatedYetException sleeping /a.COPYING retries left 4
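The "retries left 4" comes from a sleep-and-retry loop around addBlock. Assuming the defaults (`dfs.client.block.write.locateFollowingBlock.retries` = 5, 400 ms initial delay, doubling each time), the behaviour is roughly the following sketch (hypothetical names, not the real code):

```python
class NotReplicatedYet(Exception):
    """Stand-in for the NameNode's NotReplicatedYetException."""

def locate_following_block(add_block, retries=5, initial_delay_ms=400):
    """Retry addBlock while the NameNode still lacks the previous block's report."""
    delay, total_sleep_ms = initial_delay_ms, 0
    while True:
        try:
            return add_block(), total_sleep_ms
        except NotReplicatedYet:
            if retries == 0:
                raise
            retries -= 1                # "... retries left 4"
            total_sleep_ms += delay     # stands in for time.sleep(delay / 1000)
            delay *= 2                  # back off exponentially

attempts = {"n": 0}
def flaky_add_block():
    attempts["n"] += 1
    if attempts["n"] < 3:               # the NN catches up after two failures
        raise NotReplicatedYet()
    return "blk_1073741830_1006"

print(locate_following_block(flaky_add_block))  # ('blk_1073741830_1006', 1200)
```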

  7. During a write, if DataNode incremental block reports are too slow, the client cannot close the file in time; it logs a message and retries:

2021-02-22 16:19:23,259 INFO hdfs.DFSClient: Could not complete /a.txt retrying...

  8. During a read, if the target DataNode is already down, the client logs a connection exception and then tries another DataNode:

21/02/22 16:29:33 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3039)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:814)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:739)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.read(DataInputStream.java:100)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:29:33 WARN hdfs.DFSClient: Failed to connect to /192.168.202.11:9003 for file /a.txt for block BP-239523849-192.168.202.11-1613727437316:blk_1073741852_1030, add to deadNodes and continue.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3039)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:814)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:739)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.read(DataInputStream.java:100)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:29:34 INFO hdfs.DFSClient: Successfully connected to /192.168.202.14:9003 for BP-239523849-192.168.202.11-1613727437316:blk_1073741852_1030
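The "add to deadNodes and continue" line reflects the read-path failover: each input stream keeps a per-stream set of dead nodes and simply moves on to the next replica location. A simplified sketch (hypothetical names), using the addresses from the log above:

```python
def read_block(locations, dead_nodes, connect):
    """Try each replica location, blacklisting nodes that fail to connect."""
    for dn in locations:
        if dn in dead_nodes:
            continue
        try:
            return connect(dn)
        except OSError:
            dead_nodes.add(dn)          # "add to deadNodes and continue"
    raise IOError("Could not obtain block")  # surfaces as BlockMissingException

def connect(dn):                        # first replica refuses connections
    if dn == "192.168.202.11:9003":
        raise ConnectionRefusedError("Connection refused")
    return "reader@" + dn

dead = set()
print(read_block(["192.168.202.11:9003", "192.168.202.14:9003"], dead, connect))
print(dead)  # {'192.168.202.11:9003'}
```

Only when every location fails (the final `raise` here) does the read actually error out — that is the missing-block case shown at the end of this article.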

  9. During a read, if establishing the TCP connection to the target DataNode times out, the client logs the error and then tries another DataNode:

2021-02-25 23:57:11,000 WARN org.apache.hadoop.hdfs.DFSClient: Connection failure: Failed to connect to /9.10.34.27:9003 for file /data/SPARK/part-r-00320.tfr.gz for block BP-1815681714-100.76.60.19-1523177824331:blk_10324215185_9339836899:org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/9.10.34.27:9003]
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/9.10.34.27:9003]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3450)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:777)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:694)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1173)
at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1094)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1449)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1412)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:89)

  10. During a read, if the DataNode does not respond while the read channel is being set up (default threshold 60s), the client logs the error and then tries another DataNode:

21/02/22 16:52:32 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45318 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:407)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:845)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:742)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:52:32 WARN hdfs.DFSClient: Failed to connect to /192.168.202.11:9003 for file /a.txt for block BP-239523849-192.168.202.11-1613727437316:blk_1073741891_1069, add to deadNodes and continue.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45318 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:407)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:845)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:742)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:52:32 INFO hdfs.DFSClient: Successfully connected to /192.168.202.14:9003 for BP-239523849-192.168.202.11-1613727437316:blk_1073741891_1069

  11. During a read, if data transfer has started but is so slow that it times out (default threshold 60s), the client logs the error and then tries another DataNode:

21/02/22 16:44:30 WARN hdfs.DFSClient: Exception while reading from BP-239523849-192.168.202.11-1613727437316:blk_1073741889_1067 of /a.txt from DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45254 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:256)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:207)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:183)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:142)
at org.apache.hadoop.hdfs.ByteArrayStrategy.readFromBlock(ReaderStrategy.java:118)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:703)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:764)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)

  12. During a read, if the target block cannot be found on any DataNode (i.e. a missing block), the error looks like:

2021-02-22 16:57:59,009 WARN hdfs.DFSClient: No live nodes contain block BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 after checking nodes = [], ignoredNodes = null
2021-02-22 16:57:59,009 WARN hdfs.DFSClient: Could not obtain block: BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 file=/a No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
2021-02-22 16:57:59,010 WARN hdfs.DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 file=/a
at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1053)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1036)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1015)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:647)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:926)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:982)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at a.a.TestWrite.main(TestWrite.java:23)
