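Resolving the "Too many connections" Error When HiveServer2 Connects to ZooKeeper
Author: 大圓那些事 | This article may be reposted; please credit the original source and the author with a hyperlink.
HiveServer2 supports concurrent access from multiple clients and uses ZooKeeper to manage read/write locks on Hive tables. In a real deployment we ran into a "Too many connections" error when HiveServer2 connected to ZooKeeper. This post records how the problem was diagnosed and resolved.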
Problem Description
The HiveServer2 service could not execute Hive commands, and its log showed the following error:
2013-03-22 12:54:43,946 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1089)) - Session 0x0 for server hostname/***.***.***.***:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:200)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
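Troubleshooting
1. First, the HiveServer2 error log reports "Connection reset by peer", meaning the connection was being closed from the ZooKeeper side.
2. Next, we checked the logs of the ZooKeeper ensemble configured for HiveServer2 (used to manage read/write locks on Hive tables) and found the following error: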
2013-03-22 12:52:48,938 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /***.***.***.*** - max is 50
3. Taken together with the HiveServer2 log, this shows that the number of connections from the HiveServer2 host to ZooKeeper had exceeded the maximum number of connections ZooKeeper allows from a single client (50 here).
4. We then checked whether all 50 connections really belonged to HiveServer2. They did: all 50 were held by the HiveServer2 process (PID 26871):
[user@hostname ~]$ sudo netstat -nap | grep 2181
tcp        0      0 ***.***.***.***:58089    ***.***.***.***:2181    ESTABLISHED 26871/java
tcp        0      0 ***.***.***.***:57837    ***.***.***.***:2181    ESTABLISHED 26871/java
tcp        0      0 ***.***.***.***:57853    ***.***.***.***:2181    ESTABLISHED 26871/java
……
(50 in total)
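5. Why would HiveServer2 hold so many connections when the actual number of concurrent requests was far lower? The only place to look for clues was HiveServer2's implementation. Since HiveServer2 is built on Thrift, we suspected that it maintains some kind of internal pool. Looking at hive-default.xml, we found that it configures default worker thread counts (our guess is that each worker thread maintains its own connection to ZooKeeper; this still needs to be verified at the code level):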
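<property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>5</value>
  <description>Minimum number of Thrift worker threads</description>
</property>
<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>100</value>
  <description>Maximum number of Thrift worker threads</description>
</property>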
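The per-process connection count from step 4 can also be produced by a short script instead of counting netstat lines by hand. The following is a minimal sketch, assuming the Linux `netstat -nap` column layout shown above. The function name `count_zk_connections` and the sample addresses are illustrative assumptions, not part of the original investigation.

```python
import re
from collections import Counter

# Matches one TCP row of Linux `netstat -nap` output, for example:
#   tcp  0  0 10.0.0.1:58089  10.0.0.2:2181  ESTABLISHED 26871/java
NETSTAT_ROW = re.compile(
    r"^tcp\S*\s+\d+\s+\d+\s+(?P<local>\S+)\s+(?P<remote>\S+)\s+"
    r"(?P<state>\S+)\s+(?P<owner>\S+)"
)


def count_zk_connections(netstat_output, zk_port=2181):
    """Count ESTABLISHED connections to zk_port, grouped by 'PID/program'."""
    counts = Counter()
    for line in netstat_output.splitlines():
        match = NETSTAT_ROW.match(line.strip())
        if match is None or match.group("state") != "ESTABLISHED":
            continue
        # The port is whatever follows the last ':' of the remote address.
        port = match.group("remote").rpartition(":")[2]
        if port == str(zk_port):
            counts[match.group("owner")] += 1
    return counts


# Hypothetical sample rows; the addresses are placeholders, not real data.
SAMPLE = """\
tcp        0      0 10.0.0.1:58089    10.0.0.2:2181    ESTABLISHED 26871/java
tcp        0      0 10.0.0.1:57837    10.0.0.2:2181    ESTABLISHED 26871/java
tcp        0      0 10.0.0.1:40022    10.0.0.3:22      ESTABLISHED 1200/sshd
tcp        0      0 10.0.0.1:57001    10.0.0.2:2181    TIME_WAIT   -
"""

for owner, total in count_zk_connections(SAMPLE).most_common():
    print(owner, total)
```

In practice the text passed to the function would be the captured output of `sudo netstat -nap`. A count for one process that sits at the ZooKeeper limit is the pattern observed in this incident.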
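Solution
Option 1:
Change the number of HiveServer2 Thrift worker threads in hive-site.xml to reduce the number of connection requests to ZooKeeper. This may lower HiveServer2's capacity for handling concurrent requests.
Option 2:
Raise the per-client connection limit by changing the maxClientCnxns option in ZooKeeper's zoo.cfg file.
Both options should be tuned to fit your own production environment.
Relevant configuration options:
1) In hive-site.xml: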
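<property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>10</value>
  <description>Minimum number of Thrift worker threads</description>
</property>
<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>200</value>
  <description>Maximum number of Thrift worker threads</description>
</property>
<property>
  <name>hive.zookeeper.session.timeout</name>
  <value>60000</value>
  <description>Zookeeper client's session timeout. The client is disconnected, and as a result, all locks released, if a heartbeat is not sent in the timeout.</description>
</property>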
2) In zoo.cfg:
# Limits the number of concurrent connections (at the socket level) that a single client, identified by IP address
maxClientCnxns=200
# The minimum session timeout in milliseconds that the server will allow the client to negotiate
minSessionTimeout=1000
# The maximum session timeout in milliseconds that the server will allow the client to negotiate
maxSessionTimeout=60000
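To keep the two settings consistent, it may help to compare them automatically. The sketch below parses the text of a hive-site.xml and a zoo.cfg and reports how much headroom maxClientCnxns leaves over the maximum Thrift worker thread count. It rests on this article's unverified guess that each worker thread holds at most one ZooKeeper connection, and the function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def read_hive_properties(xml_text):
    """Return {name: value} from Hadoop-style <configuration> XML text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def read_zoo_cfg(cfg_text):
    """Return {key: value} from zoo.cfg text, skipping comments and blanks."""
    settings = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def connection_headroom(hive_props, zk_settings):
    """Return maxClientCnxns minus the max worker thread count.

    ASSUMPTION (unverified, taken from the article's guess): HiveServer2
    opens at most one ZooKeeper connection per Thrift worker thread.
    A negative result means the limit could be reached under load.
    Returns None when maxClientCnxns is 0, which ZooKeeper treats as no limit.
    """
    max_threads = int(hive_props["hive.server2.thrift.max.worker.threads"])
    max_cnxns = int(zk_settings["maxClientCnxns"])
    if max_cnxns == 0:
        return None
    return max_cnxns - max_threads


# Values from the configuration listed in this article.
HIVE_SITE = """<configuration>
  <property>
    <name>hive.server2.thrift.max.worker.threads</name>
    <value>200</value>
  </property>
</configuration>"""
ZOO_CFG = "# per-client connection limit\nmaxClientCnxns=200\n"

print(connection_headroom(read_hive_properties(HIVE_SITE), read_zoo_cfg(ZOO_CFG)))
```

With the original values from the incident (100 worker threads against a limit of 50) the same check yields a negative headroom, which matches the failure described above. Other ZooKeeper clients running on the same host also count against the per-IP limit, so some extra margin is probably wise.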