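Production symptom
About 5 minutes after the Java service started, the top command showed CPU usage spiking to 300% and staying there.
Troubleshooting steps
- Following past experience, we first found the process ID with the highest CPU usage, then the thread ID with the highest usage inside that process, and then inspected that thread's information to locate the problem.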
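The mapping from a busy thread to its jstack entry can be sketched as follows. This is a minimal illustration, not the exact procedure used here: it assumes the thread ID was read in decimal (for example from top in per-thread mode) and relies on jstack printing the same ID in hexadecimal as nid=0x.... The class name NidLookup and its helper methods are hypothetical, and the decimal IDs 6691 and 6690 are simply the decimal forms of the nid values in the dump in this post; the original top output is not shown.

```java
import java.util.ArrayList;
import java.util.List;

public class NidLookup {

    // Converts a decimal OS thread ID to the hexadecimal "0x..." form
    // that jstack prints after "nid=" in each thread header.
    static String toNid(long threadId) {
        return "0x" + Long.toHexString(threadId);
    }

    // Returns the header line of every thread in the dump whose nid
    // matches the given decimal thread ID.
    static List<String> findThreads(String dump, long threadId) {
        // The trailing space prevents 0x1a2 from matching 0x1a23.
        String marker = "nid=" + toNid(threadId) + " ";
        List<String> hits = new ArrayList<>();
        for (String line : dump.split("\n")) {
            // Thread headers start with the quoted thread name.
            if (line.startsWith("\"") && line.contains(marker)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two thread headers copied from the jstack output in this post.
        String dump = String.join("\n",
            "\"https-jsse-nio-9000-ClientPoller-1\" #85 daemon prio=5 os_prio=0 "
                + "tid=0x00007f3a3d657000 nid=0x1a23 runnable [0x00007f3a14ed9000]",
            "   java.lang.Thread.State: RUNNABLE",
            "\"https-jsse-nio-9000-ClientPoller-0\" #84 daemon prio=5 os_prio=0 "
                + "tid=0x00007f3a3d655800 nid=0x1a22 runnable [0x00007f3a14fda000]",
            "   java.lang.Thread.State: RUNNABLE");

        // Assumed decimal thread IDs (0x1a23 = 6691, 0x1a22 = 6690).
        for (long tid : new long[] {6691, 6690}) {
            System.out.println(tid + " -> " + toNid(tid));
            findThreads(dump, tid).forEach(System.out::println);
        }
    }
}
```

In practice the dump string would be the full jstack output for the process rather than the two header lines embedded here.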
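Reference: "Quickly Locating a 100% CPU Problem in an Online Service: A Practical Walkthrough" (線上服務CPU100%問題快速定位實戰)
- Using the thread IDs, we found the following entries in the jstack output: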
"https-jsse-nio-9000-ClientPoller-1" #85 daemon prio=5 os_prio=0 tid=0x00007f3a3d657000 nid=0x1a23 runnable [0x00007f3a14ed9000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x000000008740ef68> (a sun.nio.ch.Util$2)
- locked <0x000000008740ef58> (a java.util.Collections$UnmodifiableSet)
- locked <0x000000008740edb0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:791)
at java.lang.Thread.run(Thread.java:745)
"https-jsse-nio-9000-ClientPoller-0" #84 daemon prio=5 os_prio=0 tid=0x00007f3a3d655800 nid=0x1a22 runnable [0x00007f3a14fda000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x000000008740f5d8> (a sun.nio.ch.Util$2)
- locked <0x000000008740f5c8> (a java.util.Collections$UnmodifiableSet)
- locked <0x000000008740f420> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:791)
at java.lang.Thread.run(Thread.java:745)
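Both threads were inside Tomcat's NIO code, so we could be sure that our own business code was not the cause, but we were reluctant to suspect a bug in Tomcat itself. Based on past experience, we first suspected that the objects listed as locked were involved in a deadlock, but when we searched the dump for those locked ids, no other thread referenced the same ids.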
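The check for shared lock ids can be automated roughly like this. It is a sketch under assumptions, not the tool we actually used: it treats every stack line that starts with "- " and contains an address in angle brackets as a monitor reference, groups the addresses by thread name, and keeps only the addresses that appear under more than one thread. The class name SharedMonitors and its method are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SharedMonitors {

    // Quoted thread name at the start of a jstack thread header.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\"");
    // Monitor address such as <0x000000008740ef68>.
    private static final Pattern ADDRESS = Pattern.compile("<(0x[0-9a-fA-F]+)>");

    // Maps each monitor address referenced by two or more threads
    // to the names of the threads that reference it.
    static Map<String, Set<String>> sharedMonitors(String dump) {
        Map<String, Set<String>> byAddress = new LinkedHashMap<>();
        String current = null;
        for (String raw : dump.split("\n")) {
            String line = raw.trim();
            Matcher header = HEADER.matcher(line);
            if (header.find()) {
                current = header.group(1); // a new thread section begins
                continue;
            }
            // Only lines such as "- locked <...>" carry monitor addresses.
            if (current == null || !line.startsWith("- ")) {
                continue;
            }
            Matcher address = ADDRESS.matcher(line);
            if (address.find()) {
                byAddress.computeIfAbsent(address.group(1), k -> new LinkedHashSet<>())
                         .add(current);
            }
        }
        // Drop addresses that only a single thread refers to.
        byAddress.values().removeIf(threads -> threads.size() < 2);
        return byAddress;
    }

    public static void main(String[] args) {
        // Lock lines of the two ClientPoller threads, copied from the dump in this post.
        String dump = String.join("\n",
            "\"https-jsse-nio-9000-ClientPoller-1\" #85 daemon prio=5 os_prio=0",
            "- locked <0x000000008740ef68> (a sun.nio.ch.Util$2)",
            "- locked <0x000000008740ef58> (a java.util.Collections$UnmodifiableSet)",
            "- locked <0x000000008740edb0> (a sun.nio.ch.EPollSelectorImpl)",
            "\"https-jsse-nio-9000-ClientPoller-0\" #84 daemon prio=5 os_prio=0",
            "- locked <0x000000008740f5d8> (a sun.nio.ch.Util$2)",
            "- locked <0x000000008740f5c8> (a java.util.Collections$UnmodifiableSet)",
            "- locked <0x000000008740f420> (a sun.nio.ch.EPollSelectorImpl)");

        Map<String, Set<String>> shared = sharedMonitors(dump);
        if (shared.isEmpty()) {
            System.out.println("No monitor address is shared between threads.");
        } else {
            shared.forEach((addr, threads) -> System.out.println(addr + " -> " + threads));
        }
    }
}
```

For the two ClientPoller threads no address is shared, which matches what we saw when searching the dump by hand. A real check would feed in the complete jstack output so that every thread is considered.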
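- Just as we were running out of ideas, we noticed that these two threads were the only "ClientPoller" threads in the entire jstack output. Starting from that clue, we found an article by Alibaba's middleware team, "Investigating and Fixing a Tomcat Bug Triggered by Mtop Under High Concurrency During a Network Outage" (斷網故障時Mtop觸發tomcat高並發場景下的BUG排查和修復), which described a situation similar to our production environment, and we began to suspect the Tomcat version. However, the Tomcat version in that article was 7.0.54, and Alibaba had already reported the bug to Apache, so it should have been fixed within the 7.x line, whereas we were running 8.5.4. Could the problem really still be unfixed in 8.5.4?
Baidu turned up no answer, so we switched to Google and found the following article, which confirmed that this is a problem in 8.5.4: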
High CPU load with the JSSE client poller on Tomcat 8.5
- Finally, we upgraded to 8.5.11, and the problem was resolved.
Summary
When you run into a problem, dare to doubt. Even in code written by experts, a problem that was fixed in some intermediate version can reappear after later releases.