http://blog.csdn.net/u013991521/article/details/52781423
1、故障現(xiàn)象
客服同事反饋平臺系統(tǒng)運行緩慢府寒,網(wǎng)頁卡頓嚴(yán)重串纺,多次重啟系統(tǒng)后問題依然存在,使用top命令查看服務(wù)器情況椰棘,發(fā)現(xiàn)CPU占用率過高纺棺。
2.1邪狞、定位問題進程
使用top命令查看資源占用情況祷蝌,發(fā)現(xiàn)pid為14063的進程占用了大量的CPU資源,CPU占用率高達776.1%帆卓,內(nèi)存占用率也達到了29.8%
[ylp@ylp-web-01~]$ toptop -14:51:10up233days,11:40,7users,? load average:6.85,5.62,3.97Tasks:192total,2running,190sleeping,0stopped,0zombie%Cpu(s):97.3us,0.3sy,0.0ni,2.5id,0.0wa,0.0hi,0.0si,0.0stKiB Mem :16268652total,5114392free,6907028used,4247232buff/cacheKiB Swap:4063228total,3989708free,73520used.8751512avail Mem? PID USER? ? ? PR? NI? ? VIRT? ? RES? ? SHR S%CPU%MEMTIME+ COMMAND14063ylp20092604884.627g11976S776.129.8117:41.66java
1
2
3
4
5
6
7
8
9
2.2巨朦、定位問題線程
使用ps -mp pid -o THREAD,tid,time命令查看該進程的線程情況,發(fā)現(xiàn)該進程的多個線程占用率很高
[ylp@ylp-web-01 ~]$ps-mp14063-oTHREAD,tid,timeUSER%CPUPRISCNTWCHANUSERSYSTEMTIDTIMEylp361------02:05:58ylp0.019-futex_--14063 00:00:00ylp0.019-poll_s--14064 00:00:00ylp44.519----14065 00:15:30ylp44.519----14066 00:15:30ylp44.419----14067 00:15:29ylp44.519----14068 00:15:30ylp44.519----14069 00:15:30ylp44.519----14070 00:15:30ylp44.519----14071 00:15:30ylp44.619----14072 00:15:32ylp2.219-futex_--14073 00:00:46ylp0.019-futex_--14074 00:00:00ylp0.019-futex_--14075 00:00:00ylp0.019-futex_--14076 00:00:00ylp0.719-futex_--14077 00:00:15
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
從輸出信息可以看出剑令,14065~14072之間的線程CPU占用率都很高
2.3糊啡、查看問題線程堆棧
挑選TID為14065的線程,查看該線程的堆棧情況吁津,先將線程id轉(zhuǎn)為16進制棚蓄,使用printf “%x\n” tid命令進行轉(zhuǎn)換
[ylp@ylp-web-01~]$printf"%x\n"1406536f1
1
2
再使用jstack命令打印線程堆棧信息,命令格式:jstack pid |grep tid -A 30
[ylp@ylp-web-01~]$ jstack14063|grep36f1 -A30"GC task thread#0 (ParallelGC)"prio=10tid=0x00007fa35001e800nid=0x36f1runnable"GC task thread#1 (ParallelGC)"prio=10tid=0x00007fa350020800nid=0x36f2runnable"GC task thread#2 (ParallelGC)"prio=10tid=0x00007fa350022800nid=0x36f3runnable"GC task thread#3 (ParallelGC)"prio=10tid=0x00007fa350024000nid=0x36f4runnable"GC task thread#4 (ParallelGC)"prio=10tid=0x00007fa350026000nid=0x36f5runnable"GC task thread#5 (ParallelGC)"prio=10tid=0x00007fa350028000nid=0x36f6runnable"GC task thread#6 (ParallelGC)"prio=10tid=0x00007fa350029800nid=0x36f7runnable"GC task thread#7 (ParallelGC)"prio=10tid=0x00007fa35002b800nid=0x36f8runnable"VM Periodic Task Thread"prio=10tid=0x00007fa3500a8800nid=0x3700waitingoncondition JNIglobalreferences:392
1
2
3
4
5
6
7
8
9
10
11
12
從輸出信息可以看出碍脏,此線程是JVM的gc線程梭依。此時可以基本確定是內(nèi)存不足或內(nèi)存泄露導(dǎo)致gc線程持續(xù)運行,導(dǎo)致CPU占用過高典尾。?
所以接下來我們要找的內(nèi)存方面的問題
3.1、使用jstat -gcutil命令查看進程的內(nèi)存情況
[ylp@ylp-web-01~]$ jstat -gcutil14063200010S0? ? S1EOP? ? YGC? ? YGCT? ? FGC? ? FGCT? ? GCT0.000.00100.0099.9926.314221.9172181484.8301506.7470.000.00100.0099.9926.314221.9172181484.8301506.7470.000.00100.0099.9926.314221.9172191496.5671518.4840.000.00100.0099.9926.314221.9172191496.5671518.4840.000.00100.0099.9926.314221.9172191496.5671518.4840.000.00100.0099.9926.314221.9172191496.5671518.4840.000.00100.0099.9926.314221.9172191496.5671518.4840.000.00100.0099.9926.314221.9172201505.4391527.3550.000.00100.0099.9926.314221.9172201505.4391527.3550.000.00100.0099.9926.314221.9172201505.4391527.355```? 從輸出信息可以看出钾埂,Eden區(qū)內(nèi)存占用100%河闰,Old區(qū)內(nèi)存占用99.99%科平,F(xiàn)ullGC的次數(shù)高達220次,并且頻繁FullGC姜性,F(xiàn)ullGC的持續(xù)時間也特別長匠抗,平均每次FullGC耗時6.8秒(1505.439/220)。根據(jù)這些信息污抬,基本可以確定是程序代碼上出現(xiàn)了問題汞贸,可能存在不合理創(chuàng)建對象的地方####3.2、分析堆棧使用jstat命令查看進程的堆棧情況
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
[ylp@ylp-web-01 ~]$ jstat 14063 >>jstat.out?
“`?
把jstat.out文件從服務(wù)器拿到本地后印机,用編輯器查找?guī)в许椖磕夸洸⑶揖€程狀態(tài)是RUNABLE的相關(guān)信息矢腻,從圖中可以看出ActivityUtil.java類的447行正在使用HashMap.put()方法?
3.3、代碼定位
打開項目工程射赛,找到ActivityUtil類的477行多柑,代碼如下:?
找到相關(guān)同事了解后,這段代碼會從數(shù)據(jù)庫中獲取配置楣责,并根據(jù)數(shù)據(jù)庫中remain的值進行循環(huán)竣灌,在循環(huán)中會一直對HashMap進行put操作。
查詢數(shù)據(jù)庫中的配置秆麸,發(fā)現(xiàn)remain的數(shù)量巨大?
至此初嘹,問題定位完畢。