接口頻繁超時定位

現(xiàn)象

接口頻繁499, 域名報警逐漸增多.

排查

  1. 查看線上日志:調(diào)用服務(wù)B接口,超時非常多,最長響應(yīng)時間幾千ms.
  2. 找到服務(wù)B負(fù)責(zé)人進(jìn)行查看,因為沒有采用相同trace方式,只能給出大概時間范圍,但是響應(yīng)時間并未有超過100ms的.考慮是網(wǎng)絡(luò)原因.
  3. 找OP查看網(wǎng)絡(luò)是否出現(xiàn)問題,自己查看ping延遲不高.等待OP回復(fù).
  4. OP說網(wǎng)絡(luò)沒問題,但是暫時沒其他思路,沒考慮應(yīng)用內(nèi)部導(dǎo)致.
  5. 偶然間查看調(diào)用服務(wù)B接口的配置,發(fā)現(xiàn)兩個參數(shù)minConnectionmaxConnection后產(chǎn)生新思路,因為引用的是服務(wù)B的sdk,內(nèi)部是不是采用了連接池,當(dāng)時配置數(shù)量較小,難道是流量上來之后獲取連接導(dǎo)致了阻塞?
  6. 找到服務(wù)B的同學(xué)確認(rèn)內(nèi)部實現(xiàn),經(jīng)確認(rèn)后并未采用連接池,這兩個參數(shù)沒作用,是保留參數(shù).
  7. 又沒什么其他思路了,考慮可能應(yīng)用內(nèi)部問題,這塊補充說明下在開始是看了gc log的發(fā)現(xiàn)一次minor gc時間在50-100ms所以排除是gc導(dǎo)致的.
  8. 實在沒其他猜測,gclog再看一遍吧!把gc log拿下來,使用gceasy.io進(jìn)行解析.
  9. 分析后發(fā)現(xiàn)產(chǎn)生了,幾次full gc,我們使用的是G1 gc,產(chǎn)生full gc是比較嚴(yán)重的現(xiàn)象.


    fullgc.png
  10. 找到發(fā)生full gc之前逐行查看,可以看到堆內(nèi)存基本上已經(jīng)滿了,對于G1 gc來說已經(jīng)是非常不健康的狀態(tài)了.
2020-03-20T11:08:24.499+0800: 1433856.095: [SoftReference, 0 refs, 0.0031353 secs]2020-03-20T11:08:24.502+0800: 1433856.098: [WeakReference, 0 refs, 0.0010686 secs]2020-03-20T11:08:24.503+0800: 1433856.099: [FinalReference, 0 refs, 0.0012377 secs]2020-03-20T11:08:24.504+0800: 1433856.101: [PhantomReference, 0 refs, 0 refs, 0.0023456 secs]2020-03-20T11:08:24.507+0800: 1433856.103: [JNI Weak Reference, 0.0000457 secs], 0.0284242 secs]
   [Parallel Time: 15.4 ms, GC Workers: 18]
      [GC Worker Start (ms): Min: 1433856077.7, Avg: 1433856077.9, Max: 1433856078.1, Diff: 0.3]
      [Ext Root Scanning (ms): Min: 7.9, Avg: 8.8, Max: 15.2, Diff: 7.3, Sum: 157.8]
      [Update RS (ms): Min: 0.0, Avg: 0.1, Max: 1.0, Diff: 1.0, Sum: 1.1]
         [Processed Buffers: Min: 0, Avg: 0.1, Max: 1, Diff: 1, Sum: 2]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 3.0]
      [Termination (ms): Min: 0.0, Avg: 6.0, Max: 6.8, Diff: 6.8, Sum: 107.2]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 18]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [GC Worker Total (ms): Min: 14.9, Avg: 15.1, Max: 15.2, Diff: 0.3, Sum: 271.2]
      [GC Worker End (ms): Min: 1433856093.0, Avg: 1433856093.0, Max: 1433856093.0, Diff: 0.0]
   [Code Root Fixup: 0.2 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 12.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 9.7 ms]
      [Ref Enq: 0.2 ms]
      [Redirty Cards: 0.3 ms]
      [Humongous Register: 0.4 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 0.0 ms]
   [Eden: 0.0B(204.0M)->0.0B(204.0M) Survivors: 0.0B->0.0B Heap: 4010.2M(4096.0M)->4010.2M(4096.0M)]
# 之前指關(guān)注了時間30ms還不錯,但是沒關(guān)注內(nèi)存的分布情況 因為設(shè)置目標(biāo)暫停時間最大為200ms,為了實現(xiàn)這個暫停時間eden區(qū)域已經(jīng)減少了非常多
 [Times: user=0.20 sys=0.00, real=0.03 secs] 
2020-03-20T11:08:24.510+0800: 1433856.106: Total time for which application threads were stopped: 0.0454852 seconds, Stopping threads took: 0.0002360 seconds
# 嘗試擴(kuò)展堆大小 但是已經(jīng)是最大了不能在擴(kuò)展了
 1433856.121: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 24 bytes]
 1433856.121: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 2097152 bytes, attempted expansion amount: 2097152 bytes]
 1433856.121: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap already fully expanded]
# 發(fā)生full gc原因是Allocation Failure分配失敗.
2020-03-20T11:08:24.525+0800: 1433856.122: [Full GC (Allocation Failure) 2020-03-20T11:08:25.194+0800: 1433856.790: [SoftReference, 4841 refs, 0.0011466 secs]2020-03-20T11:08:25.195+0800: 1433856.791: [WeakReference, 136779 refs, 0.0211710 secs]2020-03-20T11:08:25.216+0800: 1433856.813: [FinalReference, 2718 refs, 0.0040877 secs]2020-03-20T11:08:25.220+0800: 1433856.817: [PhantomReference, 0 refs, 1140 refs, 0.0003886 secs]2020-03-20T11:08:25.221+0800: 1433856.817: [JNI Weak Reference, 0.0000759 secs] 4010M->788M(4096M), 2.6614008 secs]
  1. 繼續(xù)向上尋找蛛絲馬跡
2020-03-20T11:08:22.214+0800: 1433853.810: [SoftReference, 0 refs, 0.0065309 secs]2020-03-20T11:08:22.221+0800: 1433853.817: [WeakReference, 0 refs, 0.0045773 secs]2020-03-20T11:08:22.225+0800: 1433853.822: [FinalReference, 0 refs, 0.0010537 secs]2020-03-20T11:08:22.226+0800: 1433853.823: [PhantomReference, 0 refs, 0 refs, 0.0019795 secs]2020-03-20T11:08:22.228+0800: 1433853.825: [JNI Weak Reference, 0.0000575 secs] 1433854.362: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 3837788160 bytes, allocation request: 0 bytes, threshold: 2147483600 bytes (50.00 %), source: end of GC]
 (to-space exhausted), 1.3927102 secs]
# 比較要命的一句話,代表已經(jīng)沒有region給我們copy對象了
   [Parallel Time: 837.7 ms, GC Workers: 18]
      [GC Worker Start (ms): Min: 1433852972.1, Avg: 1433852972.2, Max: 1433852972.3, Diff: 0.2]
      [Ext Root Scanning (ms): Min: 4.9, Avg: 6.1, Max: 13.7, Diff: 8.8, Sum: 110.1]
      [Update RS (ms): Min: 0.9, Avg: 13.2, Max: 20.0, Diff: 19.2, Sum: 237.1]
         [Processed Buffers: Min: 2, Avg: 88.1, Max: 159, Diff: 157, Sum: 1586]
      [Scan RS (ms): Min: 1.7, Avg: 3.3, Max: 3.8, Diff: 2.1, Sum: 60.3]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 799.5, Avg: 814.1, Max: 827.7, Diff: 28.1, Sum: 14653.6]
      [Termination (ms): Min: 0.0, Avg: 0.6, Max: 0.9, Diff: 0.9, Sum: 10.5]
         [Termination Attempts: Min: 1, Avg: 1.2, Max: 3, Diff: 2, Sum: 22]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.5]
      [GC Worker Total (ms): Min: 837.2, Avg: 837.4, Max: 837.6, Diff: 0.3, Sum: 15073.2]
      [GC Worker End (ms): Min: 1433853809.5, Avg: 1433853809.6, Max: 1433853809.7, Diff: 0.2]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.7 ms]
   [Other: 554.2 ms]
      [Evacuation Failure: 533.0 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 14.8 ms]
      [Ref Enq: 0.5 ms]
      [Redirty Cards: 0.8 ms]
      [Humongous Register: 0.2 ms]
      [Humongous Reclaim: 0.6 ms]
      [Free CSet: 0.7 ms]
   [Eden: 2312.0M(2316.0M)->0.0B(204.0M) Survivors: 38.0M->0.0B Heap: 3941.0M(4096.0M)->3610.3M(4096.0M)]
 [Times: user=7.13 sys=0.00, real=1.39 secs] 
 1433854.363: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 3839885312 bytes, allocation request: 0 bytes, threshold: 2147483600 bytes (50.00 %), source: STW humongous allocation]

For G1 GC, an evacuation failure is very expensive -

  • For successfully copied objects, G1 needs to update the references and the regions have to be tenured.
  • For unsuccessfully copied objects, G1 will self-forward them and tenure the regions in place.
  1. 繼續(xù)向上尋找蛛絲馬跡
2020-03-20T11:08:27.709+0800: 1433859.305: [SoftReference, 0 refs, 0.0019565 secs]2020-03-20T11:08:27.711+0800: 1433859.307: [WeakReference, 13 refs, 0.0009893 secs]2020-03-20T11:08:27.712+0800: 1433859.308: [FinalReference, 102 refs, 0.0014151 secs]2020-03-20T11:08:27.713+0800: 1433859.309: [PhantomReference, 0 refs, 64 refs, 0.0021785 secs]2020-03-20T11:08:27.715+0800: 1433859.312: [JNI Weak Reference, 0.0000454 secs] 1433859.315: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 68.78 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)]
, 0.0483084 secs]
# GC占用時間已經(jīng)到68%這么高的地步了,證明應(yīng)用執(zhí)行時間很短了,此時吞吐量是嚴(yán)重下降的.

總結(jié)

基于以上現(xiàn)象,考慮到幾種方式解決:

  1. 增加堆內(nèi)存大小, 給GC“喘息”的時間;
  2. 減小XX:InitiatingHeapOccupancyPercent參數(shù),我們設(shè)置的是50,可以減小來提前進(jìn)行concurrent marking cycle,提前mix gc;
  3. 增加處理GC的線程數(shù) XX:ConcGCThreads,加快處理GC速度,減少時間;
  4. 避免發(fā)生to-space exhausted問題,增加預(yù)留空間XX:G1ReservePercent
  5. 因為發(fā)現(xiàn)問題時正好時線上摘掉了一臺機(jī)器,那么整體來看是單臺機(jī)器流量變大導(dǎo)致的對象分配速度大于回收的速度,最終導(dǎo)致了full gc,那么我們采用了最簡單粗暴的方式擴(kuò)容,問題消失;
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末递雀,一起剝皮案震驚了整個濱河市,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌,老刑警劉巖,帶你破解...
    沈念sama閱讀 218,941評論 6 508
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件纽什,死亡現(xiàn)場離奇詭異,居然都是意外死亡,警方通過查閱死者的電腦和手機(jī)昭躺,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,397評論 3 395
  • 文/潘曉璐 我一進(jìn)店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來伪嫁,“玉大人领炫,你說我怎么就攤上這事≌趴龋” “怎么了帝洪?”我有些...
    開封第一講書人閱讀 165,345評論 0 356
  • 文/不壞的土叔 我叫張陵,是天一觀的道長脚猾。 經(jīng)常有香客問我葱峡,道長,這世上最難降的妖魔是什么龙助? 我笑而不...
    開封第一講書人閱讀 58,851評論 1 295
  • 正文 為了忘掉前任砰奕,我火速辦了婚禮,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘军援。我一直安慰自己仅淑,他們只是感情好,可當(dāng)我...
    茶點故事閱讀 67,868評論 6 392
  • 文/花漫 我一把揭開白布胸哥。 她就那樣靜靜地躺著涯竟,像睡著了一般。 火紅的嫁衣襯著肌膚如雪空厌。 梳的紋絲不亂的頭發(fā)上庐船,一...
    開封第一講書人閱讀 51,688評論 1 305
  • 那天,我揣著相機(jī)與錄音嘲更,去河邊找鬼筐钟。 笑死,一個胖子當(dāng)著我的面吹牛哮内,可吹牛的內(nèi)容都是我干的盗棵。 我是一名探鬼主播,決...
    沈念sama閱讀 40,414評論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼北发,長吁一口氣:“原來是場噩夢啊……” “哼纹因!你這毒婦竟也來了?” 一聲冷哼從身側(cè)響起琳拨,我...
    開封第一講書人閱讀 39,319評論 0 276
  • 序言:老撾萬榮一對情侶失蹤瞭恰,失蹤者是張志新(化名)和其女友劉穎,沒想到半個月后狱庇,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體惊畏,經(jīng)...
    沈念sama閱讀 45,775評論 1 315
  • 正文 獨居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 37,945評論 3 336
  • 正文 我和宋清朗相戀三年密任,在試婚紗的時候發(fā)現(xiàn)自己被綠了颜启。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點故事閱讀 40,096評論 1 350
  • 序言:一個原本活蹦亂跳的男人離奇死亡浪讳,死狀恐怖缰盏,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情淹遵,我是刑警寧澤口猜,帶...
    沈念sama閱讀 35,789評論 5 346
  • 正文 年R本政府宣布,位于F島的核電站透揣,受9級特大地震影響济炎,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜辐真,卻給世界環(huán)境...
    茶點故事閱讀 41,437評論 3 331
  • 文/蒙蒙 一须尚、第九天 我趴在偏房一處隱蔽的房頂上張望崖堤。 院中可真熱鬧,春花似錦恨闪、人聲如沸倘感。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,993評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽。三九已至淤年,卻和暖如春钧敞,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背麸粮。 一陣腳步聲響...
    開封第一講書人閱讀 33,107評論 1 271
  • 我被黑心中介騙來泰國打工溉苛, 沒想到剛下飛機(jī)就差點兒被人妖公主榨干…… 1. 我叫王不留,地道東北人弄诲。 一個月前我還...
    沈念sama閱讀 48,308評論 3 372
  • 正文 我出身青樓愚战,卻偏偏與公主長得像,于是被迫代替她去往敵國和親齐遵。 傳聞我的和親對象是個殘疾皇子寂玲,可洞房花燭夜當(dāng)晚...
    茶點故事閱讀 45,037評論 2 355

推薦閱讀更多精彩內(nèi)容