Analyzing Long ResourceManager YGC Pauses

Motivation

Yesterday I received yet another email about a colleague getting rccd over a data (code) issue, and management has been repeatedly stressing data security. So I went through the files on my disk to make sure I had no inadvertent violations of my own. In the process I turned up some log excerpts and process-state dumps collected during past troubleshooting. None of it is sensitive, but to avoid unnecessary trouble I decided to clean it up. Some of it felt too valuable to simply discard, though, such as the jmap and jstack output gathered while investigating a Yarn RM YGC problem. That problem was ultimately "solved" by moving to higher-performance machines, so we never dug to the root cause and this hard-won data was never put to use. Before deleting it, I decided to use it to re-analyze the Yarn RM YGC problem.

Revisiting How the Problem Was Solved

// The original logs were not preserved; these lines were recovered from a chat history
2018-01-29T19:09:58.294+0800: 15752.474: [GC (Allocation Failure) [PSYoungGen: 40755358K->2823487K(50331648K)] 56773140K->18863096K(71303168K), 12.4035418 secs] [Times: user=222.88 sys=0.03, real=12.40 secs]
2018-01-29T19:10:38.945+0800: 15793.125: [GC (Allocation Failure) [PSYoungGen: 40572223K->2694856K(50331648K)] 56611832K->18751652K(71303168K), 11.5971746 secs] [Times: user=208.62 sys=0.03, real=11.60 secs]
2018-01-29T19:11:23.713+0800: 15837.893: [GC (Allocation Failure) [PSYoungGen: 40443592K->2834225K(50331648K)] 56500388K->18915647K(71303168K), 12.2938357 secs] [Times: user=221.24 sys=0.00, real=12.29 secs]
2018-01-29T19:12:05.063+0800: 15879.243: [GC (Allocation Failure) [PSYoungGen: 40582961K->2755339K(50331648K)] 56664383K->18858245K(71303168K), 11.9166865 secs] [Times: user=214.42 sys=0.02, real=11.91 secs]

In addition, the RM logs showed an event-queue backlog.

As shown above, Yarn RM YGC pauses reached 10s and could exceed 20s at peak, so the performance impact speaks for itself.
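As an aside, the Times fields in those lines quantify how parallel the collections were: user CPU time divided by wall time approximates the number of busy GC worker threads. A minimal sketch, with values taken from the first log line above:

```java
// Estimate GC parallelism from a log line's "Times" field:
// user=222.88 sys=0.03, real=12.40 => roughly user/real busy worker threads.
public class GcParallelism {
    static double busyWorkers(double userSecs, double realSecs) {
        return userSecs / realSecs;
    }

    public static void main(String[] args) {
        // First log line above: ~18 threads grinding for the full 12.4s.
        System.out.printf("~%.0f GC worker threads busy%n",
                busyWorkers(222.88, 12.40));
    }
}
```

So the pauses were not for lack of parallelism: the workers were saturated for the entire pause.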

The original remediation focused on GC tuning. The JVM heap grew from 40G to 80G, and, to keep objects from ever surviving past young GC, the young generation jumped from 10g to 20g to 40g to 60g. GC flags were swapped out batch after batch, yet none of it fixed the root cause. Since we had concluded the slow GC came down to CPU performance, hyper-threading was brought to bear as well; the problem persisted. Finally our diligent OPs moved the RM to higher-performance servers, which brought GC pauses down to the millisecond range.

The current GC configuration is as follows:

[#9#xg-ResouceManager2#yarn@rmserver ~/cluster-data/logs]$ jcmd 27736 VM.flags
27736:
-XX:CICompilerCount=15 -XX:GCLogFileSize=536870912  
-XX:InitialHeapSize=85899345920  
-XX:MaxHeapSize=85899345920
-XX:MaxNewSize=64424509440  // young gen: 60g
-XX:MinHeapDeltaBytes=524288 -XX:NewSize=64424509440 
-XX:NumberOfGCLogFiles=2 -XX:OldSize=21474836480  
-XX:ParGCCardsPerStrideChunk=4096 -XX:+PrintGC  
-XX:+PrintGCDateStamps -XX:+PrintGCDetails  
-XX:+PrintGCTimeStamps  
-XX:SurvivorRatio=3   // each survivor: 12g
-XX:+UnlockDiagnosticVMOptions  
-XX:-UseAdaptiveSizePolicy  
-XX:+UseFastUnorderedTimeStamps  
-XX:+UseGCLogFileRotation  
-XX:+UseParallelGC  // Parallel Scavenge + Serial Old (single-threaded)
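The raw byte values in those flags are easier to sanity-check than to read. A quick sketch confirming that the inline comments (60g young generation, 12g per survivor) follow from the flags, assuming the usual Parallel Scavenge layout in which SurvivorRatio=3 means eden:survivor = 3:1:

```java
// Derive the generation layout from the flag values above.
public class HeapLayout {
    static final long GIB = 1L << 30;

    public static void main(String[] args) {
        long heap = 85_899_345_920L;   // -XX:MaxHeapSize  (80 GiB)
        long young = 64_424_509_440L;  // -XX:MaxNewSize   (60 GiB)
        int survivorRatio = 3;         // -XX:SurvivorRatio

        // young = eden + 2 survivors, with eden/survivor = survivorRatio
        long survivor = young / (survivorRatio + 2);
        long eden = young - 2 * survivor;
        long old = heap - young;       // matches -XX:OldSize=21474836480

        System.out.println("eden=" + eden / GIB + "G"
                + " survivor=" + survivor / GIB + "G (x2)"
                + " old=" + old / GIB + "G");
    }
}
```

Note the implied old generation (20 GiB) matches the -XX:OldSize value in the dump, so the flags are at least internally consistent.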

That such a configuration actually worked genuinely surprised me and upended my long-held assumption that the old generation should be larger than the young generation. If the books say otherwise, then, as the saying goes, believing everything in books is worse than having no books at all.

Re-examining the Problem

Once the problem was mitigated we moved on to other fires and never dug into it. The data-security cleanup brought it back into view, and since swapping machines never felt like a real fix, I decided to work through it again. Fortunately, some logs had been pulled to my local machine for analysis during the incident and thus survived; unfortunately the material is incomplete and not all from the same day. Even so, the externally visible symptoms were identical throughout, so it is all one problem and the data still carries signal.

Relationship Between Young Generation Size and YGC Duration

Analyzing the surviving GC logs shows that YGC duration tracks the post-GC young-generation occupancy: the more data survives a collection, the longer the pause. To make this concrete, here is a contiguous stretch of log lines in which the pause times fluctuate.

2018-01-24T15:05:12.055+0800: 2780.121: [GC (Allocation Failure) [PSYoungGen: 8344459K->484310K(9175040K)] 13672674K->5830336K(40632320K), 0.6421833 secs] [Times: user=6.34 sys=0.03, real=0.64 secs] 
2018-01-24T15:05:18.403+0800: 2786.468: [GC (Allocation Failure) [PSYoungGen: 8348630K->493586K(9175040K)] 13694656K->5852127K(40632320K), 0.6354490 secs] [Times: user=6.30 sys=0.01, real=0.64 secs] 
2018-01-24T15:05:24.810+0800: 2792.876: [GC (Allocation Failure) [PSYoungGen: 8357906K->507046K(9175040K)] 13716447K->5876773K(40632320K), 0.7877401 secs] [Times: user=7.85 sys=0.01, real=0.79 secs] 
2018-01-24T15:05:31.743+0800: 2799.808: [GC (Allocation Failure) [PSYoungGen: 8371366K->537442K(9175040K)] 13741093K->5919236K(40632320K), 0.8498489 secs] [Times: user=8.47 sys=0.01, real=0.85 secs] 
2018-01-24T15:05:38.765+0800: 2806.831: [GC (Allocation Failure) [PSYoungGen: 8401762K->567214K(9175040K)] 13783556K->5962118K(40632320K), 0.9487537 secs] [Times: user=9.44 sys=0.00, real=0.95 secs] 
2018-01-24T15:05:45.791+0800: 2813.856: [GC (Allocation Failure) [PSYoungGen: 8431534K->583819K(9175040K)] 13826438K->5988521K(40632320K), 1.0380007 secs] [Times: user=10.35 sys=0.00, real=1.03 secs] 
2018-01-24T15:05:52.966+0800: 2821.031: [GC (Allocation Failure) [PSYoungGen: 8448139K->556379K(9175040K)] 13852841K->5976631K(40632320K), 1.1442949 secs] [Times: user=11.43 sys=0.00, real=1.14 secs] 
2018-01-24T15:06:00.030+0800: 2828.095: [GC (Allocation Failure) [PSYoungGen: 8420699K->555026K(9175040K)] 13840951K->5984419K(40632320K), 1.1077380 secs] [Times: user=11.06 sys=0.00, real=1.11 secs] 
2018-01-24T15:06:06.335+0800: 2834.400: [GC (Allocation Failure) [PSYoungGen: 8419346K->590867K(9175040K)] 13848739K->6037725K(40632320K), 1.2386294 secs] [Times: user=12.38 sys=0.00, real=1.23 secs] 
2018-01-24T15:06:13.638+0800: 2841.703: [GC (Allocation Failure) [PSYoungGen: 8455187K->563957K(9175040K)] 13902045K->6021894K(40632320K), 1.1187463 secs] [Times: user=11.17 sys=0.00, real=1.12 secs] 
2018-01-24T15:06:21.030+0800: 2849.095: [GC (Allocation Failure) [PSYoungGen: 8428277K->462155K(9175040K)] 13886214K->5928377K(40632320K), 0.6891712 secs] [Times: user=6.86 sys=0.02, real=0.69 secs] 
2018-01-24T15:06:27.777+0800: 2855.842: [GC (Allocation Failure) [PSYoungGen: 8326475K->494792K(9175040K)] 13792697K->5968171K(40632320K), 0.7356112 secs] [Times: user=7.33 sys=0.00, real=0.73 secs] 
2018-01-24T15:06:34.361+0800: 2862.426: [GC (Allocation Failure) [PSYoungGen: 8359112K->472731K(9175040K)] 13832491K->5956080K(40632320K), 0.6683025 secs] [Times: user=6.64 sys=0.00, real=0.67 secs] 
2018-01-24T15:06:41.069+0800: 2869.134: [GC (Allocation Failure) [PSYoungGen: 8337051K->483344K(9175040K)] 13820400K->5978220K(40632320K), 0.6570753 secs] [Times: user=6.51 sys=0.02, real=0.65 secs] 
2018-01-24T15:06:47.772+0800: 2875.837: [GC (Allocation Failure) [PSYoungGen: 8347664K->465064K(9175040K)] 13842540K->5974059K(40632320K), 0.5663590 secs] [Times: user=5.63 sys=0.01, real=0.56 secs] 
2018-01-24T15:06:54.389+0800: 2882.455: [GC (Allocation Failure) [PSYoungGen: 8329384K->454076K(9175040K)] 13838379K->5973293K(40632320K), 0.5559584 secs] [Times: user=5.52 sys=0.03, real=0.56 secs] 
2018-01-24T15:07:01.635+0800: 2889.700: [GC (Allocation Failure) [PSYoungGen: 8318396K->521880K(9175040K)] 13837613K->6056761K(40632320K), 1.0041698 secs] [Times: user=10.02 sys=0.00, real=1.00 secs] 
2018-01-24T15:07:08.663+0800: 2896.729: [GC (Allocation Failure) [PSYoungGen: 8386200K->520605K(9175040K)] 13921081K->6065095K(40632320K), 1.0463873 secs] [Times: user=10.45 sys=0.00, real=1.04 secs] 
2018-01-24T15:07:15.796+0800: 2903.861: [GC (Allocation Failure) [PSYoungGen: 8384925K->529044K(9175040K)] 13929415K->6087207K(40632320K), 1.0115040 secs] [Times: user=10.01 sys=0.00, real=1.01 secs] 
2018-01-24T15:07:23.222+0800: 2911.288: [GC (Allocation Failure) [PSYoungGen: 8393364K->534805K(9175040K)] 13951527K->6113778K(40632320K), 1.0747669 secs] [Times: user=10.73 sys=0.01, real=1.08 secs] 
2018-01-24T15:07:30.332+0800: 2918.397: [GC (Allocation Failure) [PSYoungGen: 8399125K->489447K(9175040K)] 13978098K->6079306K(40632320K), 0.9554071 secs] [Times: user=9.54 sys=0.00, real=0.96 secs] 
2018-01-24T15:07:37.296+0800: 2925.361: [GC (Allocation Failure) [PSYoungGen: 8353767K->442958K(9175040K)] 13943626K->6041227K(40632320K), 0.6691608 secs] [Times: user=6.68 sys=0.00, real=0.67 secs] 
2018-01-24T15:07:44.914+0800: 2932.979: [GC (Allocation Failure) [PSYoungGen: 8307278K->437538K(9175040K)] 13905547K->6050795K(40632320K), 0.7124929 secs] [Times: user=7.07 sys=0.01, real=0.71 secs] 
2018-01-24T15:07:51.793+0800: 2939.858: [GC (Allocation Failure) [PSYoungGen: 8301858K->473921K(9175040K)] 13915115K->6098001K(40632320K), 0.7684869 secs] [Times: user=7.64 sys=0.00, real=0.77 secs] 
2018-01-24T15:07:58.598+0800: 2946.663: [GC (Allocation Failure) [PSYoungGen: 8338241K->478071K(9175040K)] 13962321K->6110603K(40632320K), 0.7639746 secs] [Times: user=7.63 sys=0.00, real=0.77 secs] 
2018-01-24T15:08:04.382+0800: 2952.448: [GC (Allocation Failure) [PSYoungGen: 8342391K->479205K(9175040K)] 13974923K->6122021K(40632320K), 0.8109380 secs] [Times: user=8.09 sys=0.00, real=0.81 secs] 
2018-01-24T15:08:11.056+0800: 2959.121: [GC (Allocation Failure) [PSYoungGen: 8343525K->457732K(9175040K)] 13986341K->6113836K(40632320K), 0.7351115 secs] [Times: user=7.33 sys=0.01, real=0.73 secs] 
2018-01-24T15:08:17.686+0800: 2965.751: [GC (Allocation Failure) [PSYoungGen: 8322052K->452644K(9175040K)] 13978156K->6117093K(40632320K), 0.6676740 secs] [Times: user=6.65 sys=0.01, real=0.67 secs] 
2018-01-24T15:08:24.360+0800: 2972.425: [GC (Allocation Failure) [PSYoungGen: 8316964K->448014K(9175040K)] 13981413K->6125855K(40632320K), 0.6466176 secs] [Times: user=6.45 sys=0.01, real=0.65 secs] 
2018-01-24T15:08:30.842+0800: 2978.907: [GC (Allocation Failure) [PSYoungGen: 8312334K->443836K(9175040K)] 13990175K->6132684K(40632320K), 0.6180497 secs] [Times: user=6.12 sys=0.01, real=0.62 secs] 
2018-01-24T15:08:37.439+0800: 2985.504: [GC (Allocation Failure) [PSYoungGen: 8308156K->428507K(9175040K)] 13997004K->6126109K(40632320K), 0.5974813 secs] [Times: user=5.94 sys=0.02, real=0.60 secs] 
2018-01-24T15:08:44.154+0800: 2992.219: [GC (Allocation Failure) [PSYoungGen: 8292827K->447938K(9175040K)] 13990429K->6152131K(40632320K), 0.6153070 secs] [Times: user=6.12 sys=0.01, real=0.62 secs] 
2018-01-24T15:08:50.816+0800: 2998.882: [GC (Allocation Failure) [PSYoungGen: 8312258K->464488K(9175040K)] 14016451K->6183309K(40632320K), 0.6873898 secs] [Times: user=6.85 sys=0.02, real=0.69 secs] 
2018-01-24T15:08:57.834+0800: 3005.900: [GC (Allocation Failure) [PSYoungGen: 8328808K->503555K(9175040K)] 14047629K->6232096K(40632320K), 1.0393343 secs] [Times: user=10.52 sys=0.00, real=1.04 secs] 
2018-01-24T15:09:06.484+0800: 3014.549: [GC (Allocation Failure) [PSYoungGen: 8367875K->653313K(9175040K)] 14096416K->6390015K(40632320K), 1.8326990 secs] [Times: user=18.32 sys=0.00, real=1.83 secs] 
2018-01-24T15:09:14.517+0800: 3022.582: [GC (Allocation Failure) [PSYoungGen: 8517633K->694205K(9175040K)] 14254335K->6439862K(40632320K), 2.0075142 secs] [Times: user=20.07 sys=0.00, real=2.01 secs] 
2018-01-24T15:09:22.442+0800: 3030.508: [GC (Allocation Failure) [PSYoungGen: 8558525K->680568K(9175040K)] 14304182K->6432288K(40632320K), 1.9823928 secs] [Times: user=19.76 sys=0.01, real=1.99 secs] 
2018-01-24T15:09:30.425+0800: 3038.490: [GC (Allocation Failure) [PSYoungGen: 8544888K->682694K(9175040K)] 14296608K->6442548K(40632320K), 1.9901893 secs] [Times: user=19.89 sys=0.00, real=1.99 secs] 
2018-01-24T15:09:38.166+0800: 3046.232: [GC (Allocation Failure) [PSYoungGen: 8547014K->641436K(9175040K)] 14306868K->6413962K(40632320K), 1.7414826 secs] [Times: user=17.53 sys=0.01, real=1.74 secs] 
2018-01-24T15:09:45.767+0800: 3053.833: [GC (Allocation Failure) [PSYoungGen: 8505756K->661152K(9175040K)] 14278282K->6453992K(40632320K), 1.8809221 secs] [Times: user=18.77 sys=0.00, real=1.88 secs] 
2018-01-24T15:09:53.823+0800: 3061.888: [GC (Allocation Failure) [PSYoungGen: 8525472K->701454K(9175040K)] 14318312K->6504832K(40632320K), 2.0521165 secs] [Times: user=20.50 sys=0.00, real=2.05 secs] 
2018-01-24T15:10:01.415+0800: 3069.480: [GC (Allocation Failure) [PSYoungGen: 8565774K->739864K(9175040K)] 14369152K->6553519K(40632320K), 2.4033559 secs] [Times: user=24.03 sys=0.00, real=2.40 secs] 
2018-01-24T15:10:09.770+0800: 3077.835: [GC (Allocation Failure) [PSYoungGen: 8604184K->773748K(9175040K)] 14417839K->6599545K(40632320K), 2.5234427 secs] [Times: user=25.22 sys=0.00, real=2.53 secs] 
2018-01-24T15:10:18.510+0800: 3086.576: [GC (Allocation Failure) [PSYoungGen: 8638068K->793254K(9175040K)] 14463865K->6629059K(40632320K), 2.6823239 secs] [Times: user=26.90 sys=0.00, real=2.69 secs] 
2018-01-24T15:10:28.140+0800: 3096.205: [GC (Allocation Failure) [PSYoungGen: 8657574K->860322K(9175040K)] 14493379K->6706843K(40632320K), 3.0401675 secs] [Times: user=30.12 sys=0.00, real=3.04 secs] 
2018-01-24T15:10:37.600+0800: 3105.665: [GC (Allocation Failure) [PSYoungGen: 8724642K->915220K(9175040K)] 14571163K->6770802K(40632320K), 3.2758327 secs] [Times: user=32.77 sys=0.00, real=3.28 secs] 
2018-01-24T15:10:46.836+0800: 3114.901: [GC (Allocation Failure) [PSYoungGen: 8779540K->904150K(9175040K)] 14635122K->6769113K(40632320K), 3.1770298 secs] [Times: user=31.78 sys=0.00, real=3.17 secs] 
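To make the correlation checkable rather than merely eyeballed, here is a minimal sketch (not production code; the regex assumes exactly the PSYoungGen line format above) that extracts the post-GC young occupancy and the pause time from each line and computes their Pearson correlation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Correlate post-GC young occupancy with YGC pause time from ParallelGC logs.
public class YgcCorrelation {
    static final Pattern P = Pattern.compile(
            "PSYoungGen: (\\d+)K->(\\d+)K.*?, ([0-9.]+) secs");

    // Pearson correlation coefficient between two equal-length series.
    static double pearson(List<Double> x, List<Double> y) {
        int n = x.size();
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double a = x.get(i), b = y.get(i);
            sx += a; sy += b; sxx += a * a; syy += b * b; sxy += a * b;
        }
        double cov = sxy - sx * sy / n;
        return cov / Math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n));
    }

    public static void main(String[] args) {
        String[] lines = {  // four lines sampled from the log above
            "[PSYoungGen: 8344459K->484310K(9175040K)] 13672674K->5830336K(40632320K), 0.6421833 secs]",
            "[PSYoungGen: 8431534K->583819K(9175040K)] 13826438K->5988521K(40632320K), 1.0380007 secs]",
            "[PSYoungGen: 8517633K->694205K(9175040K)] 14254335K->6439862K(40632320K), 2.0075142 secs]",
            "[PSYoungGen: 8779540K->904150K(9175040K)] 14635122K->6769113K(40632320K), 3.1770298 secs]",
        };
        List<Double> survivedKb = new ArrayList<>(), pauseSecs = new ArrayList<>();
        for (String line : lines) {
            Matcher m = P.matcher(line);
            if (m.find()) {
                survivedKb.add(Double.parseDouble(m.group(2))); // KB left after YGC
                pauseSecs.add(Double.parseDouble(m.group(3)));  // pause duration
            }
        }
        System.out.printf("r = %.3f%n", pearson(survivedKb, pauseSecs));
    }
}
```

On these samples the correlation comes out very close to 1, consistent with the claim that the surviving data volume drives the pause.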

To explain this behavior, I looked into what makes up a YGC pause. Its duration satisfies the following formula:

Tyoung = Tstack_scan + Tcard_scan + Told_scan + Tcopy

where:
    Tyoung — total YGC pause;

    Tstack_scan — time to scan root objects on the thread stacks; roughly an application-specific constant;

    Tcard_scan — time to scan the card table (dirty cards), which depends on the old-generation size; additionally, after young-generation objects are moved, every reference to them must be updated and the corresponding cards marked dirty, so that the card table still covers the full set of old-to-young references at the next minor GC;

    Told_scan — time to scan old-generation root objects, which depends on the volume of dirty-card data;

    Tcopy — time to copy surviving objects, proportional to the amount of live data in the heap;

Thus, with the other terms held constant, YGC pause time correlates positively with the number of live objects in the heap (via Tcard_scan) and with the amount of live data (via Tcopy).

Which Objects Dominate YGC Cost

As the previous section noted, with the other terms held constant the YGC pause correlates with the number of live objects and the amount of live data in the heap. So which objects created during RM operation could have such an outsized effect on GC? To answer this I got permission to run jmap against the production process before the machine swap (jmap -histo can pause the target process, and with the live option it additionally triggers a full GC, so it is a fairly dangerous operation; I had hoped to run it twice and diff the object counts, but in the end...). The top 20 entries by size are shown below:

   1:     118404863    15622861752  [C
   2:      16268603     2992477680  [B
   3:      69146586     2765863440  java.util.ArrayList
   4:      33872192     2283061128  [Ljava.lang.Object;
   5:      45116321     1443722272  java.lang.String
   6:       7607286     1377877792  [I
   7:       8787512     1195101632  org.apache.hadoop.yarn.proto.YarnProtos$ResourceRequestProto$Builder
   8:      22939461     1101094128  org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeFinishedContainersPulledByAMEvent
   9:       8728662      907780848  org.apache.hadoop.yarn.proto.YarnProtos$ResourceRequestProto
  10:      23550270      753608640  java.util.concurrent.LinkedBlockingQueue$Node
  11:      22774040      728769280  java.lang.StringBuilder
  12:      10450545      668834880  org.apache.hadoop.yarn.proto.YarnProtos$ResourceProto$Builder
  13:       8787512      562400768  org.apache.hadoop.yarn.api.records.impl.pb.ResourceRequestPBImpl
  14:      11461810      550166880  java.util.concurrent.ConcurrentHashMap$Node
  15:       9652655      540548680  org.apache.hadoop.yarn.proto.YarnProtos$ResourceProto
  16:       8780794      491724464  org.apache.hadoop.yarn.proto.YarnProtos$PriorityProto
  17:      10946511      437860440  java.util.ArrayList$Itr
  18:      10598259      423930360  org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl
  19:       8780666      351226640  org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl
  20:       3672732      264436704  org.apache.hadoop.yarn.proto.YarnProtos$ContainerIdProto

The jmap -histo output shows that the informative entries with both high instance counts and a large footprint are rows such as 3 and 8. Generic JDK objects like Object arrays tell us little, and ArrayList is likewise a JDK built-in, so neither points to a specific spot in the Yarn source. Given that the RM logs also showed event-queue buildup, I started the investigation from the Yarn-specific class that is large in both count and bytes: RMNodeFinishedContainersPulledByAMEvent.

Where RMNodeFinishedContainersPulledByAMEvent Objects Come From

Knowing nothing about the class, I first searched for it and found an existing community issue (issue: RM is flooded with RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event queue) whose description largely matches our situation (including the behavior around RM failover).

This reinforced my conviction that RMNodeFinishedContainersPulledByAMEvent objects were the primary culprit behind the YGC problem.

An IDE search shows the class is instantiated only in RMAppAttemptImpl, and the creation site also explains where the large number of ArrayList objects comes from.

  // Ack NM to remove finished containers from context.
  private void sendFinishedContainersToNM() {
    for (NodeId nodeId : finishedContainersSentToAM.keySet()) {

      // Clear and get current values
      List<ContainerStatus> currentSentContainers =
          finishedContainersSentToAM.put(nodeId,
            new ArrayList<ContainerStatus>());
      List<ContainerId> containerIdList =
          new ArrayList<ContainerId>(currentSentContainers.size());
      for (ContainerStatus containerStatus : currentSentContainers) {
        containerIdList.add(containerStatus.getContainerId());
      }
      eventHandler.handle(new RMNodeFinishedContainersPulledByAMEvent(nodeId,
        containerIdList));
    }
  }


The method's call hierarchy (IntelliJ IDEA shortcut: Command + Shift + H):

sendFinishedContainersToNM

One branch of the hierarchy looks like this:

RMAppAttemptImpl.sendFinishedContainersToNM()
RMAppAttemptImpl.pullJustFinishedContainers()
ApplicationMasterService.allocate()

ApplicationMasterService runs inside the RM and services requests from AMs (ApplicationMasters); its allocate method handles AM resource requests.

  @Override
  public AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException {
      
         ......
         
      allocateResponse.setAllocatedContainers(allocation.getContainers());
      allocateResponse.setCompletedContainersStatuses(appAttempt
          .pullJustFinishedContainers());
      allocateResponse.setResponseId(lastResponse.getResponseId() + 1);
      
      ......
      
    }

From this implementation we can see that every time an AM requests resources via allocate(), pullJustFinishedContainers is invoked.

pullJustFinishedContainers collects the containers that have finished since the last call. Reading its implementation turns up something interesting: each RMAppAttemptImpl maintains two per-node maps, justFinishedContainers and finishedContainersSentToAM, recording containers that finished after the previous request so they can be fully processed. While the per-node lists are swapped out for empty ones on each call, the maps themselves are never pruned: the nodeId keys stay forever. As a consequence, for as long as the AM lives, every resource request makes sendFinishedContainersToNM create one RMNodeFinishedContainersPulledByAMEvent per nodeId in finishedContainersSentToAM, including nodes whose finished containers were all acknowledged long ago, for which the event carries nothing useful. So the more nodes an AM has ever touched, the more RMNodeFinishedContainersPulledByAMEvent objects get created on each of its heartbeats.

When a very large number of applications run concurrently, the sheer volume of RMNodeFinishedContainersPulledByAMEvent objects outpaces the dispatcher and the event queue backs up; YGC pauses then balloon because the large population of live objects inflates both the card-table updates and the cost of copying so much data. And the longer the AMs stay alive, the worse it gets.

  @Override
  public List<ContainerStatus> pullJustFinishedContainers() {
    this.writeLock.lock();

    try {
      List<ContainerStatus> returnList = new ArrayList<ContainerStatus>();

      // A new allocate means the AM received the previously sent
      // finishedContainers. We can ack this to NM now
      sendFinishedContainersToNM();

      // Mark every containerStatus as being sent to AM though we may return
      // only the ones that belong to the current attempt
      boolean keepContainersAcressAttempts = this.submissionContext
          .getKeepContainersAcrossApplicationAttempts();
      for (NodeId nodeId:justFinishedContainers.keySet()) {

        // Clear and get current values
        List<ContainerStatus> finishedContainers = justFinishedContainers.put
            (nodeId, new ArrayList<ContainerStatus>());

        if (keepContainersAcressAttempts) {
          returnList.addAll(finishedContainers);
        } else {
          // Filter out containers from previous attempt
          for (ContainerStatus containerStatus: finishedContainers) {
            if (containerStatus.getContainerId().getApplicationAttemptId()
                .equals(this.getAppAttemptId())) {
              returnList.add(containerStatus);
            }
          }
        }

        finishedContainersSentToAM.putIfAbsent(nodeId, new ArrayList
              <ContainerStatus>());
        finishedContainersSentToAM.get(nodeId).addAll(finishedContainers);
      }

      return returnList;
    } finally {
      this.writeLock.unlock();
    }
  }
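The growth pattern described above can be reproduced with a toy model (this is a hypothetical sketch, not Yarn code; the names and numbers are illustrative): per-node lists are emptied on each heartbeat but the node keys are never removed, so the event count is (nodes ever used) × (heartbeats), regardless of how many containers actually finish.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the sendFinishedContainersToNM pattern: one event is created
// per nodeId key on every heartbeat, and keys are never pruned.
public class EventFloodSim {
    static long simulate(int nodesEverUsed, int heartbeats) {
        Map<String, List<String>> finishedContainersSentToAM = new HashMap<>();
        for (int n = 0; n < nodesEverUsed; n++) {
            finishedContainersSentToAM.put("node-" + n, new ArrayList<>());
        }
        long events = 0;
        for (int h = 0; h < heartbeats; h++) {
            for (String nodeId : finishedContainersSentToAM.keySet()) {
                // Swap in an empty list ("clear") -- the key itself survives.
                finishedContainersSentToAM.put(nodeId, new ArrayList<>());
                events++; // one RMNodeFinishedContainersPulledByAMEvent
            }
        }
        return events;
    }

    public static void main(String[] args) {
        // An AM that has touched 1000 nodes, heartbeating every 3s for 24h:
        System.out.println(simulate(1000, 28_800) + " events");
    }
}
```

A single long-lived, wide application can thus generate tens of millions of events, which matches both the histogram and the event-queue backlog seen in the RM logs.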

Resolution

A bug this big had surely been reported before, and the community should have a fix, so I pulled the latest upstream code, opened RMAppAttemptImpl on the master branch, and compared versions with git.

What the...!
The most recent commit was dated 2015/7/13. Had nobody touched this file in all these years, or had something gone wrong with my master branch?


Git show-history of RMAppAttemptImpl on the Hadoop master branch

I remembered that the Hadoop 3.0 release had shipped plenty of new features, so I switched to branch-3.0 and re-ran the git comparison; there, a good number of updates showed up.
Coming from the Spark world, I honestly cannot make sense of how the Hadoop branches are managed. (And the issue "RM is flooded with RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event queue" is still marked Open even though it has been addressed...)

The comparison shows that upstream fixed the problem with two patches, YARN-5262 and YARN-5483.
