Motivation
Yesterday another email came in about a colleague who got rccd over a data (code) issue, and management has been stressing data security again and again. So I went through the files on my disk to make sure I had not slipped up unintentionally. During the review I found some log excerpts and process-state dumps collected while troubleshooting. They are not sensitive data, but to avoid unnecessary trouble I decided to clean them up. Still, some of it felt like a pity to throw away, such as the jmap and jstack output gathered while investigating the Yarn RM YGC problem. Because that problem was ultimately "solved" by moving to more powerful machines, we never dug any deeper, and this hard-won information was never put to use. So, before deleting it, I decided to use it to re-examine the Yarn RM YGC problem.
Revisiting how the problem was handled
// The original logs were not preserved; these lines are excerpted from a chat history
2018-01-29T19:09:58.294+0800: 15752.474: [GC (Allocation Failure) [PSYoungGen: 40755358K->2823487K(50331648K)] 56773140K->18863096K(71303168K), 12.4035418 secs] [Times: user=222.88 sys=0.03, real=12.40 secs]
2018-01-29T19:10:38.945+0800: 15793.125: [GC (Allocation Failure) [PSYoungGen: 40572223K->2694856K(50331648K)] 56611832K->18751652K(71303168K), 11.5971746 secs] [Times: user=208.62 sys=0.03, real=11.60 secs]
2018-01-29T19:11:23.713+0800: 15837.893: [GC (Allocation Failure) [PSYoungGen: 40443592K->2834225K(50331648K)] 56500388K->18915647K(71303168K), 12.2938357 secs] [Times: user=221.24 sys=0.00, real=12.29 secs]
2018-01-29T19:12:05.063+0800: 15879.243: [GC (Allocation Failure) [PSYoungGen: 40582961K->2755339K(50331648K)] 56664383K->18858245K(71303168K), 11.9166865 secs] [Times: user=214.42 sys=0.02, real=11.91 secs]
As the excerpt above shows, a single Yarn RM young GC was taking over 10 seconds, and above 20 seconds at peak. On top of that, the RM log showed the dispatcher event queue piling up. The performance problem speaks for itself.
The original fix was approached purely as a GC-tuning exercise. The JVM heap grew from 40 GB to 80 GB, and to keep objects from ever leaving the young generation, the young generation itself was bumped from 10 GB to 20 GB, then 40 GB, then 60 GB. GC flags were swapped out batch after batch, yet none of it solved the underlying problem. Convinced that poor CPU performance was what made GC slow, we even brought in hyper-threading, and the problem persisted. In the end the hard-working OPs moved the RM to higher-performance servers, and GC pauses dropped to the millisecond range.
The current GC configuration looks like this:
[#9#xg-ResouceManager2#yarn@rmserver ~/cluster-data/logs]$ jcmd 27736 VM.flags
27736:
-XX:CICompilerCount=15 -XX:GCLogFileSize=536870912
-XX:InitialHeapSize=85899345920
-XX:MaxHeapSize=85899345920
-XX:MaxNewSize=64424509440 // young gen: 60 GB
-XX:MinHeapDeltaBytes=524288 -XX:NewSize=64424509440
-XX:NumberOfGCLogFiles=2 -XX:OldSize=21474836480
-XX:ParGCCardsPerStrideChunk=4096 -XX:+PrintGC
-XX:+PrintGCDateStamps -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:SurvivorRatio=3 // each survivor space: 12 GB
-XX:+UnlockDiagnosticVMOptions
-XX:-UseAdaptiveSizePolicy
-XX:+UseFastUnorderedTimeStamps
-XX:+UseGCLogFileRotation
-XX:+UseParallelGC // Parallel Scavenge young collector (on JDK 8, full GCs default to the multi-threaded Parallel Old unless -XX:-UseParallelOldGC is set)
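Converting the byte values: InitialHeapSize/MaxHeapSize = 85899345920 = 80 GB, NewSize/MaxNewSize = 64424509440 = 60 GB, and OldSize = 21474836480 = 20 GB. With SurvivorRatio=3 the young generation splits as eden : survivor : survivor = 3 : 1 : 1, so each survivor space is 60 GB / 5 = 12 GB and eden is 36 GB, which is where the inline "60g" and "12g" annotations come from.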
That a configuration like this actually works was eye-opening and overturned my long-held assumption that the old generation should be larger than the young generation. All I can say is: trust books completely and you are better off without them.
Re-examining the problem
Once the problem was mitigated we moved on to other things and never dug into it. The data-security review pulled it back into view, and since I have never believed that swapping machines was a real fix, I decided to go through the problem again. Fortunately, while helping troubleshoot it I had copied some logs to my own machine for analysis, so they survived. Unfortunately the surviving material is incomplete and not all from the same day. Even so, the externally visible symptoms were the same throughout, so it is the same problem, and the logs can still tell us something.
The relationship between young-generation size and YGC duration
Analyzing the surviving GC logs shows that YGC duration is positively correlated with how much data is left in the young generation after the collection: the more that survives, the longer the YGC takes. To make this concrete, here is a contiguous stretch of log lines in which the pause time fluctuates (a quick way to pull the two relevant numbers out of each line is sketched after the excerpt).
2018-01-24T15:05:12.055+0800: 2780.121: [GC (Allocation Failure) [PSYoungGen: 8344459K->484310K(9175040K)] 13672674K->5830336K(40632320K), 0.6421833 secs] [Times: user=6.34 sys=0.03, real=0.64 secs]
2018-01-24T15:05:18.403+0800: 2786.468: [GC (Allocation Failure) [PSYoungGen: 8348630K->493586K(9175040K)] 13694656K->5852127K(40632320K), 0.6354490 secs] [Times: user=6.30 sys=0.01, real=0.64 secs]
2018-01-24T15:05:24.810+0800: 2792.876: [GC (Allocation Failure) [PSYoungGen: 8357906K->507046K(9175040K)] 13716447K->5876773K(40632320K), 0.7877401 secs] [Times: user=7.85 sys=0.01, real=0.79 secs]
2018-01-24T15:05:31.743+0800: 2799.808: [GC (Allocation Failure) [PSYoungGen: 8371366K->537442K(9175040K)] 13741093K->5919236K(40632320K), 0.8498489 secs] [Times: user=8.47 sys=0.01, real=0.85 secs]
2018-01-24T15:05:38.765+0800: 2806.831: [GC (Allocation Failure) [PSYoungGen: 8401762K->567214K(9175040K)] 13783556K->5962118K(40632320K), 0.9487537 secs] [Times: user=9.44 sys=0.00, real=0.95 secs]
2018-01-24T15:05:45.791+0800: 2813.856: [GC (Allocation Failure) [PSYoungGen: 8431534K->583819K(9175040K)] 13826438K->5988521K(40632320K), 1.0380007 secs] [Times: user=10.35 sys=0.00, real=1.03 secs]
2018-01-24T15:05:52.966+0800: 2821.031: [GC (Allocation Failure) [PSYoungGen: 8448139K->556379K(9175040K)] 13852841K->5976631K(40632320K), 1.1442949 secs] [Times: user=11.43 sys=0.00, real=1.14 secs]
2018-01-24T15:06:00.030+0800: 2828.095: [GC (Allocation Failure) [PSYoungGen: 8420699K->555026K(9175040K)] 13840951K->5984419K(40632320K), 1.1077380 secs] [Times: user=11.06 sys=0.00, real=1.11 secs]
2018-01-24T15:06:06.335+0800: 2834.400: [GC (Allocation Failure) [PSYoungGen: 8419346K->590867K(9175040K)] 13848739K->6037725K(40632320K), 1.2386294 secs] [Times: user=12.38 sys=0.00, real=1.23 secs]
2018-01-24T15:06:13.638+0800: 2841.703: [GC (Allocation Failure) [PSYoungGen: 8455187K->563957K(9175040K)] 13902045K->6021894K(40632320K), 1.1187463 secs] [Times: user=11.17 sys=0.00, real=1.12 secs]
2018-01-24T15:06:21.030+0800: 2849.095: [GC (Allocation Failure) [PSYoungGen: 8428277K->462155K(9175040K)] 13886214K->5928377K(40632320K), 0.6891712 secs] [Times: user=6.86 sys=0.02, real=0.69 secs]
2018-01-24T15:06:27.777+0800: 2855.842: [GC (Allocation Failure) [PSYoungGen: 8326475K->494792K(9175040K)] 13792697K->5968171K(40632320K), 0.7356112 secs] [Times: user=7.33 sys=0.00, real=0.73 secs]
2018-01-24T15:06:34.361+0800: 2862.426: [GC (Allocation Failure) [PSYoungGen: 8359112K->472731K(9175040K)] 13832491K->5956080K(40632320K), 0.6683025 secs] [Times: user=6.64 sys=0.00, real=0.67 secs]
2018-01-24T15:06:41.069+0800: 2869.134: [GC (Allocation Failure) [PSYoungGen: 8337051K->483344K(9175040K)] 13820400K->5978220K(40632320K), 0.6570753 secs] [Times: user=6.51 sys=0.02, real=0.65 secs]
2018-01-24T15:06:47.772+0800: 2875.837: [GC (Allocation Failure) [PSYoungGen: 8347664K->465064K(9175040K)] 13842540K->5974059K(40632320K), 0.5663590 secs] [Times: user=5.63 sys=0.01, real=0.56 secs]
2018-01-24T15:06:54.389+0800: 2882.455: [GC (Allocation Failure) [PSYoungGen: 8329384K->454076K(9175040K)] 13838379K->5973293K(40632320K), 0.5559584 secs] [Times: user=5.52 sys=0.03, real=0.56 secs]
2018-01-24T15:07:01.635+0800: 2889.700: [GC (Allocation Failure) [PSYoungGen: 8318396K->521880K(9175040K)] 13837613K->6056761K(40632320K), 1.0041698 secs] [Times: user=10.02 sys=0.00, real=1.00 secs]
2018-01-24T15:07:08.663+0800: 2896.729: [GC (Allocation Failure) [PSYoungGen: 8386200K->520605K(9175040K)] 13921081K->6065095K(40632320K), 1.0463873 secs] [Times: user=10.45 sys=0.00, real=1.04 secs]
2018-01-24T15:07:15.796+0800: 2903.861: [GC (Allocation Failure) [PSYoungGen: 8384925K->529044K(9175040K)] 13929415K->6087207K(40632320K), 1.0115040 secs] [Times: user=10.01 sys=0.00, real=1.01 secs]
2018-01-24T15:07:23.222+0800: 2911.288: [GC (Allocation Failure) [PSYoungGen: 8393364K->534805K(9175040K)] 13951527K->6113778K(40632320K), 1.0747669 secs] [Times: user=10.73 sys=0.01, real=1.08 secs]
2018-01-24T15:07:30.332+0800: 2918.397: [GC (Allocation Failure) [PSYoungGen: 8399125K->489447K(9175040K)] 13978098K->6079306K(40632320K), 0.9554071 secs] [Times: user=9.54 sys=0.00, real=0.96 secs]
2018-01-24T15:07:37.296+0800: 2925.361: [GC (Allocation Failure) [PSYoungGen: 8353767K->442958K(9175040K)] 13943626K->6041227K(40632320K), 0.6691608 secs] [Times: user=6.68 sys=0.00, real=0.67 secs]
2018-01-24T15:07:44.914+0800: 2932.979: [GC (Allocation Failure) [PSYoungGen: 8307278K->437538K(9175040K)] 13905547K->6050795K(40632320K), 0.7124929 secs] [Times: user=7.07 sys=0.01, real=0.71 secs]
2018-01-24T15:07:51.793+0800: 2939.858: [GC (Allocation Failure) [PSYoungGen: 8301858K->473921K(9175040K)] 13915115K->6098001K(40632320K), 0.7684869 secs] [Times: user=7.64 sys=0.00, real=0.77 secs]
2018-01-24T15:07:58.598+0800: 2946.663: [GC (Allocation Failure) [PSYoungGen: 8338241K->478071K(9175040K)] 13962321K->6110603K(40632320K), 0.7639746 secs] [Times: user=7.63 sys=0.00, real=0.77 secs]
2018-01-24T15:08:04.382+0800: 2952.448: [GC (Allocation Failure) [PSYoungGen: 8342391K->479205K(9175040K)] 13974923K->6122021K(40632320K), 0.8109380 secs] [Times: user=8.09 sys=0.00, real=0.81 secs]
2018-01-24T15:08:11.056+0800: 2959.121: [GC (Allocation Failure) [PSYoungGen: 8343525K->457732K(9175040K)] 13986341K->6113836K(40632320K), 0.7351115 secs] [Times: user=7.33 sys=0.01, real=0.73 secs]
2018-01-24T15:08:17.686+0800: 2965.751: [GC (Allocation Failure) [PSYoungGen: 8322052K->452644K(9175040K)] 13978156K->6117093K(40632320K), 0.6676740 secs] [Times: user=6.65 sys=0.01, real=0.67 secs]
2018-01-24T15:08:24.360+0800: 2972.425: [GC (Allocation Failure) [PSYoungGen: 8316964K->448014K(9175040K)] 13981413K->6125855K(40632320K), 0.6466176 secs] [Times: user=6.45 sys=0.01, real=0.65 secs]
2018-01-24T15:08:30.842+0800: 2978.907: [GC (Allocation Failure) [PSYoungGen: 8312334K->443836K(9175040K)] 13990175K->6132684K(40632320K), 0.6180497 secs] [Times: user=6.12 sys=0.01, real=0.62 secs]
2018-01-24T15:08:37.439+0800: 2985.504: [GC (Allocation Failure) [PSYoungGen: 8308156K->428507K(9175040K)] 13997004K->6126109K(40632320K), 0.5974813 secs] [Times: user=5.94 sys=0.02, real=0.60 secs]
2018-01-24T15:08:44.154+0800: 2992.219: [GC (Allocation Failure) [PSYoungGen: 8292827K->447938K(9175040K)] 13990429K->6152131K(40632320K), 0.6153070 secs] [Times: user=6.12 sys=0.01, real=0.62 secs]
2018-01-24T15:08:50.816+0800: 2998.882: [GC (Allocation Failure) [PSYoungGen: 8312258K->464488K(9175040K)] 14016451K->6183309K(40632320K), 0.6873898 secs] [Times: user=6.85 sys=0.02, real=0.69 secs]
2018-01-24T15:08:57.834+0800: 3005.900: [GC (Allocation Failure) [PSYoungGen: 8328808K->503555K(9175040K)] 14047629K->6232096K(40632320K), 1.0393343 secs] [Times: user=10.52 sys=0.00, real=1.04 secs]
2018-01-24T15:09:06.484+0800: 3014.549: [GC (Allocation Failure) [PSYoungGen: 8367875K->653313K(9175040K)] 14096416K->6390015K(40632320K), 1.8326990 secs] [Times: user=18.32 sys=0.00, real=1.83 secs]
2018-01-24T15:09:14.517+0800: 3022.582: [GC (Allocation Failure) [PSYoungGen: 8517633K->694205K(9175040K)] 14254335K->6439862K(40632320K), 2.0075142 secs] [Times: user=20.07 sys=0.00, real=2.01 secs]
2018-01-24T15:09:22.442+0800: 3030.508: [GC (Allocation Failure) [PSYoungGen: 8558525K->680568K(9175040K)] 14304182K->6432288K(40632320K), 1.9823928 secs] [Times: user=19.76 sys=0.01, real=1.99 secs]
2018-01-24T15:09:30.425+0800: 3038.490: [GC (Allocation Failure) [PSYoungGen: 8544888K->682694K(9175040K)] 14296608K->6442548K(40632320K), 1.9901893 secs] [Times: user=19.89 sys=0.00, real=1.99 secs]
2018-01-24T15:09:38.166+0800: 3046.232: [GC (Allocation Failure) [PSYoungGen: 8547014K->641436K(9175040K)] 14306868K->6413962K(40632320K), 1.7414826 secs] [Times: user=17.53 sys=0.01, real=1.74 secs]
2018-01-24T15:09:45.767+0800: 3053.833: [GC (Allocation Failure) [PSYoungGen: 8505756K->661152K(9175040K)] 14278282K->6453992K(40632320K), 1.8809221 secs] [Times: user=18.77 sys=0.00, real=1.88 secs]
2018-01-24T15:09:53.823+0800: 3061.888: [GC (Allocation Failure) [PSYoungGen: 8525472K->701454K(9175040K)] 14318312K->6504832K(40632320K), 2.0521165 secs] [Times: user=20.50 sys=0.00, real=2.05 secs]
2018-01-24T15:10:01.415+0800: 3069.480: [GC (Allocation Failure) [PSYoungGen: 8565774K->739864K(9175040K)] 14369152K->6553519K(40632320K), 2.4033559 secs] [Times: user=24.03 sys=0.00, real=2.40 secs]
2018-01-24T15:10:09.770+0800: 3077.835: [GC (Allocation Failure) [PSYoungGen: 8604184K->773748K(9175040K)] 14417839K->6599545K(40632320K), 2.5234427 secs] [Times: user=25.22 sys=0.00, real=2.53 secs]
2018-01-24T15:10:18.510+0800: 3086.576: [GC (Allocation Failure) [PSYoungGen: 8638068K->793254K(9175040K)] 14463865K->6629059K(40632320K), 2.6823239 secs] [Times: user=26.90 sys=0.00, real=2.69 secs]
2018-01-24T15:10:28.140+0800: 3096.205: [GC (Allocation Failure) [PSYoungGen: 8657574K->860322K(9175040K)] 14493379K->6706843K(40632320K), 3.0401675 secs] [Times: user=30.12 sys=0.00, real=3.04 secs]
2018-01-24T15:10:37.600+0800: 3105.665: [GC (Allocation Failure) [PSYoungGen: 8724642K->915220K(9175040K)] 14571163K->6770802K(40632320K), 3.2758327 secs] [Times: user=32.77 sys=0.00, real=3.28 secs]
2018-01-24T15:10:46.836+0800: 3114.901: [GC (Allocation Failure) [PSYoungGen: 8779540K->904150K(9175040K)] 14635122K->6769113K(40632320K), 3.1770298 secs] [Times: user=31.78 sys=0.00, real=3.17 secs]
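Rather than eyeballing the raw lines, the two numbers can be extracted programmatically and plotted. The following is only a throwaway helper sketched for illustration (the class name and command-line handling are my own, not part of the original troubleshooting); it assumes the PSYoungGen line format shown above.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Prints "<young-gen KB surviving the YGC> <tab> <real pause in seconds>" per GC line,
// ready to paste into a spreadsheet or gnuplot to check the correlation.
public class YgcLogScan {
    // Matches e.g. "[PSYoungGen: 8344459K->484310K(9175040K)] ... , 0.6421833 secs]"
    private static final Pattern LINE = Pattern.compile(
            "\\[PSYoungGen: \\d+K->(\\d+)K\\(\\d+K\\)\\].*?, (\\d+\\.\\d+) secs\\]");

    public static void main(String[] args) throws IOException {
        Files.lines(Paths.get(args[0])).forEach(line -> {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                long survivedKb = Long.parseLong(m.group(1));      // live data left in the young gen
                double pauseSec = Double.parseDouble(m.group(2));  // total YGC pause (real time)
                System.out.printf("%d\t%.3f%n", survivedKb, pauseSec);
            }
        });
    }
}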
To explain this behaviour I looked into what makes up a YGC pause. The pause time can be broken down roughly as follows:
Tyoung = Tstack_scan + Tcard_scan + Told_scan + Tcopy
where:
Tyoung — the total YGC pause;
Tstack_scan — time to scan root objects on thread stacks; roughly a constant for a given application;
Tcard_scan — time to scan the card table (dirty cards), which depends on the old-generation size; in addition, after young-generation objects are moved, the references pointing at them must be updated and the corresponding cards dirtied, so that the card scan of the next minor GC still covers the full set of old-to-young references;
Told_scan — time to scan the old-generation roots reached through the dirty cards, which depends on how much the card table flags;
Tcopy — time to copy surviving objects, proportional to the amount of live data being copied;
So, with the other terms held constant, the YGC pause is positively correlated with the number of live objects (through Tcard_scan) and the amount of live data (through Tcopy).
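A quick sanity check against the excerpt above: the line at 15:05:12 has roughly 484,310K surviving in the young generation with a 0.64s real pause, while the line at 15:10:37 has roughly 915,220K surviving with a 3.28s pause. The pause clearly tracks the amount of surviving data, even if the relationship is not strictly linear.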
Which objects contribute most to YGC
As the previous section argued, with other factors fixed the YGC pause is positively correlated with the number of live objects and the amount of live data in the heap. So which objects created while the RM runs are most likely to weigh on GC? To answer that, before the machine swap I got approval to run jmap against the live process (jmap -histo can pause the target process, and with the live option it can also trigger a full GC, so it is a fairly risky operation; I had wanted to run it a couple more times to watch how the objects changed, but in the end…). The top 20 entries by footprint are shown below:
1: 118404863 15622861752 [C
2: 16268603 2992477680 [B
3: 69146586 2765863440 java.util.ArrayList
4: 33872192 2283061128 [Ljava.lang.Object;
5: 45116321 1443722272 java.lang.String
6: 7607286 1377877792 [I
7: 8787512 1195101632 org.apache.hadoop.yarn.proto.YarnProtos$ResourceRequestProto$Builder
8: 22939461 1101094128 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeFinishedContainersPulledByAMEvent
9: 8728662 907780848 org.apache.hadoop.yarn.proto.YarnProtos$ResourceRequestProto
10: 23550270 753608640 java.util.concurrent.LinkedBlockingQueue$Node
11: 22774040 728769280 java.lang.StringBuilder
12: 10450545 668834880 org.apache.hadoop.yarn.proto.YarnProtos$ResourceProto$Builder
13: 8787512 562400768 org.apache.hadoop.yarn.api.records.impl.pb.ResourceRequestPBImpl
14: 11461810 550166880 java.util.concurrent.ConcurrentHashMap$Node
15: 9652655 540548680 org.apache.hadoop.yarn.proto.YarnProtos$ResourceProto
16: 8780794 491724464 org.apache.hadoop.yarn.proto.YarnProtos$PriorityProto
17: 10946511 437860440 java.util.ArrayList$Itr
18: 10598259 423930360 org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl
19: 8780666 351226640 org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl
20: 3672732 264436704 org.apache.hadoop.yarn.proto.YarnProtos$ContainerIdProto
From the jmap -histo output, the entries that are both numerous and large and actually tell us something are #3 (ArrayList), #8 (RMNodeFinishedContainersPulledByAMEvent), and the like. Plain JDK types such as Object[] arrays are not very informative on their own, and ArrayList is also a JDK class, so neither pinpoints a spot in the Yarn source. Given that the RM log also showed the event queue backing up, the Yarn-specific class that is both numerous and large, RMNodeFinishedContainersPulledByAMEvent, was the natural place to start.
Where RMNodeFinishedContainersPulledByAMEvent objects come from
Knowing nothing about this class, I first searched for it and found a matching community issue ("RM is flooded with RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event queue"), whose description largely matches our situation (including the behaviour before and after RM failover).
That strengthened the conviction that RMNodeFinishedContainersPulledByAMEvent was the main culprit behind the YGC problem.
Searching the codebase in the IDE shows that this event is created in only one place, RMAppAttemptImpl. Looking at that code also explains where the huge number of ArrayList instances comes from.
// Ack NM to remove finished containers from context.
private void sendFinishedContainersToNM() {
for (NodeId nodeId : finishedContainersSentToAM.keySet()) {
// Clear and get current values
List<ContainerStatus> currentSentContainers =
finishedContainersSentToAM.put(nodeId,
new ArrayList<ContainerStatus>());
List<ContainerId> containerIdList =
new ArrayList<ContainerId>(currentSentContainers.size());
for (ContainerStatus containerStatus : currentSentContainers) {
containerIdList.add(containerStatus.getContainerId());
}
eventHandler.handle(new RMNodeFinishedContainersPulledByAMEvent(nodeId,
containerIdList));
}
}
The call hierarchy of this method can be viewed in the IDE (IntelliJ IDEA shortcut: Command + Shift + H). One branch of it looks like this:
RMAppAttemptImpl.sendFinishedContainersToNM()
RMAppAttemptImpl.pullJustFinishedContainers()
ApplicationMasterService.allocate()
ApplicationMasterService runs inside the RM and services requests from AMs (ApplicationMasters); its allocate method handles the AM's resource requests.
@Override
public AllocateResponse allocate(AllocateRequest request)
throws YarnException, IOException {
......
allocateResponse.setAllocatedContainers(allocation.getContainers());
allocateResponse.setCompletedContainersStatuses(appAttempt
.pullJustFinishedContainers());
allocateResponse.setResponseId(lastResponse.getResponseId() + 1);
......
}
From this implementation it is clear that every time an AM requests resources through allocate(), pullJustFinishedContainers is invoked.
pullJustFinishedContainers returns the containers that have finished since the last pull. Reading its implementation turns up something interesting: each RMAppAttemptImpl keeps two maps keyed by NodeId, justFinishedContainers and finishedContainersSentToAM, whose values are the lists of ContainerStatus accumulated since the previous request. On every pull the lists are swapped out for fresh empty ones, but the NodeId entries themselves are never removed from either map. So while an AM is alive the set of entries only grows, and because every resource request calls sendFinishedContainersToNM, each allocate creates one RMNodeFinishedContainersPulledByAMEvent per entry in finishedContainersSentToAM, including entries whose containers were acknowledged long ago and for which the event carries nothing useful. The longer an AM lives, the more RMNodeFinishedContainersPulledByAMEvent objects each heartbeat produces.
When the number of concurrently running applications is large, so many RMNodeFinishedContainersPulledByAMEvent objects are created that the dispatcher cannot keep up and the event queue backs up. A young GC then has far more live objects to deal with, so updating the card table and copying all that data dominates the pause, which is exactly the long-YGC behaviour observed; and the longer those AMs stay alive, the worse it gets (see the rough estimate after the listing below).
@Override
public List<ContainerStatus> pullJustFinishedContainers() {
this.writeLock.lock();
try {
List<ContainerStatus> returnList = new ArrayList<ContainerStatus>();
// A new allocate means the AM received the previously sent
// finishedContainers. We can ack this to NM now
sendFinishedContainersToNM();
// Mark every containerStatus as being sent to AM though we may return
// only the ones that belong to the current attempt
boolean keepContainersAcressAttempts = this.submissionContext
.getKeepContainersAcrossApplicationAttempts();
for (NodeId nodeId:justFinishedContainers.keySet()) {
// Clear and get current values
List<ContainerStatus> finishedContainers = justFinishedContainers.put
(nodeId, new ArrayList<ContainerStatus>());
if (keepContainersAcressAttempts) {
returnList.addAll(finishedContainers);
} else {
// Filter out containers from previous attempt
for (ContainerStatus containerStatus: finishedContainers) {
if (containerStatus.getContainerId().getApplicationAttemptId()
.equals(this.getAppAttemptId())) {
returnList.add(containerStatus);
}
}
}
finishedContainersSentToAM.putIfAbsent(nodeId, new ArrayList
<ContainerStatus>());
finishedContainersSentToAM.get(nodeId).addAll(finishedContainers);
}
return returnList;
} finally {
this.writeLock.unlock();
}
}
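To get a feel for the volume this implies, here is a back-of-the-envelope estimate. All three inputs are assumptions picked purely for illustration, not values measured on our cluster, and the class is a throwaway sketched for this write-up.

// Rough estimate of how many RMNodeFinishedContainersPulledByAMEvent objects a single
// long-running AM can cause with the logic above: one event per NodeId entry per heartbeat,
// whether or not anything new finished. The inputs below are illustrative assumptions only.
public class EventFloodEstimate {
    public static void main(String[] args) {
        long nodesTouched = 2_000;            // assumed: nodes on which this attempt ever had finished containers
        long heartbeatIntervalMs = 1_000;     // assumed: AM allocate() interval
        long uptimeHours = 24;                // assumed: lifetime of a long-running AM
        long heartbeats = uptimeHours * 3_600_000L / heartbeatIntervalMs;
        System.out.println("events over the attempt's lifetime ≈ " + (nodesTouched * heartbeats));
    }
}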
Solution
A bug this big must have been reported before, and the community presumably has a fix, so I pulled the latest upstream code, went to RMAppAttemptImpl on the master branch, and checked its git history.
fuck!
The most recent change there was dated 2015-07-13. Has nobody touched this code in all these years, or did something go wrong with my master branch?
Remembering that the Hadoop 3.0 release shipped a lot of new features, I switched to the 3.0 branch and compared again; there were indeed quite a few changes there.
Coming from Spark, I honestly cannot make sense of how Hadoop manages its branches. (And the issue "RM is flooded with RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event queue" is still marked Open even after being addressed…)
Comparing the branches shows that upstream fixed the problem with two patches, YARN-5262 and YARN-5483.
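I have not reproduced those two patches here, but judging from the code above the obvious direction is to stop firing an ack event for a node that has nothing left to acknowledge. The snippet below is only my own sketch of that idea against the sendFinishedContainersToNM shown earlier (a single added guard); it is not the actual content of YARN-5262 or YARN-5483.

// Sketch only, not the upstream patch: skip the event entirely when a node has no
// newly finished containers to acknowledge, so long-lived AMs stop flooding the
// dispatcher with empty RMNodeFinishedContainersPulledByAMEvent objects.
private void sendFinishedContainersToNM() {
    for (NodeId nodeId : finishedContainersSentToAM.keySet()) {
        // Swap in a fresh list and grab whatever was pending for this node.
        List<ContainerStatus> currentSentContainers =
            finishedContainersSentToAM.put(nodeId, new ArrayList<ContainerStatus>());
        if (currentSentContainers.isEmpty()) {
            continue;   // already acknowledged earlier; an event here would be pure overhead
        }
        List<ContainerId> containerIdList =
            new ArrayList<ContainerId>(currentSentContainers.size());
        for (ContainerStatus containerStatus : currentSentContainers) {
            containerIdList.add(containerStatus.getContainerId());
        }
        eventHandler.handle(
            new RMNodeFinishedContainersPulledByAMEvent(nodeId, containerIdList));
    }
}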