現(xiàn)象
從監(jiān)控來看娩嚼,堆內(nèi)存是夠用的,但是頻繁觸發(fā)Full GC腐泻,每秒鐘三次,每次耗時(shí)三四秒队询。
結(jié)合Young GC的信息和堆內(nèi)存的使用情況派桩,可以發(fā)現(xiàn)新生代的內(nèi)存夠用,老生代的內(nèi)存不夠用蚌斩,頻繁Full GC铆惑,老生代的內(nèi)存使用率依舊達(dá)到了98%。
2018-11-20T15:02:46.002+0800: 2779214.232: [GC2018-11-20T15:02:46.002+0800: 2779214.232: [ParNew: 1643604K->4513K(1843200K), 0.0129920 secs] 4994360K->3355569K(5257216K), 0.0131040 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
$ sudo jmap -heap 11927
Attaching to process ID 11927, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.65-b04-internal
using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 5593104384 (5334.0MB)
NewSize = 2097152000 (2000.0MB)
MaxNewSize = 2097152000 (2000.0MB)
OldSize = 5439488 (5.1875MB)
NewRatio = 2
SurvivorRatio = 8
PermSize = 536870912 (512.0MB)
MaxPermSize = 536870912 (512.0MB)
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 1887436800 (1800.0MB)
used = 899619040 (857.9435729980469MB)
free = 987817760 (942.0564270019531MB)
47.66353183322482% used
Eden Space:
capacity = 1677721600 (1600.0MB)
used = 895864384 (854.3628540039062MB)
free = 781857216 (745.6371459960938MB)
53.39767837524414% used
From Space:
capacity = 209715200 (200.0MB)
used = 3754656 (3.580718994140625MB)
free = 205960544 (196.41928100585938MB)
1.7903594970703125% used
To Space:
capacity = 209715200 (200.0MB)
used = 0 (0.0MB)
free = 209715200 (200.0MB)
0.0% used
concurrent mark-sweep generation:
capacity = 3495952384 (3334.0MB)
used = 3432530640 (3273.516311645508MB)
free = 63421744 (60.48368835449219MB)
98.18585217892945% used
Perm Generation:
capacity = 536870912 (512.0MB)
used = 52807616 (50.36126708984375MB)
free = 484063296 (461.63873291015625MB)
9.836184978485107% used
28784 interned Strings occupying 3175392 bytes.
排查過程
這種現(xiàn)象表明送膳,服務(wù)存在內(nèi)存泄露员魏,所以需要分析堆的使用。使用gcore產(chǎn)生core文件叠聋,這比直接使用jmap dump快得多撕阎,可以避免對(duì)線上服務(wù)造成影響。core文件的大小跟進(jìn)程使用的堆大小一樣大碌补。
$ sudo gcore 11927
[Thread debugging using libthread_db enabled]
[New Thread 0x487ac940 (LWP 1653)]
[New Thread 0x486ab940 (LWP 1652)]
[New Thread 0x485aa940 (LWP 1651)]
0x00000030f0e07ba5 in pthread_join () from /lib64/libpthread.so.0
Saved corefile core.11927
使用jmap將core文件轉(zhuǎn)換成bin文件虏束。hprof文件跟進(jìn)程實(shí)際使用的堆內(nèi)存一樣大棉饶。
$sudo jmap -dump:format=b,file=heap.hprof.bin /opt/taobao/java/bin/java core.11927
Attaching to core core.11927 from executable /opt/taobao/java/bin/java, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.65-b04-internal
Dumping heap to heap.hprof.bin ...
這個(gè)轉(zhuǎn)換過程非常慢。將5.3GB的core轉(zhuǎn)換成4GB的hprof文件镇匀,花了將近半個(gè)小時(shí)照藻。通過top來看,這個(gè)dump是單線程做的汗侵,一直跑滿了一個(gè)核幸缕。
top - 15:43:38 up 214 days, 1:02, 2 users, load average: 1.69, 1.83, 1.56
Tasks: 142 total, 1 running, 141 sleeping, 0 stopped, 0 zombie
Cpu(s): 51.5%us, 0.8%sy, 0.0%ni, 47.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8194164k total, 8135628k used, 58536k free, 75324k buffers
Swap: 0k total, 0k used, 0k free, 797688k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11927 www 20 0 6513m 5.3g 82m S 105.5 67.4 10760:36 java
3764 root 20 0 2458m 791m 12m S 100.5 9.9 18:53.36 jmap
16376 root 20 0 411m 360m 1468 S 0.0 4.5 19:52.15 syslog-ng
11190 agent 20 0 540m 143m 5832 S 0.0 1.8 65:14.77 java
接著就是如何分析這個(gè)hprof文件的問題。目前主流的分析工具晰韵,有MAT和jhat发乔,下面分別使用二者分析一下。
jhat
jhat看起來比較簡(jiǎn)單宫屠,從分析結(jié)果里面可以看出用戶自定義類中列疗,SlsRecordWriter類的對(duì)象比較多,再做進(jìn)一步的分析就不知如何下手了浪蹂。
MAT
MAT體驗(yàn)比較好抵栈,直接分析出持有內(nèi)存最大的類,然后從dominator_tree里面可以看到具體的應(yīng)用情況坤次,可以發(fā)現(xiàn)SlsRecordWriter對(duì)象很多古劲。需要注意的是,使用MAT分析大的dump文件很吃內(nèi)存缰猴,所以需要調(diào)大堆內(nèi)存产艾。Mac上Eclipse的配置文件路徑是/Applications/Eclipse.app/Contents/Eclipse/eclipse.ini
。
結(jié)論
構(gòu)造一個(gè)demo驗(yàn)證一下滑绒。下面這個(gè)例子闷堡,運(yùn)行一段時(shí)間便會(huì)觸發(fā)Young GC(IDEA VM options設(shè)置上"-verbose:gc -XX:+PrintGCDetails"),但是從來不會(huì)調(diào)用Foo的finalize方法疑故,只有停止運(yùn)行的時(shí)候杠览,才會(huì)進(jìn)入hook執(zhí)行,接著釋放對(duì)象纵势。如果注釋掉addShutdownHook這段代碼踱阿,就符合期望了。所以addShutdownHook會(huì)導(dǎo)致Runtime持有對(duì)象钦铁,需要注意使用場(chǎng)景软舌。
class Foo {
private String bar = new String("fdsaflkjdsalfjldsajflsajfoiwefnaslfjlaskejfowieajfalfjelajfieslkjflksdajlf");
private int num = 0;
public Foo() {
}
public Foo(int num) {
System.out.println(String.format("Foo initialize: %d", num));
this.num = num;
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
public void run() {
clean();
}
}));
}
void clean() {
System.out.println(String.format("Foo clean: %d", num));
}
@Override
protected void finalize() throws Throwable {
super.finalize();
System.out.println(String.format("foo finalize: %d", this.num));
}
}
public class Demo {
public static void main(String[] args) {
int num = 0;
while (true) {
try {
Foo foo = new Foo(num);
num += 1;
Thread.sleep(100);
}catch (Exception e) {
}
}
}
}