Catching the OOM Hprof
1. The am dumpheap method
When you need to capture an hprof file from a Java process, you can use the following command:
adb shell am dumpheap <pid> <save-path>
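As a concrete session sketch (the pid 1193 and paths here are hypothetical, not from a real device), a capture plus the usual conversion step might look like:

```shell
# Dump the managed heap of pid 1193 to a device path, then pull it to the host.
adb shell am dumpheap 1193 /data/local/tmp/ss.hprof
adb pull /data/local/tmp/ss.hprof .
# Android-format hprof files usually need hprof-conv (from the SDK) before
# standard analyzers such as MAT can open them:
hprof-conv ss.hprof ss-converted.hprof
```

This requires a connected device and a path the target process (or the shell, via fd passing) can write to.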
1.1 The flow:
1. Am.java receives the command and dispatches it; runDumpHeap is executed, and the task is then handed to AMS over the corresponding AIDL interface:
frameworks/base/services/core/java/com/android/server/am/ActivityManagerShellCommand.java
int runDumpHeap(PrintWriter pw) throws RemoteException {
    if (!mInterface.dumpHeap(process, userId, managed, mallocInfo, runGc, heapFile, fd)) {
        err.println("HEAP DUMP FAILED on process " + process);
        return -1;
    }
    return 0;
}
2. AMS's dumpHeap method: after AMS's permission checks pass, the request is dispatched to the target process, which performs the actual hprof dump:
frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
public boolean dumpHeap(String process, int userId, boolean managed, boolean mallocInfo,
        boolean runGc, String path, ParcelFileDescriptor fd) throws RemoteException {
    try {
        ProcessRecord proc = findProcessLocked(process, userId, "dumpHeap");
        proc.thread.dumpHeap(managed, mallocInfo, runGc, path, fd);
        return true;
    } catch (RemoteException e) {
    }
    return false; // abridged: permission checks and error reporting omitted
}
3.目標(biāo)進(jìn)程的ApplicationThread的dumpHeap方法民宿,這里會(huì)把任務(wù)交給ActivityThread 的內(nèi)部類H去處理::
frameworks/base/core/java/android/app/ActivityThread.java
static void handleDumpHeap(DumpHeapData dhd) {
    try {
        Debug.dumpHprofData(dhd.path, dhd.fd.getFileDescriptor());
    } catch (IOException e) {
        Slog.w(TAG, "Managed heap dump failed on path " + dhd.path
                + " -- can the process access this path?");
    } finally {
        // abridged
    }
}
4. This calls into the implementation in Debug.java:
frameworks/base/core/java/android/os/Debug.java
/**
* Like dumpHprofData(String), but takes an already-opened
* FileDescriptor to which the trace is written. The file name is also
* supplied simply for logging. Makes a dup of the file descriptor.
*
* Primarily for use by the "am" shell command.
*
* @hide
*/
public static void dumpHprofData(String fileName, FileDescriptor fd)
        throws IOException {
    VMDebug.dumpHprofData(fileName, fd);
}
5.調(diào)用到虛擬機(jī)中的Dumpheap方法:
art/runtime/native/dalvik_system_VMDebug.cc
/*
* static void dumpHprofData(String fileName, FileDescriptor fd)
*
* Cause "hprof" data to be dumped. We can throw an IOException if an
* error occurs during file handling.
*/
static void VMDebug_dumpHprofData(JNIEnv* env, jclass, jstring javaFilename, jint javaFd) {
  // Only one of these may be null.
  if (javaFilename == nullptr && javaFd < 0) {
    ScopedObjectAccess soa(env);
    ThrowNullPointerException("fileName == null && fd == null");
    return;
  }
  std::string filename;
  if (javaFilename != nullptr) {
    ScopedUtfChars chars(env, javaFilename);
    if (env->ExceptionCheck()) {
      return;
    }
    filename = chars.c_str();
  } else {
    filename = "[fd]";
  }
  int fd = javaFd;
  hprof::DumpHeap(filename.c_str(), fd, false);
}
6. Finally the dump method in hprof.cc is invoked:
art/runtime/hprof/hprof.cc
void DumpHeap(const char* filename, int fd, bool direct_to_ddms) {
  CHECK(filename != nullptr);
  Thread* self = Thread::Current();
  // Need to take a heap dump while GC isn't running. See the comment in Heap::VisitObjects().
  // Also we need the critical section to avoid visiting the same object twice. See b/34967844.
  gc::ScopedGCCriticalSection gcs(self,
                                  gc::kGcCauseHprof,
                                  gc::kCollectorTypeHprof);
  ScopedSuspendAll ssa(__FUNCTION__, true /* long suspend */);
  Hprof hprof(filename, fd, direct_to_ddms);
  hprof.Dump();  // ultimately writes the data out to the file via DumpToFile
}
So in essence it is the target process's own VM that dumps the hprof file.
1.2 Capturing the hprof file when a Java process OOMs
A ROM developer can call the system API Debug.dumpHprofData() directly wherever a dump is needed.
For example, in RuntimeInit.java we can capture the current process's hprof file when the Java process throws an uncaught exception:
frameworks/base/core/java/com/android/internal/os/RuntimeInit.java
private static class LoggingHandler implements Thread.UncaughtExceptionHandler {
    @Override
    public void uncaughtException(Thread t, Throwable e) {
        // SELF-ADD begin
        ApplicationErrorReport.CrashInfo crashInfo = new ApplicationErrorReport.CrashInfo(e);
        if ("java.lang.OutOfMemoryError".equals(crashInfo.exceptionClassName)) {
            try {
                // outPutFileName: an hprof path this process can write to
                Debug.dumpHprofData(outPutFileName);
            } catch (IOException ioe) { // renamed so it does not shadow the Throwable e
                ioe.printStackTrace();
            }
        }
        // SELF-ADD end
    }
}
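One cheap sanity check on whatever file such a hook produces: an Android-format hprof starts with the ASCII header "JAVA PROFILE 1.0.3". A small shell sketch (the file path is a stand-in, not from the article):

```shell
# Fabricate a stand-in file carrying the Android hprof magic, then verify it
# the same way you would verify a real capture pulled from a device.
printf 'JAVA PROFILE 1.0.3\0rest-of-dump' > /tmp/sample.hprof
head -c 18 /tmp/sample.hprof   # prints: JAVA PROFILE 1.0.3
```

If the header is missing or truncated, the dump was interrupted or the path was not writable.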
當(dāng)然如上只是簡(jiǎn)單的描述升熊,是否該做dump需要對(duì)具體場(chǎng)景俄烁,軟件版本等進(jìn)行判斷后再做dumpheap操作。
因?yàn)閐umpheap的操作是在應(yīng)用進(jìn)程的主線程中進(jìn)行操作级野,本質(zhì)上是在該應(yīng)用進(jìn)程的虛擬機(jī)中進(jìn)行页屠,dumpheap時(shí)應(yīng)用進(jìn)程會(huì)block住,如過(guò)heap文件過(guò)大很容易導(dǎo)致應(yīng)用進(jìn)程操作界面卡住蓖柔,如果此時(shí)再進(jìn)行點(diǎn)擊或滑動(dòng)等操作極易再拋出anr等彈窗辰企,用戶體驗(yàn)極差。
另外system_server進(jìn)程拋出OOM異常時(shí)同樣走到這里况鸣,同樣可以使用該方法抓取system_server進(jìn)程OOM時(shí)的hprof文件牢贸。
此時(shí)運(yùn)行shell命令同樣可以,只是跨了眾多的進(jìn)程镐捧,這個(gè)比較直接潜索。
2. Capturing hprof files in abnormal cases
There are abnormal cases too: by the time the OOM happens and you dump the heap, memory has already fallen back to a low level, and the pre-OOM scene can no longer be captured.
Case study:
The process hitting OOM was system_server, so the reboot rate spiked: backend monitoring showed this model's reboot rate rising more than tenfold, which is quite serious, yet there were almost no user complaints. Stranger still, with nothing changed, two releases later the reboot rate dropped back down by the same factor.
The server-side logs show the heap ballooning in a very short window: from 08-10 00:04:44.167 to 08-10 00:04:57.825, a little over 10 seconds, it climbed from tens of MB to a peak of more than 500 MB. A frightening log to read:
2.1 The log:
08-09 23:59:31.395 1193 1202 I art : Background sticky concurrent mark sweep GC freed 7270(1366KB) AllocSpace objects, 84(1680KB) LOS objects, 4% free, 57MB/60MB, paused 5.453ms total 53.036ms
08-09 23:59:54.735 1193 1202 I art : Background sticky concurrent mark sweep GC freed 11130(978KB) AllocSpace objects, 39(1300KB) LOS objects, 4% free, 58MB/60MB, paused 5.354ms total 40.800ms
08-10 00:00:00.226 1193 1202 I art : Background sticky concurrent mark sweep GC freed 13139(917KB) AllocSpace objects, 18(1336KB) LOS objects, 3% free, 58MB/60MB, paused 5.051ms total 50.740ms
08-10 00:00:01.685 1193 1202 I art : Background partial concurrent mark sweep GC freed 54242(8MB) AllocSpace objects, 3(1004KB) LOS objects, 6% free, 61MB/65MB, paused 2.338ms total 196.838ms
08-10 00:00:02.091 1193 1202 I art : Background partial concurrent mark sweep GC freed 22138(8MB) AllocSpace objects, 0(0B) LOS objects, 5% free, 63MB/67MB, paused 1.922ms total 242.373ms
08-10 00:00:03.009 1193 1202 I art : Background sticky concurrent mark sweep GC freed 4959(3MB) AllocSpace objects, 30(480KB) LOS objects, 5% free, 63MB/67MB, paused 5.672ms total 42.871ms
08-10 00:00:06.326 5659 5698 I art : Waiting for a blocking GC DisableMovingGc
08-10 00:00:06.328 5659 5698 I art : Starting a blocking GC DisableMovingGc
08-10 00:00:10.989 3138 3150 I art : Background sticky concurrent mark sweep GC freed 3252(3MB) AllocSpace objects, 30(480KB) LOS objects, 24% free, 12MB/16MB, paused 5.453ms total 31.291ms
08-10 00:00:12.127 1193 1202 I art : Background sticky concurrent mark sweep GC freed 1177(2028KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 66MB/67MB, paused 5.414ms total 31.304ms
08-10 00:00:13.469 1193 1202 I art : Background sticky concurrent mark sweep GC freed 345(753KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 66MB/67MB, paused 7.552ms total 22.277ms
08-10 00:00:13.989 1193 1202 I art : Background partial concurrent mark sweep GC freed 12567(6MB) AllocSpace objects, 59(1048KB) LOS objects, 5% free, 63MB/67MB, paused 2.038ms total 184.519ms
08-10 00:00:21.986 1193 1202 I art : Background sticky concurrent mark sweep GC freed 2471(2MB) AllocSpace objects, 0(0B) LOS objects, 3% free, 64MB/67MB, paused 5.281ms total 31.327ms
08-10 00:00:28.702 1193 1202 I art : Background partial concurrent mark sweep GC freed 4937(4MB) AllocSpace objects, 120(2MB) LOS objects, 5% free, 63MB/67MB, paused 3.661ms total 364.700ms
08-10 00:00:30.352 1193 1202 I art : Background sticky concurrent mark sweep GC freed 2912(4MB) AllocSpace objects, 0(0B) LOS objects, 5% free, 63MB/67MB, paused 5.093ms total 40.789ms
08-10 00:00:33.555 1193 1202 I art : Background sticky concurrent mark sweep GC freed 3201(3MB) AllocSpace objects, 0(0B) LOS objects, 4% free, 64MB/67MB, paused 5.120ms total 41.522ms
08-10 00:00:35.927 1193 1202 I art : Background sticky concurrent mark sweep GC freed 4108(2MB) AllocSpace objects, 6(96KB) LOS objects, 4% free, 64MB/67MB, paused 5.201ms total 39.964ms
08-10 00:00:36.504 1193 1202 I art : Background sticky concurrent mark sweep GC freed 3715(2MB) AllocSpace objects, 0(0B) LOS objects, 4% free, 64MB/67MB, paused 5.005ms total 39.263ms
08-10 00:00:37.439 1193 1202 I art : Background sticky concurrent mark sweep GC freed 2013(2MB) AllocSpace objects, 0(0B) LOS objects, 3% free, 64MB/67MB, paused 5.401ms total 42.217ms
08-10 00:00:37.742 1193 1202 I art : Background sticky concurrent mark sweep GC freed 2327(2MB) AllocSpace objects, 0(0B) LOS objects, 3% free, 64MB/67MB, paused 5.479ms total 40.687ms
08-10 00:00:42.013 2648 3700 I art : Starting a blocking GC Explicit
08-10 00:00:42.049 2648 3700 I art : Explicit concurrent mark sweep GC freed 13530(1605KB) AllocSpace objects, 0(0B) LOS objects, 28% free, 10MB/14MB, paused 393us total 35.471ms
08-10 00:00:42.053 2648 3700 I art : Starting a blocking GC Explicit
08-10 00:00:42.072 2648 3700 I art : Explicit concurrent mark sweep GC freed 462(16KB) AllocSpace objects, 0(0B) LOS objects, 28% free, 10MB/14MB, paused 376us total 18.540ms
08-10 00:00:42.315 1193 1202 I art : Background sticky concurrent mark sweep GC freed 19742(1979KB) AllocSpace objects, 7(156KB) LOS objects, 2% free, 65MB/67MB, paused 5.721ms total 84.216ms
08-10 00:00:43.328 1193 1202 I art : Background partial concurrent mark sweep GC freed 60214(3MB) AllocSpace objects, 64(1208KB) LOS objects, 5% free, 63MB/67MB, paused 1.594ms total 216.641ms
08-10 00:00:52.118 1193 1202 I art : Background partial concurrent mark sweep GC freed 38624(3MB) AllocSpace objects, 21(420KB) LOS objects, 5% free, 63MB/67MB, paused 1.699ms total 212.617ms
08-10 00:00:55.946 2004 2014 I art : Background sticky concurrent mark sweep GC freed 8019(353KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 39MB/39MB, paused 5.213ms total 40.251ms
08-10 00:02:15.572 1193 1202 I art : Background partial concurrent mark sweep GC freed 47034(8MB) AllocSpace objects, 53(1364KB) LOS objects, 6% free, 56MB/60MB, paused 1.790ms total 238.573ms
08-10 00:02:18.896 1193 1202 I art : Background partial concurrent mark sweep GC freed 52779(3MB) AllocSpace objects, 8(160KB) LOS objects, 6% free, 56MB/60MB, paused 3.178ms total 382.199ms
08-10 00:04:00.117 1193 1202 I art : Background sticky concurrent mark sweep GC freed 32710(2MB) AllocSpace objects, 33(1220KB) LOS objects, 6% free, 56MB/60MB, paused 5.415ms total 62.678ms
08-10 00:04:44.167 1193 1202 I art : Background partial concurrent mark sweep GC freed 11012(632KB) AllocSpace objects, 8(332KB) LOS objects, 5% free, 67MB/71MB, paused 1.991ms total 241.279ms
08-10 00:04:44.519 1193 1202 I art : Background partial concurrent mark sweep GC freed 3652(391KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 77MB/81MB, paused 1.854ms total 228.374ms
08-10 00:04:44.876 1193 1202 I art : Background partial concurrent mark sweep GC freed 68(7KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 88MB/92MB, paused 1.921ms total 237.600ms
08-10 00:04:45.158 1193 1202 I art : Background partial concurrent mark sweep GC freed 6(192B) AllocSpace objects, 0(0B) LOS objects, 3% free, 99MB/103MB, paused 2.042ms total 176.341ms
08-10 00:04:45.460 1193 1202 I art : Background partial concurrent mark sweep GC freed 18(8KB) AllocSpace objects, 0(0B) LOS objects, 3% free, 111MB/115MB, paused 2.190ms total 190.518ms
08-10 00:04:45.586 1193 1202 I art : Background sticky concurrent mark sweep GC freed 0(0B) AllocSpace objects, 0(0B) LOS objects, 0% free, 115MB/115MB, paused 5.780ms total 37.561ms
08-10 00:04:45.798 1193 1202 I art : Background partial concurrent mark sweep GC freed 36(1648B) AllocSpace objects, 0(0B) LOS objects, 3% free, 122MB/126MB, paused 2.088ms total 211.466ms
08-10 00:04:45.931 1193 1202 I art : Background sticky concurrent mark sweep GC freed 226(18KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 126MB/126MB, paused 11.727ms total 44.764ms
08-10 00:04:46.123 1193 1202 I art : Background partial concurrent mark sweep GC freed 453(24KB) AllocSpace objects, 0(0B) LOS objects, 2% free, 134MB/138MB, paused 2.029ms total 190.864ms
08-10 00:04:46.395 1193 1202 I art : Background partial concurrent mark sweep GC freed 134(7KB) AllocSpace objects, 0(0B) LOS objects, 2% free, 145MB/149MB, paused 1.995ms total 169.401ms
08-10 00:04:46.770 1193 1202 I art : Background partial concurrent mark sweep GC freed 109(5KB) AllocSpace objects, 0(0B) LOS objects, 2% free, 156MB/160MB, paused 2.027ms total 253.147ms
08-10 00:04:47.147 1193 1202 I art : Background partial concurrent mark sweep GC freed 215(22KB) AllocSpace objects, 0(0B) LOS objects, 2% free, 171MB/175MB, paused 1.984ms total 252.140ms
08-10 00:04:47.557 1193 1202 I art : Background partial concurrent mark sweep GC freed 515(65KB) AllocSpace objects, 0(0B) LOS objects, 2% free, 182MB/186MB, paused 2.035ms total 294.901ms
08-10 00:04:47.897 1193 1202 I art : Background partial concurrent mark sweep GC freed 358(31KB) AllocSpace objects, 1(20KB) LOS objects, 2% free, 191MB/195MB, paused 2.134ms total 224.444ms
08-10 00:04:48.191 1193 1202 I art : Background partial concurrent mark sweep GC freed 621(63KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 203MB/207MB, paused 2.055ms total 187.326ms
08-10 00:04:48.514 1193 1202 I art : Background partial concurrent mark sweep GC freed 110(8KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 215MB/219MB, paused 2.196ms total 215.961ms
08-10 00:04:48.790 1193 1202 I art : Background partial concurrent mark sweep GC freed 6(192B) AllocSpace objects, 0(0B) LOS objects, 1% free, 226MB/230MB, paused 2.149ms total 172.089ms
08-10 00:04:49.090 1193 1202 I art : Background partial concurrent mark sweep GC freed 6(192B) AllocSpace objects, 0(0B) LOS objects, 1% free, 237MB/241MB, paused 2.228ms total 173.851ms
08-10 00:04:49.375 1193 1202 I art : Background partial concurrent mark sweep GC freed 104(5KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 248MB/252MB, paused 2.253ms total 173.835ms
08-10 00:04:49.673 1193 1202 I art : Background partial concurrent mark sweep GC freed 13(448B) AllocSpace objects, 0(0B) LOS objects, 1% free, 260MB/264MB, paused 2.259ms total 193.370ms
08-10 00:04:49.970 1193 1202 I art : Background partial concurrent mark sweep GC freed 1104(87KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 271MB/275MB, paused 2.486ms total 186.926ms
08-10 00:04:50.291 1193 1202 I art : Background partial concurrent mark sweep GC freed 259(39KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 283MB/287MB, paused 2.229ms total 207.021ms
08-10 00:04:50.596 1193 1202 I art : Background partial concurrent mark sweep GC freed 35(2048B) AllocSpace objects, 0(0B) LOS objects, 1% free, 294MB/298MB, paused 3.200ms total 192.377ms
08-10 00:04:50.709 1193 1202 I art : Background sticky concurrent mark sweep GC freed 70(3KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 299MB/299MB, paused 5.388ms total 26.256ms
08-10 00:04:50.898 1193 1202 I art : Background partial concurrent mark sweep GC freed 1338(64KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 306MB/310MB, paused 2.209ms total 188.606ms
08-10 00:04:51.200 1193 1202 I art : Background partial concurrent mark sweep GC freed 331(17KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 317MB/321MB, paused 2.329ms total 184.804ms
08-10 00:04:51.323 1193 1202 I art : Background sticky concurrent mark sweep GC freed 1592(139KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 322MB/322MB, paused 8.261ms total 36.921ms
08-10 00:04:51.520 1193 1202 I art : Background partial concurrent mark sweep GC freed 1960(161KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 329MB/333MB, paused 2.538ms total 196.193ms
08-10 00:04:51.883 1193 1202 I art : Background partial concurrent mark sweep GC freed 464(18KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 344MB/348MB, paused 2.162ms total 254.264ms
08-10 00:04:52.207 1193 1202 I art : Background partial concurrent mark sweep GC freed 341(19KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 354MB/358MB, paused 2.173ms total 217.485ms
08-10 00:04:52.579 1193 1202 I art : Background partial concurrent mark sweep GC freed 406(16KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 367MB/371MB, paused 2.258ms total 262.958ms
08-10 00:04:53.007 1193 1202 I art : Background partial concurrent mark sweep GC freed 102(4KB) AllocSpace objects, 0(0B) LOS objects, 1% free, 378MB/382MB, paused 2.208ms total 310.370ms
08-10 00:04:53.327 1193 1202 I art : Background partial concurrent mark sweep GC freed 38(1424B) AllocSpace objects, 0(0B) LOS objects, 1% free, 388MB/392MB, paused 2.212ms total 210.770ms
08-10 00:04:53.767 1193 1202 I art : Background partial concurrent mark sweep GC freed 6(192B) AllocSpace objects, 0(0B) LOS objects, 0% free, 402MB/406MB, paused 2.285ms total 321.039ms
08-10 00:04:53.897 1193 1202 I art : Background sticky concurrent mark sweep GC freed 0(0B) AllocSpace objects, 0(0B) LOS objects, 0% free, 406MB/406MB, paused 5.966ms total 41.389ms
08-10 00:04:54.129 1193 1202 I art : Background partial concurrent mark sweep GC freed 6(192B) AllocSpace objects, 0(0B) LOS objects, 0% free, 413MB/417MB, paused 2.322ms total 230.110ms
08-10 00:04:54.261 1193 1202 I art : Background sticky concurrent mark sweep GC freed 0(0B) AllocSpace objects, 0(0B) LOS objects, 0% free, 417MB/417MB, paused 7.865ms total 43.687ms
08-10 00:04:54.511 1193 1202 I art : Background partial concurrent mark sweep GC freed 298(21KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 424MB/428MB, paused 2.090ms total 249.911ms
08-10 00:04:54.887 1193 1202 I art : Background partial concurrent mark sweep GC freed 48(2624B) AllocSpace objects, 0(0B) LOS objects, 0% free, 435MB/439MB, paused 2.242ms total 258.446ms
08-10 00:04:55.017 1193 1202 I art : Background sticky concurrent mark sweep GC freed 10(752B) AllocSpace objects, 0(0B) LOS objects, 0% free, 440MB/440MB, paused 5.978ms total 40.969ms
08-10 00:04:55.221 1193 1202 I art : Background partial concurrent mark sweep GC freed 6(192B) AllocSpace objects, 0(0B) LOS objects, 0% free, 447MB/451MB, paused 2.312ms total 203.380ms
08-10 00:04:55.366 1193 1202 I art : Background sticky concurrent mark sweep GC freed 0(0B) AllocSpace objects, 0(0B) LOS objects, 0% free, 451MB/451MB, paused 6.244ms total 37.508ms
08-10 00:04:55.627 1193 1202 I art : Background partial concurrent mark sweep GC freed 163(8KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 459MB/463MB, paused 2.290ms total 259.702ms
08-10 00:04:55.759 1193 1202 I art : Background sticky concurrent mark sweep GC freed 0(0B) AllocSpace objects, 0(0B) LOS objects, 0% free, 464MB/464MB, paused 7.253ms total 44.006ms
08-10 00:04:56.123 1193 1202 I art : Background partial concurrent mark sweep GC freed 19(592B) AllocSpace objects, 0(0B) LOS objects, 0% free, 471MB/475MB, paused 5.091ms total 363.359ms
08-10 00:04:56.320 1193 1202 I art : Background sticky concurrent mark sweep GC freed 0(0B) AllocSpace objects, 0(0B) LOS objects, 0% free, 475MB/475MB, paused 5.201ms total 28.116ms
08-10 00:04:56.620 1193 1202 I art : Background partial concurrent mark sweep GC freed 68(4KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 483MB/487MB, paused 4.594ms total 299.701ms
08-10 00:04:56.776 1193 1202 I art : Background sticky concurrent mark sweep GC freed 0(0B) AllocSpace objects, 0(0B) LOS objects, 0% free, 486MB/487MB, paused 5.572ms total 27.013ms
08-10 00:04:56.984 1193 1202 I art : Background partial concurrent mark sweep GC freed 19(60KB) AllocSpace objects, 1(20KB) LOS objects, 0% free, 494MB/498MB, paused 2.273ms total 207.931ms
08-10 00:04:57.117 1193 1202 I art : Background sticky concurrent mark sweep GC freed 1(16B) AllocSpace objects, 0(0B) LOS objects, 0% free, 498MB/498MB, paused 6.765ms total 44.880ms
08-10 00:04:57.335 1193 1202 I art : Background partial concurrent mark sweep GC freed 20(704B) AllocSpace objects, 0(0B) LOS objects, 0% free, 505MB/509MB, paused 3.179ms total 217.057ms
08-10 00:04:57.467 1193 1202 I art : Background sticky concurrent mark sweep GC freed 0(0B) AllocSpace objects, 0(0B) LOS objects, 0% free, 509MB/509MB, paused 8.363ms total 36.710ms
08-10 00:04:57.628 1193 2617 I art : Waiting for a blocking GC Alloc
08-10 00:04:57.825 1193 1202 I art : Clamp target GC heap from 515MB to 512MB
08-10 00:04:57.825 1193 1202 I art : Background partial concurrent mark sweep GC freed 87(6KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 511MB/512MB, paused 3.637ms total 357.088ms
08-10 00:04:57.826 1193 2617 I art : Starting a blocking GC Alloc
08-10 00:04:57.826 1193 2617 I art : Starting a blocking GC Alloc
08-10 00:04:57.854 1193 2617 I art : Alloc sticky concurrent mark sweep GC freed 0(0B) AllocSpace objects, 0(0B) LOS objects, 0% free, 511MB/512MB, paused 3.953ms total 27.850ms
08-10 00:04:57.854 1193 2617 I art : Starting a blocking GC Alloc
08-10 00:04:57.859 1193 2601 I art : Waiting for a blocking GC Alloc
08-10 00:04:58.278 1193 2617 I art : Clamp target GC heap from 515MB to 512MB
08-10 00:04:58.279 1193 2617 I art : Alloc partial concurrent mark sweep GC freed 88(5KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 511MB/512MB, paused 5.126ms total 424.244ms
08-10 00:04:58.279 1193 2617 I art : Starting a blocking GC Alloc
08-10 00:04:58.279 1193 2601 I art : Waiting for a blocking GC Alloc
08-10 00:04:58.634 1193 2617 I art : Clamp target GC heap from 515MB to 512MB
08-10 00:04:58.634 1193 2617 I art : Alloc concurrent mark sweep GC freed 6(208B) AllocSpace objects, 0(0B) LOS objects, 0% free, 511MB/512MB, paused 3.198ms total 354.919ms
...
2.2 Where this log is printed:
art/runtime/gc/heap.cc
void Heap::LogGC(GcCause gc_cause, collector::GarbageCollector* collector) {
  const size_t duration = GetCurrentGcIteration()->GetDurationNs();
  const std::vector<uint64_t>& pause_times = GetCurrentGcIteration()->GetPauseTimes();
  // Print the GC if it is an explicit GC (e.g. Runtime.gc()) or a slow GC
  // (mutator time blocked >= long_pause_log_threshold_).
  bool log_gc = kLogAllGCs || gc_cause == kGcCauseExplicit;
  if (!log_gc && CareAboutPauseTimes()) {
    // GC for alloc pauses the allocating thread, so consider it as a pause.
    log_gc = duration > long_gc_log_threshold_ ||
        (gc_cause == kGcCauseForAlloc && duration > long_pause_log_threshold_);
    for (uint64_t pause : pause_times) {
      log_gc = log_gc || pause >= long_pause_log_threshold_;
    }
  }
  if (log_gc) {
    const size_t percent_free = GetPercentFree();
    const size_t current_heap_size = GetBytesAllocated();
    const size_t total_memory = GetTotalMemory();
    std::ostringstream pause_string;
    for (size_t i = 0; i < pause_times.size(); ++i) {
      pause_string << PrettyDuration((pause_times[i] / 1000) * 1000)
                   << ((i != pause_times.size() - 1) ? "," : "");
    }
    LOG(INFO) << gc_cause << " " << collector->GetName()
              << " GC freed " << current_gc_iteration_.GetFreedObjects() << "("
              << PrettySize(current_gc_iteration_.GetFreedBytes()) << ") AllocSpace objects, "
              << current_gc_iteration_.GetFreedLargeObjects() << "("
              << PrettySize(current_gc_iteration_.GetFreedLargeObjectBytes()) << ") LOS objects, "
              << percent_free << "% free, " << PrettySize(current_heap_size) << "/"
              << PrettySize(total_memory) << ", " << "paused " << pause_string.str()
              << " total " << PrettyDuration((duration / 1000) * 1000);
    VLOG(heap) << Dumpable<TimingLogger>(*current_gc_iteration_.GetTimings());
  }
}
The meaning of each field can be read off the code; current_heap_size is the heap size currently allocated by the process, and the memory really did climb within a very short time.
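The used/total heap pair in those lines can be pulled out mechanically; a small grep sketch of my own, applied to one of the log lines from section 2.1:

```shell
# Extract the "used/total" heap figures from one ART GC log line.
line='08-10 00:04:57.825 1193 1202 I art : Background partial concurrent mark sweep GC freed 87(6KB) AllocSpace objects, 0(0B) LOS objects, 0% free, 511MB/512MB, paused 3.637ms total 357.088ms'
printf '%s\n' "$line" | grep -o '[0-9]*MB/[0-9]*MB'   # prints: 511MB/512MB
```

Piped over a full logcat capture, this gives the heap-growth curve without any tooling.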
現(xiàn)在看下為什么突然上漲上來(lái),因此就需要抓取system_server的hprof文件來(lái)確定是哪個(gè)對(duì)象過(guò)大妇萄,哪里導(dǎo)致突然分配這么多內(nèi)存蜕企。
于是用上面提到的在RuntimeInit.java處system_server拋OOM異常時(shí)抓取system_server的hporf文件,可是每次手機(jī)抓取上傳到服務(wù)器hprof文件大小只有幾十兆冠句,恢復(fù)了正常轻掩。
猜測(cè)在system_server OOM crash之前大對(duì)象又被清理掉了,頭疼懦底,抓不到造事者影子唇牧。
2.3 Reproducing locally and capturing the hprof:
于是準(zhǔn)備本地復(fù)現(xiàn),跑上monkey進(jìn)行復(fù)現(xiàn)聚唐,寫個(gè)腳本丐重,不斷的抓取system_server的hprof文件
編個(gè)bin文件監(jiān)控system_server的heap大小,大體上的想法就是想在運(yùn)行的手機(jī)中抓取到system_server的大的hprof文件杆查,用于查找大對(duì)象到底是誰(shuí)扮惦,當(dāng)然也可以不用這么麻煩,寫個(gè)python腳本手機(jī)連著電腦不斷監(jiān)控抓取就可以亲桦,只是后來(lái)發(fā)現(xiàn)Monkey復(fù)現(xiàn)不了崖蜜,于是準(zhǔn)備使用該手機(jī)作為日常機(jī)器進(jìn)行操作使用進(jìn)行復(fù)現(xiàn),需要能在手機(jī)自身里運(yùn)行監(jiān)控抓取客峭,使用了如下方案豫领,源碼如下:
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <time.h>
#include <assert.h>
#include <dirent.h>
#define BUF_SIZE 1024
void modifyMyAdj()
{
    char text[100];
    sprintf(text, "/proc/%d/oom_score_adj", getpid());
    int fd = open(text, O_WRONLY);
    if (fd >= 0) {
        sprintf(text, "%d", -1000); // keep this monitor from being killed
        write(fd, text, strlen(text));
        printf("modify adj success!!\n");
        close(fd); // only close a valid fd
    } else {
        printf("modify adj fail!!\n");
    }
}
char* getPidByName(char* task_name, char* pid) {
    DIR *dir;
    struct dirent *ptr;
    FILE *fp;
    char filepath[50];      // large enough for a /proc/<pid>/status path
    char cur_task_name[50]; // large enough for the process name we match
    char buf[BUF_SIZE];
    dir = opendir("/proc");
    if (NULL != dir) {
        while ((ptr = readdir(dir)) != NULL) { // walk every entry under /proc
            // skip "." and "..", and anything that is not a directory
            if ((strcmp(ptr->d_name, ".") == 0) || (strcmp(ptr->d_name, "..") == 0))
                continue;
            if (DT_DIR != ptr->d_type)
                continue;
            sprintf(filepath, "/proc/%s/status", ptr->d_name);
            fp = fopen(filepath, "r");
            if (NULL != fp) {
                if (fgets(buf, BUF_SIZE-1, fp) == NULL) {
                    fclose(fp);
                    continue;
                }
                sscanf(buf, "%*s %s", cur_task_name);
                // the first line of status is "Name: <task_name>"; on a
                // match, the directory name is the PID
                if (!strcmp(task_name, cur_task_name)) {
                    printf("PID: %s\n", ptr->d_name);
                    strcpy(pid, ptr->d_name);
                }
                fclose(fp);
            }
        }
        closedir(dir);
    }
    return pid; // returning ptr->d_name here would dereference NULL after the loop
}
void getNameByPid(pid_t pid, char *task_name) {
    char proc_pid_path[BUF_SIZE];
    char buf[BUF_SIZE];
    sprintf(proc_pid_path, "/proc/%d/status", pid);
    FILE* fp = fopen(proc_pid_path, "r");
    if (NULL != fp) {
        if (fgets(buf, BUF_SIZE-1, fp) == NULL) {
            fclose(fp);
            return; // avoid the double fclose the original fell into
        }
        fclose(fp);
        sscanf(buf, "%*s %s", task_name);
    }
}
unsigned getRssByPid(pid_t pid) {
    char proc_pid_stat[BUF_SIZE];
    char buf[BUF_SIZE];
    sprintf(proc_pid_stat, "/proc/%d/stat", pid);
    FILE* fp = fopen(proc_pid_stat, "r");
    char *temp;
    int i = 0;
    if (NULL != fp) {
        if (fgets(buf, BUF_SIZE-1, fp) == NULL) {
            fclose(fp);
            return 0; // avoid the double fclose the original fell into
        }
        fclose(fp);
        temp = strtok(buf, " ");
        while (temp) {
            i++;
            if (i == 24) { // field 24 of /proc/<pid>/stat is RSS in pages
                return strtoul(temp, 0, 10) * 4; // assumes 4 KB pages; result in KB
            }
            temp = strtok(NULL, " ");
        }
    }
    return 0;
}
#define RSS_STEP 50*1024 // 50 MB (RSS is tracked in KB)
#define HPROF_FILE_DIR "/data/local/tmp/"
int main(int argc, char *argv[]) {
    modifyMyAdj();
    unsigned rss_threshold = 200 * 1024;
    char task_name[50];
    char *pid = (char*)calloc(10, sizeof(char));
    strcpy(task_name, "system_server");
    getPidByName(task_name, pid);
    char target_file[128];
    char dumpheap_cmd[256];
    system("mkdir -p " HPROF_FILE_DIR);
    printf("OOMMonitor start monitor system_server rss!\n");
    while (true) {
        unsigned rss = getRssByPid(atoi(pid));
        printf("OOMMonitor unsigned rss = %u !\n", rss);
        if (rss == 0) {
            printf("OOMMonitor Error read system_server pid, system_server may reboot!\n");
            getPidByName(task_name, pid);
            continue;
        }
        if (rss > rss_threshold) {
            sprintf(target_file, "%sss_oom_%u.hprof", HPROF_FILE_DIR, rss);
            sprintf(dumpheap_cmd, "am dumpheap %s %s", pid, target_file);
            system(dumpheap_cmd);
            printf("OOMMonitor dumpheap finished: %s\n", target_file);
            rss_threshold += RSS_STEP;
        }
        if (rss > 500 * 1024) {
            printf("%s\n", "OOMMonitor system_server oom!");
            sprintf(target_file, "%sss_oom_%u.hprof", HPROF_FILE_DIR, rss);
            sprintf(dumpheap_cmd, "am dumpheap %s %s", pid, target_file);
            system(dumpheap_cmd);
            system("bugreport > " HPROF_FILE_DIR "bug.txt");
            system("am broadcast -a com.XXX.bugreport.service.action.CONFIRM_DIALOG");
            break;
        }
        sleep(2);
    }
    return 0;
}
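The RSS read in getRssByPid above can be cross-checked from the shell. A sketch using a hard-coded /proc/<pid>/stat line (so it runs anywhere; the values are made up), with the same field-24, 4 KB-page assumption as the C code; note that awk's field splitting would break if the comm field in parentheses contained spaces:

```shell
# Field 24 of /proc/<pid>/stat is RSS in pages; multiply by 4 to get KB.
stat_line='1193 (system_server) S 1 1193 1193 0 -1 1077944576 0 0 0 0 0 0 0 0 20 0 100 0 300 2000000000 51200'
rss_pages=$(printf '%s\n' "$stat_line" | awk '{print $24}')
echo $((rss_pages * 4))   # prints: 204800 (KB, i.e. 200 MB)
```

On a device, `cat /proc/$(pidof system_server)/stat` supplies the real line.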
The .mk file, to build the bin (OOMMonitor) in the source tree:
LOCAL_PATH:= $(call my-dir)
include $(CLEAR_VARS)
LOCAL_SRC_FILES:= \
monitor.cpp
LOCAL_MODULE:= OOMMonitor
include $(BUILD_EXECUTABLE)
Push it to /system/bin/, give OOMMonitor execute permission, and run it; you can also modify the rc file so it starts automatically at boot.
adb shell OOMMonitor &
Then run monkey. A high-probability OOM usually reproduces quickly this way, but this one would not reproduce no matter how long we ran it, and we only had two or three local devices.
So in the end we had to capture the hprof from users' devices.
2.4 Capturing the hprof online from user devices:
Since the large objects are already gone by the time the OOM exception is thrown, the idea is to dump system_server's heap while it is still growing, before the crash.
So the plan was:
Run a shell script to capture the hprof file.
Write system_server's pid into a system property; the script reads the property to get the pid.
#!/system/bin/sh
ss_pid=`getprop sys.XXX.dumpheap`
mkdir -p /data/local/hprof/exceed/
am dumpheap $ss_pid /data/local/hprof/exceed/system.hprof
echo "oom sh dumpheap done!"
chown -R system:system /data/local/hprof/
Modify the rc file so that the script runs as soon as the property is set:
on property:sys.XXX.dumpheap=*
exec u:r:dumpstate:s0 -- /system/bin/sh /init.XXX.dumpheap.sh
設(shè)置屬性的地方就放在上面gclog打印的地方粮坞,當(dāng) current_heap_size 大于450M的時(shí)候就去抓取設(shè)置熟悉蚊荣,執(zhí)行shell腳本,抓取hprof文件:
diff --git a/runtime/gc/heap.cc b/runtime/gc/heap.cc
index 82c160f..053f459 100644
--- a/runtime/gc/heap.cc
+++ b/runtime/gc/heap.cc
@@ -23,6 +23,10 @@
#include <memory>
#include <unwind.h> // For GC verification.
#include <vector>
+// XXX Add : Start
+#include <sys/types.h>
+#include <unistd.h>
+// End
#include "art_field-inl.h"
#include "base/allocator.h"
@@ -74,6 +78,12 @@
#include "handle_scope-inl.h"
#include "thread_list.h"
#include "well_known_classes.h"
+
+#ifdef HAVE_ANDROID_OS
+extern "C" {
+ int property_set(const char* name, const char* value);
+}
+#endif
namespace art {
@@ -215,7 +225,8 @@
kGcCountRateMaxBucketCount),
backtrace_lock_(nullptr),
seen_backtrace_count_(0u),
- unique_backtrace_count_(0u) {
+ unique_backtrace_count_(0u),
+ need_dump_heap_(0u) {// XXX Add
if (VLOG_IS_ON(heap) || VLOG_IS_ON(startup)) {
LOG(INFO) << "Heap() entering";
}
@@ -2522,6 +2533,9 @@
<< percent_free << "% free, " << PrettySize(current_heap_size) << "/"
<< PrettySize(total_memory) << ", " << "paused " << pause_string.str()
<< " total " << PrettyDuration((duration / 1000) * 1000);
+ // XXX Add : Start
+ CheckConcurrentHeapSize(current_heap_size);
+ // End
VLOG(heap) << Dumpable<TimingLogger>(*current_gc_iteration_.GetTimings());
}
}
@@ -3776,6 +3790,30 @@
}
}
}
-
+// XXX Add : Start
+#ifdef HAVE_ANDROID_OS
+void Heap::CheckConcurrentHeapSize(size_t current_heap_size) {
+ if(current_heap_size < 450 *1024 *1024) {
+ return;
+ }
+ if(!need_dump_heap_.CompareExchangeStrongSequentiallyConsistent(0,1)) {
+ return;
+ }
+ std::string current_cmd_line;
+ std::string propertyName = "sys.XXX.dumpheap";
+ if(ReadFileToString("/proc/self/cmdline", &current_cmd_line)) {
+ int ret = strcmp(current_cmd_line.c_str(), "system_server");
+ if(ret == 0 ) {
+ ret = property_set(propertyName.c_str(),std::to_string(getpid()).c_str());
+ LOG(WARNING) << "system_server oomevent ,set prop ret: " << ret;
+ }
+ }
+}
+#else
+void Heap::CheckConcurrentHeapSize(__attribute__((unused)) size_t new_num_bytes_allocated) {
+ return;
+}
+#endif
+// End
} // namespace gc
} // namespace art
diff --git a/runtime/gc/heap.h b/runtime/gc/heap.h
index 026795c..75c97c7 100644
--- a/runtime/gc/heap.h
+++ b/runtime/gc/heap.h
@@ -891,6 +891,9 @@
// GC stress mode attempts to do one GC per unique backtrace.
void CheckGcStressMode(Thread* self, mirror::Object** obj)
SHARED_LOCKS_REQUIRED(Locks::mutator_lock_);
+ // XXX Add : Start
+ void CheckConcurrentHeapSize(size_t current_heapsize);
+ // End
// All-known continuous spaces, where objects lie within fixed bounds.
std::vector<space::ContinuousSpace*> continuous_spaces_;
@@ -1205,7 +1208,9 @@
Atomic<uint64_t> unique_backtrace_count_;
// Stack trace hashes that we already saw,
std::unordered_set<uint64_t> seen_backtraces_ GUARDED_BY(backtrace_lock_);
-
+ // XXX Add : Start
+ AtomicInteger need_dump_heap_;
+ // End
friend class CollectorTransitionTask;
friend class collector::GarbageCollector;
friend class collector::MarkCompact;
之后再通過(guò)我廠的log收集系統(tǒng)得到log和hprof文件莫杈,hprof需要壓縮后上傳互例,當(dāng)然這之后都是建立在已有的log收集系統(tǒng)之上,hprof有將近500M筝闹,壓縮完后大概只有30-50M媳叨。
應(yīng)該可以直接在art打日志的地方調(diào)用hprof:dump()方法實(shí)現(xiàn),就不用通過(guò)上面這種方式丁存,跨了好幾個(gè)進(jìn)程肩杈,只是我在試直接dump的時(shí)候有發(fā)生系統(tǒng)卡住的情況,畢竟是虛擬機(jī)里改動(dòng)解寝,影響會(huì)很大扩然,決定采用上面方式,雖然繞了點(diǎn)聋伦,但是比較完整成熟夫偶。
因?yàn)樽ト〈蟮膆prof會(huì)影響用戶體驗(yàn),因此我們選擇攜帶如上change打包觉增,新ota升級(jí)的用戶手機(jī)里發(fā)生問(wèn)題時(shí)即可以抓取hprof文件并上傳上來(lái)兵拢,再下一個(gè)版本就不攜帶該change了,只影響該一個(gè)版本逾礁,盡量將影響控制到最小说铃。
呵呵访惜,終于抓到了這次。腻扇。债热。
2.4 Analyzing the OOM
With the captured hprof file we can finally locate the problem:
In the hprof, long[] data held by the ProcessStats object accounts for over 400 MB, by far the dominant consumer. There are a huge number of these arrays, each 32768 bytes. Where do they come from?
The abnormal logcat contains some related exception logs:
09-08 22:14:53.058 1169 2675 E ProcessStatsService: java.lang.OutOfMemoryError: Failed to allocate a 32784 byte allocation with 16256 free bytes and 11KB until OOM
09-08 22:14:53.058 1169 2675 E ProcessStatsService: at com.android.internal.app.ProcessStats.readFromParcel(ProcessStats.java:1751)
09-08 22:14:53.058 1169 2675 E ProcessStatsService: at com.android.internal.app.ProcessStats.read(ProcessStats.java:1696)
09-08 22:14:53.058 1169 2675 E ProcessStatsService: at com.android.server.am.ProcessStatsService.readLocked(ProcessStatsService.java:313)
09-08 22:14:53.058 1169 2675 E ProcessStatsService: at com.android.server.am.ProcessStatsService.getStatsOverTime(ProcessStatsService.java:530)
09-08 22:14:53.058 1169 2675 E ProcessStatsService: at com.android.server.am.ProcessStatsService.dumpAggregatedStats(ProcessStatsService.java:583)
09-08 22:14:53.058 1169 2675 E ProcessStatsService: at com.android.server.am.ProcessStatsService.dumpInner(ProcessStatsService.java:909)
09-08 22:14:53.058 1169 2675 E ProcessStatsService: at com.android.server.am.ProcessStatsService.dump(ProcessStatsService.java:651)
09-08 22:14:53.058 1169 2675 E ProcessStatsService: at android.os.Binder.dump(Binder.java:331)
09-08 22:14:53.058 1169 2675 E ProcessStatsService: at android.os.Binder.onTransact(Binder.java:297)
09-08 22:14:53.058 1169 2675 E ProcessStatsService: at com.android.internal.app.IProcessStats$Stub.onTransact(IProcessStats.java:81)
09-08 22:14:53.058 1169 2675 E ProcessStatsService: at com.android.server.am.ProcessStatsService.onTransact(ProcessStatsService.java:120)
09-08 22:14:53.058 1169 2675 E ProcessStatsService: at android.os.Binder.execTransact(Binder.java:460)
該調(diào)用棧一般是使用dumpsys procstats服務(wù)調(diào)用墙杯,查看整個(gè)代碼發(fā)現(xiàn)只有我廠自己的性能監(jiān)控軟件會(huì)去定時(shí)的調(diào)用dump procstats服務(wù)來(lái)獲取進(jìn)程的運(yùn)行數(shù)據(jù)。
抓上來(lái)的日志看到調(diào)用棧的線程 1169 2675的binder trasaction信息括荡,該調(diào)用是從進(jìn)程31045調(diào)用過(guò)來(lái):
proc 31045
thread 31045: l 10
outgoing transaction 9933905: 0000000000000000 from 31045:31045 to 1169:2675 code 5f444d50 flags 10 pri 0 r1 node 4001 size 72:8 data 0000000000000000
The ps output shows the caller is ultimately our com.XXX.XXXdaemon performance monitor. It normally collects performance data at night, so users never notice it, which also explains the low number of user reports.
------ ps(-t,-p,-P) ------
USER PID PPID VSIZE RSS PRIO NICE RTPRI SCHED PCY WCHAN PC NAME
system 31045 2610 16616 1736 20 0 0 0 fg binder_thr 0000000000 S dumpsys // parent process is 2610
system 2610 533 1685660 50384 20 0 0 0 fg SyS_epoll_ 0000000000 S com.XXX.XXXdaemon // 2610 is our performance monitor
But the monitoring program does not call this often, only about once a day. Looking at how dumpsys procstats actually works: over an app process's lifecycle, AMS keeps writing the process's runtime data to files under /data/system/procstats/:
dumpsys procstats is then simply the process of reading that data back from the directory.
Checking the code, the snippet below only exists on M devices; it is gone on releases after M:
frameworks/base/core/java/com/android/internal/app/ProcessStats.java
final int NLONGS = in.readInt();
final int NEXTLONG = in.readInt();
mLongs.clear();
for (int i=0; i<(NLONGS-1); i++) { // NLONGS is never printed in the log, so we cannot tell how large it really is, but it must be huge, making this loop allocate too many new long[] arrays and OOM
    while (i >= mLongs.size()) {
        mLongs.add(new long[LONGS_SIZE]);
    }
    readCompactedLongArray(in, version, mLongs.get(i), LONGS_SIZE);
}
long[] longs = new long[LONGS_SIZE];
mNextLong = NEXTLONG;
readCompactedLongArray(in, version, longs, NEXTLONG); // the log shows NEXTLONG being too large and throwing here, so dump procstats failed; but the dump method catches this exception, so system_server itself was not affected
最終因?yàn)閚ew long[LONGS_SIZE])分配過(guò)多而oom
這里的LONGS_SIZE值是4096叙身,long型占用8字節(jié)渔扎,這里mLongs中的元素是LONGS_SIZE的long型數(shù)組,則new一個(gè)的size正好就是32768信轿。
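The 32768-byte figure from the hprof follows directly from these constants. A quick, runnable check; note the NLONGS value of 13000 is hypothetical, chosen only to show how the roughly 400 MB observed in the hprof can accumulate (the log never printed the real value):

```java
public class ProcStatsSizeCheck {
    static final int LONGS_SIZE = 4096;  // from ProcessStats.java on M

    // Size of one mLongs element, i.e. one "new long[LONGS_SIZE]" allocation.
    static long chunkBytes() { return LONGS_SIZE * 8L; }

    // Total bytes allocated by the readFromParcel loop for a given NLONGS.
    static long totalBytes(int nlongs) { return nlongs * chunkBytes(); }

    public static void main(String[] args) {
        System.out.println("one chunk = " + chunkBytes() + " bytes");  // 32768
        // A hypothetical corrupt NLONGS of ~13000 alone accounts for ~406 MB:
        System.out.println("13000 chunks ~= "
                + totalBytes(13_000) / (1024 * 1024) + " MB");
    }
}
```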
猜測(cè)/data/system/procstats/下的文件有異常晃痴,再看hprof文件,的確mLongs的size值很大:
hprof1: (screenshot)
hprof2: (screenshot)
NLONGS comes from the size of the mLongs member of the serialized ProcessStats object, and from the hprof that value is clearly enormous. With a value that large, a single pass through the for loop above drives the heap up very quickly; this is the root cause.
The files under /data/system/procstats are continuously rotated, so once the corrupt file gets replaced the OOM stops reproducing. No wonder the OOM probability dropped again two versions later.
Reading the code did not reveal where the procstats write could become corrupted, and by then the OOM probability had already dropped because the corrupt file had been rotated out, so we took a workaround to avoid hitting a corrupt file again.
The workaround: sanity-check NLONGS, and if it is too large, skip the allocation and abandon this dumpsys procstats pass.
if (NLONGS > 4000) {
Slog.i(TAG, "Record size maybe too large, discard to avoid oom! NLONGS: " + NLONGS);
mReadError = "bad NLONGS count: " + NLONGS;
return;
}
三. Summary of OOM analysis
The key to analyzing an OOM is finding where memory leaks. At the moment of OOM, whatever other process or thread happens to be allocating easily takes the hit, so the crashing call stack is usually collateral damage rather than the root cause. The single most useful tool is the hprof file: if you can capture an hprof of the scene, you are already more than halfway done.
- Easy to reproduce locally:
Use a script to monitor the process's memory and capture an hprof, manually or automatically, once a threshold is reached; either a Python or a shell script will do.
For an app, LeakCanary is a simple and clear way to reproduce leaks. You can also override the default UncaughtExceptionHandler and dump an hprof before the exception takes the process down.
- Hard to reproduce locally:
Find a way to capture in the field: at the point where the hprof should be taken, monitor and dump via the Debug API or a property-service-triggered script, then upload the result for analysis.
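The UncaughtExceptionHandler idea mentioned above can be sketched as follows. This is a minimal illustration, not production code: on Android the dump call would be android.os.Debug.dumpHprofData(path), which is stubbed out here so the pattern compiles and runs on any JVM, and all class and method names are illustrative:

```java
// Install a handler that snapshots the heap before an OOM crash proceeds.
public class OomDumpHandler implements Thread.UncaughtExceptionHandler {
    static volatile boolean dumped = false;
    private final Thread.UncaughtExceptionHandler next;  // previous handler, if any

    OomDumpHandler(Thread.UncaughtExceptionHandler next) { this.next = next; }

    static void dumpHeap() {
        // On a device this would be: Debug.dumpHprofData("/data/local/tmp/oom.hprof");
        dumped = true;  // stub so the sketch is runnable anywhere
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        if (e instanceof OutOfMemoryError) dumpHeap();   // capture the scene first
        if (next != null) next.uncaughtException(t, e);  // then let the crash proceed
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.UncaughtExceptionHandler prev = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler(new OomDumpHandler(prev));
        // Simulate an OOM on a worker thread (a real one would come from allocation).
        Thread worker = new Thread(() -> { throw new OutOfMemoryError("simulated"); });
        worker.start();
        worker.join();
        System.out.println("dumped=" + dumped);
    }
}
```

Chaining to the previous handler keeps the normal crash reporting path intact; the dump only adds a step in front of it.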