Performance Optimization (9): How Online Android ANR Monitoring Works

[This tutorial was written by 無心追求]

Watchdog in Android

In Android, Watchdog detects whether key system services have deadlocked. If a deadlock occurs, it kills the process so that SystemServer restarts.
Android's Watchdog is initialized in SystemServer, so Watchdog runs inside the SystemServer process.
Watchdog runs on its own thread. Every time it has waited 30 s, it performs a check. If the system goes to sleep, Watchdog's wait is suspended as well (the interval is measured with SystemClock.uptimeMillis(), which does not count deep sleep), and checking resumes only after the system wakes up.
An object that wants to be monitored by Watchdog implements the monitor() method of the Watchdog.Monitor interface and then registers itself with addMonitor().
In fact, besides thread deadlocks, the framework's Watchdog can also detect thread stalls: addMonitor() is for deadlock monitoring, while addThread() is for stall monitoring.

How Watchdog monitors thread deadlocks

To be monitored for deadlocks, an object must implement the monitor() method of the Watchdog.Monitor interface and then call addMonitor(). ActivityManagerService, for example:

public final class ActivityManagerService extends ActivityManagerNative
        implements Watchdog.Monitor, BatteryStatsImpl.BatteryCallback {

    public ActivityManagerService(Context systemContext) {
        Watchdog.getInstance().addMonitor(this);
    }

    public void monitor() {
        synchronized (this) { }
    }
    // ...
}

The code above, taken from ActivityManagerService, is what makes Watchdog monitor the ActivityManagerService object's lock. The monitoring itself is implemented as follows. Watchdog is a Thread subclass; once started, it checks once every 30 s and keeps looping:

    public void addMonitor(Monitor monitor) {
        synchronized (this) {
            if (isAlive()) {
                throw new RuntimeException("Monitors can't be added once the Watchdog is running");
            }
            mMonitorChecker.addMonitor(monitor);
        }
    }

    @Override
    public void run() {
        boolean waitedHalf = false;
        while (true) {
            final ArrayList<HandlerChecker> blockedCheckers;
            final String subject;
            final boolean allowRestart;
            int debuggerWasConnected = 0;
            synchronized (this) {
                long timeout = CHECK_INTERVAL;
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked();
                }

                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }

                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        wait(timeout);
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }

                final int waitState = evaluateCheckerCompletionLocked();
                if (waitState == COMPLETED) {
                    // The monitors have returned; reset
                    waitedHalf = false;
                    continue;
                } else if (waitState == WAITING) {
                    // still waiting but within their configured intervals; back off and recheck
                    continue;
                } else if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        // We've waited half the deadlock-detection interval.  Pull a stack
                        // trace and wait another half.
                        ArrayList<Integer> pids = new ArrayList<Integer>();
                        pids.add(Process.myPid());
                        ActivityManagerService.dumpStackTraces(true, pids, null, null,
                                NATIVE_STACKS_OF_INTEREST);
                        waitedHalf = true;
                    }
                    continue;
                }

                // something is overdue!
                blockedCheckers = getBlockedCheckersLocked();
                subject = describeCheckersLocked(blockedCheckers);
                allowRestart = mAllowRestart;
            }

            // If we got here, that means that the system is most likely hung.
            // First collect stack traces from all threads of the system process.
            // Then kill this process so that the system will restart.
            EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

            ArrayList<Integer> pids = new ArrayList<Integer>();
            pids.add(Process.myPid());
            if (mPhonePid > 0) pids.add(mPhonePid);
            // Pass !waitedHalf so that just in case we somehow wind up here without having
            // dumped the halfway stacks, we properly re-initialize the trace file.
            final File stack = ActivityManagerService.dumpStackTraces(
                    !waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);

            // Give some extra time to make sure the stack traces get written.
            // The system's been hanging for a minute, another second or two won't hurt much.
            SystemClock.sleep(2000);

            // Pull our own kernel thread stacks as well if we're configured for that
            if (RECORD_KERNEL_THREADS) {
                dumpKernelStackTraces();
            }

            String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
            String traceFileNameAmendment = "_SystemServer_WDT" + mTraceDateFormat.format(new Date());

            if (tracesPath != null && tracesPath.length() != 0) {
                File traceRenameFile = new File(tracesPath);
                String newTracesPath;
                int lpos = tracesPath.lastIndexOf (".");
                if (-1 != lpos)
                    newTracesPath = tracesPath.substring (0, lpos) + traceFileNameAmendment + tracesPath.substring (lpos);
                else
                    newTracesPath = tracesPath + traceFileNameAmendment;
                traceRenameFile.renameTo(new File(newTracesPath));
                tracesPath = newTracesPath;
            }

            final File newFd = new File(tracesPath);

            // Try to add the error to the dropbox, but assuming that the ActivityManager
            // itself may be deadlocked.  (which has happened, causing this statement to
            // deadlock and the watchdog as a whole to be ineffective)
            Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                    public void run() {
                        mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null,
                                subject, null, newFd, null);
                    }
                };
            dropboxThread.start();
            try {
                dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
            } catch (InterruptedException ignored) {}


            // At times, when user space watchdog traces don't give an indication on
            // which component held a lock, because of which other threads are blocked,
            // (thereby causing Watchdog), crash the device to analyze RAM dumps
            boolean crashOnWatchdog = SystemProperties
                                        .getBoolean("persist.sys.crashOnWatchdog", false);
            if (crashOnWatchdog) {
                // Trigger the kernel to dump all blocked threads, and backtraces
                // on all CPUs to the kernel log
                Slog.e(TAG, "Triggering SysRq for system_server watchdog");
                doSysRq('w');
                doSysRq('l');

                // wait until the above blocked threads be dumped into kernel log
                SystemClock.sleep(3000);

                // now try to crash the target
                doSysRq('c');
            }

            IActivityController controller;
            synchronized (this) {
                controller = mController;
            }
            if (controller != null) {
                Slog.i(TAG, "Reporting stuck state to activity controller");
                try {
                    Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                    // 1 = keep waiting, -1 = kill system
                    int res = controller.systemNotResponding(subject);
                    if (res >= 0) {
                        Slog.i(TAG, "Activity controller requested to coninue to wait");
                        waitedHalf = false;
                        continue;
                    }
                } catch (RemoteException e) {
                }
            }

            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) {
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                for (int i=0; i<blockedCheckers.size(); i++) {
                    Slog.w(TAG, blockedCheckers.get(i).getName() + " stack trace:");
                    StackTraceElement[] stackTrace
                            = blockedCheckers.get(i).getThread().getStackTrace();
                    for (StackTraceElement element: stackTrace) {
                        Slog.w(TAG, "    at " + element);
                    }
                }
                Slog.w(TAG, "*** GOODBYE!");
                Process.killProcess(Process.myPid());
                System.exit(10);
            }

            waitedHalf = false;
        }
    }

First, ActivityManagerService calls addMonitor() to add itself to Watchdog's mMonitorChecker field. This field is created in Watchdog's constructor and added there to mHandlerCheckers, an ArrayList<HandlerChecker> that holds all of Watchdog's checkers. mMonitorChecker is an instance of the HandlerChecker class, shown below:

    public final class HandlerChecker implements Runnable {
        private final Handler mHandler;
        private final String mName;
        private final long mWaitMax;
        private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
        private boolean mCompleted;
        private Monitor mCurrentMonitor;
        private long mStartTime;

        HandlerChecker(Handler handler, String name, long waitMaxMillis) {
            mHandler = handler;
            mName = name;
            mWaitMax = waitMaxMillis;
            mCompleted = true;
        }

        public void addMonitor(Monitor monitor) {
            mMonitors.add(monitor);
        }

        public void scheduleCheckLocked() {
            if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
                // If the target looper has recently been polling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked.  This avoid having
                // to do a context switch to check the thread.  Note that we
                // only do this if mCheckReboot is false and we have no
                // monitors, since those would need to be executed at this point.
                mCompleted = true;
                return;
            }

            if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
            }

            mCompleted = false;
            mCurrentMonitor = null;
            mStartTime = SystemClock.uptimeMillis();
            mHandler.postAtFrontOfQueue(this);
        }

        public boolean isOverdueLocked() {
            return (!mCompleted) && (SystemClock.uptimeMillis() > mStartTime + mWaitMax);
        }

        public int getCompletionStateLocked() {
            if (mCompleted) {
                return COMPLETED;
            } else {
                long latency = SystemClock.uptimeMillis() - mStartTime;
                if (latency < mWaitMax/2) {
                    return WAITING;
                } else if (latency < mWaitMax) {
                    return WAITED_HALF;
                }
            }
            return OVERDUE;
        }

        public Thread getThread() {
            return mHandler.getLooper().getThread();
        }

        public String getName() {
            return mName;
        }

        public String describeBlockedStateLocked() {
            if (mCurrentMonitor == null) {
                return "Blocked in handler on " + mName + " (" + getThread().getName() + ")";
            } else {
                return "Blocked in monitor " + mCurrentMonitor.getClass().getName()
                        + " on " + mName + " (" + getThread().getName() + ")";
            }
        }

        @Override
        public void run() {
            final int size = mMonitors.size();
            for (int i = 0 ; i < size ; i++) {
                synchronized (Watchdog.this) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor();
            }

            synchronized (Watchdog.this) {
                mCompleted = true;
                mCurrentMonitor = null;
            }
        }
    }

HandlerChecker's mMonitors is itself a list of monitored objects: it holds every object that implements Watchdog.Monitor. A thread that is monitored without a Monitor object (registered through addThread(), described later) gets its own HandlerChecker, which is added to Watchdog's mHandlerCheckers list. Each time the Watchdog thread starts a round of monitoring, it iterates over mHandlerCheckers and calls scheduleCheckLocked() on every HandlerChecker:

        public void scheduleCheckLocked() {
            if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
                // If the target looper has recently been polling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked.  This avoid having
                // to do a context switch to check the thread.  Note that we
                // only do this if mCheckReboot is false and we have no
                // monitors, since those would need to be executed at this point.
                mCompleted = true;
                return;
            }

            if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
            }

            mCompleted = false;
            mCurrentMonitor = null;
            mStartTime = SystemClock.uptimeMillis();
            mHandler.postAtFrontOfQueue(this);
        }

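HandlerChecker has a few important fields. mCompleted indicates whether the current check has finished within the allotted time; mStartTime records when the current check started; and mHandler is the Handler of the monitored thread. scheduleCheckLocked() starts a new check of that thread, so it naturally sets mCompleted to false and records the start time. (If the checker has no monitors and the thread's MessageQueue is currently polling, i.e. idle, it is marked completed right away without posting anything.) The monitoring principle is therefore to post a task, namely the HandlerChecker itself, to the message queue of the monitored thread's Handler, where that thread executes it. Because it is posted with postAtFrontOfQueue(), it jumps ahead of pending messages, but it still cannot run until the message currently being handled returns. If the queue is stuck on some task, the HandlerChecker cannot run in time, and once the allotted time has passed, the monitored thread is considered hung (by a deadlock or by a long-running task). Inside the HandlerChecker task: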

        @Override
        public void run() {
            final int size = mMonitors.size();
            for (int i = 0 ; i < size ; i++) {
                synchronized (Watchdog.this) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor();
            }

            synchronized (Watchdog.this) {
                mCompleted = true;
                mCurrentMonitor = null;
            }
        }
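HandlerChecker depends on Android's Handler and MessageQueue. To make its key behaviors concrete off-device, here is a minimal plain-Java sketch under stated assumptions: MiniLooper is a hypothetical stand-in for a Looper backed by an ArrayDeque, with postAtFront() playing the role of postAtFrontOfQueue() and runNext() handling one message; Checker is a simplified, single-threaded HandlerChecker without the Watchdog.this locking. It shows that a scheduled check jumps ahead of messages already waiting, and that scheduling again while a check is in flight does not post a second copy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class MiniCheckerDemo {
    /** Hypothetical stand-in for a Looper and its MessageQueue (not an Android API). */
    static final class MiniLooper {
        private final Deque<Runnable> queue = new ArrayDeque<>();

        void post(Runnable r) { queue.addLast(r); }          // roughly Handler.post()
        void postAtFront(Runnable r) { queue.addFirst(r); }  // roughly Handler.postAtFrontOfQueue()

        /** Runs one message, as one Looper iteration would; false if the queue is empty. */
        boolean runNext() {
            Runnable r = queue.pollFirst();
            if (r == null) return false;
            r.run();
            return true;
        }
    }

    /** Simplified, single-threaded HandlerChecker: no locking, no Android types. */
    static final class Checker implements Runnable {
        private final MiniLooper looper;
        private final List<Runnable> monitors = new ArrayList<>();
        private boolean completed = true;
        private long startTime;

        Checker(MiniLooper looper) { this.looper = looper; }

        void addMonitor(Runnable monitor) { monitors.add(monitor); }

        /** Posts this checker at the front of the queue unless a check is already in flight. */
        void scheduleCheck(long now) {
            if (!completed) return;
            completed = false;
            startTime = now;
            looper.postAtFront(this);
        }

        boolean isCompleted() { return completed; }

        /** How long the current check has been pending (0 once it has run). */
        long pendingFor(long now) { return completed ? 0 : now - startTime; }

        @Override
        public void run() {
            for (Runnable m : monitors) m.run();  // a real monitor() may block on a lock here
            completed = true;
        }
    }

    public static void main(String[] args) {
        MiniLooper looper = new MiniLooper();
        List<String> order = new ArrayList<>();
        Checker checker = new Checker(looper);
        checker.addMonitor(() -> order.add("monitor"));

        looper.post(() -> order.add("task1"));
        looper.post(() -> order.add("task2"));
        checker.scheduleCheck(0);
        while (looper.runNext()) { }
        System.out.println(order);  // [monitor, task1, task2]: the checker ran first
    }
}
```

Because the caller drains this queue by hand, the sketch cannot show a message that blocks; in the real class, a blocked message or a blocked monitor() simply keeps mCompleted false until Watchdog's next evaluation.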

HandlerChecker.run() first iterates over the objects in mMonitors and calls each one's monitor() method. A monitored object's monitor() is usually implemented like this:

    public void monitor() {
        synchronized (this) { }
    }
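Why does an empty synchronized (this) { } work as a deadlock probe? The standalone sketch below (plain Java; Service, probe() and probeWhileLockHeld() are hypothetical names, not framework APIs) calls such a monitor() on a helper thread and checks whether it returns within a time limit. It returns at once when the lock is free and stays blocked while another thread holds the lock, which is exactly what keeps a HandlerChecker from completing.

```java
import java.util.concurrent.CountDownLatch;

public class MonitorProbeDemo {
    /** Hypothetical service using the framework idiom: monitor() only takes and drops its lock. */
    static final class Service {
        public void monitor() {
            synchronized (this) { }
        }
    }

    /** Calls service.monitor() on a helper thread; true if it returned within timeoutMs. */
    static boolean probe(Service service, long timeoutMs) {
        Thread t = new Thread(service::monitor, "monitor-probe");
        t.setDaemon(true);  // a probe stuck on the lock must not keep the JVM alive
        t.start();
        try {
            t.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    /** Holds the lock on another thread (as a deadlocked thread would), probes, then releases. */
    static boolean probeWhileLockHeld(Service service, long timeoutMs) {
        CountDownLatch locked = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (service) {
                locked.countDown();
                awaitQuietly(release);
            }
        }, "lock-holder");
        holder.start();
        awaitQuietly(locked);
        boolean returned = probe(service, timeoutMs);
        release.countDown();  // let the holder, and then the blocked probe, finish
        return returned;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Service service = new Service();
        System.out.println("lock free: monitor() returned = " + probe(service, 1_000));
        System.out.println("lock held: monitor() returned = " + probeWhileLockHeld(service, 200));
    }
}
```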

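Such a monitor() does nothing but acquire and release the object's lock, so it probes for a deadlock on that particular lock: if the lock is held indefinitely, monitor() never returns and the check never completes. Once all monitors have returned, the check is finished and mCompleted is set to true. After scheduleCheckLocked() has been called on every checker, Watchdog starts waiting, and it insists on waiting the full 30 s. There is an implementation detail here: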

                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        wait(timeout);
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }

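When I first read the wait loop above, I noticed that SystemClock.uptimeMillis() does not advance while the device is in deep sleep, so I wondered about this case: Watchdog has waited 15 s when the device goes to sleep, and it stays asleep for 30 minutes before waking up. Does wait() return immediately on wakeup? The answer is no: normally the wait continues until the remaining 15 s have also elapsed. So why wrap wait() in a loop at all? Puzzled, I looked up the documentation of wait() (a method of Object, not of Thread) and finally found this explanation: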

A thread can also wake up without being notified, interrupted, or timing out, a so-called spurious wakeup. While this will rarely occur in practice, applications must guard against it by testing for the condition that should have caused the thread to be awakened, and continuing to wait if the condition is not satisfied. In other words, waits should always occur in loops, like this one:

    synchronized (obj) {
        while (<condition does not hold>)
            obj.wait(timeout);
        ... // Perform action appropriate to condition
    }
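Applied to a wait whose only condition is the passage of time, the Javadoc's loop idiom becomes the loop in Watchdog.run(). The sketch below isolates it in plain Java; FullIntervalWait, waitFullInterval() and nowMs() are hypothetical names, and the clock is injected because Watchdog uses SystemClock.uptimeMillis(), which exists only on Android. Unlike Watchdog, which logs an InterruptedException and keeps waiting, this version restores the interrupt flag and returns early, the usual Java convention.

```java
import java.util.function.LongSupplier;

public final class FullIntervalWait {
    private FullIntervalWait() { }

    /**
     * Waits on lock until intervalMs has elapsed on clockMs, re-waiting for the remaining
     * time whenever wait() returns early (a notify or a spurious wakeup).
     * Returns how many times wait() returned, including the final timeout;
     * returns early, with the interrupt flag set, if the thread is interrupted.
     */
    public static int waitFullInterval(Object lock, long intervalMs, LongSupplier clockMs) {
        int returns = 0;
        synchronized (lock) {
            long start = clockMs.getAsLong();
            long timeout = intervalMs;
            while (timeout > 0) {
                try {
                    lock.wait(timeout);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // keep the interrupt visible and give up
                    return returns;
                }
                returns++;
                // Same arithmetic as Watchdog: remaining = interval - elapsed.
                timeout = intervalMs - (clockMs.getAsLong() - start);
            }
        }
        return returns;
    }

    /** Monotonic milliseconds on a plain JVM; Watchdog uses SystemClock.uptimeMillis() instead. */
    public static long nowMs() {
        return System.nanoTime() / 1_000_000L;
    }
}
```

A return value above 1 means some early wakeups were absorbed by the loop instead of cutting the interval short.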

In short, the Javadoc says that besides being woken explicitly (notify() or notifyAll()), interrupted, or timing out, a thread in wait() can also wake up spuriously. Spurious wakeups are very rare in practice, but a program must guard against them by re-checking the condition it was waiting for, to tell a real wakeup from a spurious one, and keep waiting if the wakeup was spurious. In real development this kind of small detail really matters: 99% of the time nothing happens, but when the 1% case does occur, the problem is very obscure and hard to track down. You wonder why a thread that was waiting normally suddenly woke up, and you might even start doubting what you believed about how wait() behaves while the device is asleep. Enough of the digression; back to the Watchdog mechanism. After waiting 30 s, Watchdog calls evaluateCheckerCompletionLocked() to evaluate the state of the monitored objects:

    private int evaluateCheckerCompletionLocked() {
        int state = COMPLETED;
        for (int i=0; i<mHandlerCheckers.size(); i++) {
            HandlerChecker hc = mHandlerCheckers.get(i);
            state = Math.max(state, hc.getCompletionStateLocked());
        }
        return state;
    }

It calls getCompletionStateLocked() on each HandlerChecker to obtain that checker's state, and returns the most severe one (via Math.max()):

        public int getCompletionStateLocked() {
            if (mCompleted) {
                return COMPLETED;
            } else {
                long latency = SystemClock.uptimeMillis() - mStartTime;
                if (latency < mWaitMax/2) {
                    return WAITING;
                } else if (latency < mWaitMax) {
                    return WAITED_HALF;
                }
            }
            return OVERDUE;
        }

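getCompletionStateLocked() shows that the states are told apart by the mCompleted flag together with how long the check has been pending. Before each 30 s wait, a HandlerChecker task is posted to the message queue of the monitored thread's Handler and mCompleted is set to false. After the wait, if the HandlerChecker ran in time, mCompleted is true, meaning the check completed promptly; if mCompleted is still false, the HandlerChecker has not run yet. In that case Watchdog looks at how long the check has been pending, counting only awake (uptime) time: under 30 s (half of the 60 s timeout) it simply keeps waiting, without posting the check again; between 30 s and 60 s it dumps stack traces once and keeps waiting; beyond 60 s it treats the situation as abnormal (either a deadlock or a task that has run for too long). With the default 60 s timeout, a stuck checker therefore usually reaches the half-way state after the first 30 s wait and becomes overdue after the second; the WAITING state matters for checkers registered with a longer timeout. Once a checker is overdue, Watchdog collects diagnostics such as stack traces and, if configured, kernel thread stacks, writes a trace file, adds an entry to DropBox, and then kills the process (unless a debugger is attached or restarts are disabled). That is the end of the monitoring flow.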
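The state logic above can be written as two pure functions, which makes the thresholds easy to check. This is a sketch, not the framework code: CheckerStates, stateOf() and worstOf() are hypothetical names, and the four constants are simply numbered in increasing order of severity, which is what the Math.max() aggregation in evaluateCheckerCompletionLocked() relies on. The example values assume the 60 s timeout described in the text.

```java
import java.util.Arrays;
import java.util.List;

public final class CheckerStates {
    // Ordered by severity so that Math.max() picks the worst state.
    public static final int COMPLETED = 0;
    public static final int WAITING = 1;
    public static final int WAITED_HALF = 2;
    public static final int OVERDUE = 3;

    private CheckerStates() { }

    /** State of one checker, given whether it has run and how long (uptime) it has been pending. */
    public static int stateOf(boolean completed, long latencyMs, long waitMaxMs) {
        if (completed) return COMPLETED;
        if (latencyMs < waitMaxMs / 2) return WAITING;
        if (latencyMs < waitMaxMs) return WAITED_HALF;
        return OVERDUE;
    }

    /** The overall state is the most severe state among all checkers. */
    public static int worstOf(List<Integer> states) {
        int worst = COMPLETED;
        for (int s : states) worst = Math.max(worst, s);
        return worst;
    }

    public static void main(String[] args) {
        long waitMax = 60_000;  // the 60 s timeout described in the text
        System.out.println(stateOf(false, 30_000, waitMax));                   // 2 (WAITED_HALF)
        System.out.println(worstOf(Arrays.asList(COMPLETED, WAITING, OVERDUE)));  // 3 (OVERDUE)
    }
}
```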

How Watchdog monitors thread stalls

As mentioned earlier, Watchdog monitors a thread by posting a HandlerChecker to the message queue of that thread's Handler, and the deadlock monitors are all stored in a HandlerChecker's mMonitors list. Every external call to addMonitor() therefore ends up in the monitor list of Watchdog's mMonitorChecker field, so all lock (deadlock) monitoring is handled by mMonitorChecker. To monitor a thread for long-running tasks, Watchdog uses addThread():

    public void addThread(Handler thread) {
        addThread(thread, DEFAULT_TIMEOUT);
    }

    public void addThread(Handler thread, long timeoutMillis) {
        synchronized (this) {
            if (isAlive()) {
                throw new RuntimeException("Threads can't be added once the Watchdog is running");
            }
            final String name = thread.getLooper().getThread().getName();
            mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
        }
    }

addThread() actually creates a new HandlerChecker object and uses it to monitor long-running tasks. This HandlerChecker's mMonitors list is empty, so when the task runs it calls no monitor() method and just sets the mCompleted flag. So we can put it this way: the real monitor inside Watchdog is HandlerChecker, and HandlerChecker covers both thread deadlocks and long-running tasks. When it has Monitor objects, it watches for both deadlocks and long-running tasks; when it has none, it only watches for stalls caused by long-running tasks on the thread.

Watchdog monitoring flow

[Figure: Watchdog monitoring flow (watchdog.jpg)]

Having understood Watchdog's monitoring flow, can we apply the same mechanism in our own projects, to detect deadlocks on important threads in multithreaded code and to detect main-thread ANRs in real time? Certainly. In fact, Watchdog's key role in the framework is to monitor the main system services, such as ActivityManagerService, for deadlocks or stalls. When something goes wrong, Watchdog kills the process so that it restarts, which lets important system services recover from such problems. Watchdog is effectively the last line of defense: it dumps diagnostic information promptly and restores a working process environment.
In an application, deadlock monitoring of important threads can follow exactly the same principle as Watchdog.
ANR monitoring in an app can borrow from Watchdog, with some differences in detail. The ANR timeouts differ: 5 s for input events (e.g. in an Activity), 10 s for foreground broadcasts, and 20 s for foreground services. However, the four major components all run their callbacks on the main thread, so, like Watchdog, we can launch a check periodically (for example every 30 s, as Watchdog does; the summary below discusses how the interval affects accuracy). A flag like mCompleted tells whether the task posted to the MessageQueue got stuck instead of running in time, and a start time like mStartTime tells how long it has been pending, which reveals whether other messages in the queue are taking too long. If it has been pending for more than 5 s, the queue contains a long-running task and there is an ANR risk, so the thread stacks should be dumped and saved promptly, then reported to the backend for large-scale analysis. Remember that this time must be measured only while the device is awake: when the device sleeps, the MessageQueue simply stops running, which is neither a deadlock nor a stall.

[Figure: anr1.jpg]
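A minimal sketch of the app-side ANR idea described above, assuming hypothetical names (MainThreadChecker, tick(), stackOf()): the main thread is abstracted as an Executor and the clock as a LongSupplier, so the logic runs and can be tested on a plain JVM. On Android one would presumably pass something like new Handler(Looper.getMainLooper())::post and SystemClock::uptimeMillis (uptime, so deep sleep is not counted), call tick() from a dedicated background thread at a fixed interval, and, when it reports, dump Looper.getMainLooper().getThread().

```java
import java.util.concurrent.Executor;
import java.util.function.LongSupplier;

/**
 * Hypothetical app-side checker modeled on HandlerChecker. A background thread is
 * expected to call tick() at a fixed interval; that loop is not shown here.
 */
public final class MainThreadChecker {
    private final Executor mainThread;    // on Android, e.g. a Handler on the main Looper
    private final LongSupplier uptimeMs;  // on Android, e.g. SystemClock::uptimeMillis
    private final long thresholdMs;       // e.g. 5_000, the input-event ANR timeout
    private final Object lock = new Object();
    private boolean completed = true;
    private long startMs;

    public MainThreadChecker(Executor mainThread, LongSupplier uptimeMs, long thresholdMs) {
        this.mainThread = mainThread;
        this.uptimeMs = uptimeMs;
        this.thresholdMs = thresholdMs;
    }

    /**
     * One round: if the previous probe has run, post a new one and return -1. Otherwise
     * return how long it has been pending if that reaches thresholdMs (the caller should
     * then dump the main thread's stack), or -1 if it is still within the threshold.
     */
    public long tick() {
        synchronized (lock) {
            long now = uptimeMs.getAsLong();
            if (!completed) {
                long pending = now - startMs;
                return pending >= thresholdMs ? pending : -1;
            }
            completed = false;
            startMs = now;
        }
        // Post outside the lock; the probe only flips the flag once the main thread reaches it.
        mainThread.execute(() -> {
            synchronized (lock) {
                completed = true;
            }
        });
        return -1;
    }

    /** Formats a thread's current stack, e.g. the main thread's when tick() reports a stall. */
    public static String stackOf(Thread thread) {
        StringBuilder sb = new StringBuilder(thread.getName()).append('\n');
        for (StackTraceElement e : thread.getStackTrace()) {
            sb.append("    at ").append(e).append('\n');
        }
        return sb.toString();
    }
}
```

In this sketch a stall is reported on every tick while it lasts; a production version would likely report once per stall and rate-limit uploads.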

Watchdog mechanism summary

Every thread can have one Looper, and one Looper corresponds to one MessageQueue. By posting a check task to a thread's MessageQueue and seeing whether it runs in time, we can detect stalls on that thread. The prerequisite is that the thread has a Looper.
Watchdog must run on its own dedicated thread, so that it can monitor other threads without being affected by them.
Using the Watchdog mechanism for online ANR monitoring may not be 100% accurate. Take a 5 s ANR threshold. On one hand, a long-running task may finish just before the 5 s mark; the check task then starts running, and while it is still running the Watchdog thread's wait expires, finds the check task unfinished, and reports an ANR, which is inaccurate. On the other hand, a 5 s ANR may happen while the Watchdog thread is still waiting, i.e. the ANR and the Watchdog wait window are out of phase; by the time the Watchdog thread starts its next wait, the ANR is over and the main thread may have recovered, so this ANR goes unrecorded. A stall is therefore guaranteed to be caught and recorded only if it lasts at least twice the Watchdog wait interval, which for a 5 s ANR means a 2.5 s interval. That is rather frequent in practice: if the device does not sleep, the Watchdog thread runs every 2.5 s, which may cost battery.
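The factor of two in the last point can be checked with a short argument. Assume, as the text does, that the detector wakes up at fixed times t_k = kT, posts a probe at each wake-up if the previous probe has run, counts a stall as caught when the probe it posted at the previous wake-up is still pending, and that the main thread runs a probe almost immediately whenever it is not stalled. Let a stall occupy [s, s + D]. The first wake-up at or after s satisfies

$$t_{k^*} = T\left\lceil \frac{s}{T} \right\rceil, \qquad s \le t_{k^*} < s + T, \qquad t_{k^*+1} = t_{k^*} + T < s + 2T$$

so whenever D is at least 2T, the whole window between two wake-ups lies inside the stall and the probe posted at t_{k*} is still pending at t_{k*+1}:

$$D \ge 2T \;\Longrightarrow\; [\,t_{k^*},\, t_{k^*+1}\,] \subset [\,s,\, s + D\,]$$

Conversely, if D < 2T, a stall that starts just after some wake-up t_j can end before t_{j+2} and slip through. Guaranteeing detection of a D = 5 s stall therefore requires

$$T \le \frac{D}{2} = \frac{5\ \mathrm{s}}{2} = 2.5\ \mathrm{s}$$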

© Copyright belongs to the author. For reprints or content collaboration, please contact the author.
