watchdog分析

watchdog是什么

Watchdog是SystemServer的一個(gè)線程(mThread = new Thread(this::run, "watchdog");)瓦盛,檢測(cè)system server重要線程的鎖狀態(tài)和Handler消息是否阻塞京腥,假如有線程block了60s那么就會(huì)觸發(fā)watchdog timeout flow, 觸發(fā)Android重啟來(lái)使系統(tǒng)恢復(fù);block 30s會(huì)有線程堆棧log打印,并觸發(fā)/data/anr/trace.txt文件生成船老。

watchdog怎么用

以AMS為例

public class ActivityManagerService extends IActivityManager.Stub
        implements Watchdog.Monitor, BatteryStatsImpl.BatteryCallback, ActivityManagerGlobalLock {
    public ActivityManagerService(Context systemContext, ActivityTaskManagerService atm) {
        Watchdog.getInstance().addMonitor(this);
        Watchdog.getInstance().addThread(mHandler);
    }

    public void monitor() {
        synchronized (this) { }
    }

watchdog源碼分析

com/android/server/Watchdog.java



    private final HandlerChecker mMonitorChecker;  //專門用于監(jiān)控是否死鎖  
    public void addMonitor(Monitor monitor) {
        synchronized (mLock) {
            mMonitorChecker.addMonitorLocked(monitor);
        }
    }

    /* This handler will be used to post message back onto the main thread */
    private final ArrayList<HandlerCheckerAndTimeout> mHandlerCheckers = new ArrayList<>();//需要監(jiān)控的looper
    public void addThread(Handler thread) {
        synchronized (mLock) {
            final String name = thread.getLooper().getThread().getName();
            mHandlerCheckers.add(withDefaultTimeout(new HandlerChecker(thread, name)));
        }
    }

    private Watchdog() {
        mThread = new Thread(this::run, "watchdog");//本質(zhì)上是一個(gè)線程

        // Initialize handler checkers for each common thread we want to check.  Note
        // that we are not currently checking the background thread, since it can
        // potentially hold longer running operations with no guarantees about the timeliness
        // of operations there.
        //
        // The shared foreground thread is the main checker.  It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread");
        mHandlerCheckers.add(withDefaultTimeout(mMonitorChecker));
        // Add checker for main thread.  We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(new Handler(Looper.getMainLooper()), "main thread")));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(UiThread.getHandler(), "ui thread")));
        // And also check IO thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(IoThread.getHandler(), "i/o thread")));
        // And the display thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(DisplayThread.getHandler(), "display thread")));
        // And the animation thread.
        mHandlerCheckers.add(withDefaultTimeout(
                 new HandlerChecker(AnimationThread.getHandler(), "animation thread")));
        // And the surface animation thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(SurfaceAnimationThread.getHandler(),
                    "surface animation thread")));
        // Initialize monitor for Binder threads.
        addMonitor(new BinderThreadMonitor());//檢測(cè)是否binder線程耗盡

        mInterestingJavaPids.add(Process.myPid());

        // See the notes on DEFAULT_TIMEOUT.
        assert DB ||
                DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;

        mTraceErrorLogger = new TraceErrorLogger();
    }
   public void start() {
        mThread.start();
    }

    private void run() {
        boolean waitedHalf = false;

        while (true) {
            List<HandlerChecker> blockedCheckers = Collections.emptyList();
            String subject = "";
            boolean allowRestart = true;
            int debuggerWasConnected = 0;
            boolean doWaitedHalfDump = false;
            // The value of mWatchdogTimeoutMillis might change while we are executing the loop.
            // We store the current value to use a consistent value for all handlers.
            final long watchdogTimeoutMillis = mWatchdogTimeoutMillis;
            final long checkIntervalMillis = watchdogTimeoutMillis / 2;
            final ArrayList<Integer> pids;
            synchronized (mLock) {
                long timeout = checkIntervalMillis;
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    HandlerCheckerAndTimeout hc = mHandlerCheckers.get(i);
                    // We pick the watchdog to apply every time we reschedule the checkers. The
                    // default timeout might have changed since the last run.
                    hc.checker().scheduleCheckLocked(hc.customTimeoutMillis()
                            .orElse(watchdogTimeoutMillis * Build.HW_TIMEOUT_MULTIPLIER));//往被監(jiān)測(cè)的looper線程發(fā)消息
                }

                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }

                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        mLock.wait(timeout);//等30s
                        // Note: mHandlerCheckers and mMonitorChecker may have changed after waiting
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = checkIntervalMillis - (SystemClock.uptimeMillis() - start);
                }

                final int waitState = evaluateCheckerCompletionLocked();//判斷是否有阻塞
                if (waitState == COMPLETED) {
                    // The monitors have returned; reset
                    waitedHalf = false;
                    continue;
                } else if (waitState == WAITING) {
                    // still waiting but within their configured intervals; back off and recheck
                    continue;
                } else if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        Slog.i(TAG, "WAITED_HALF");
                        waitedHalf = true;
                        // We've waited half, but we'd need to do the stack trace dump w/o the lock.
                        blockedCheckers = getCheckersWithStateLocked(WAITED_HALF);//找到被阻塞的looper
                        subject = describeCheckersLocked(blockedCheckers);
                        pids = new ArrayList<>(mInterestingJavaPids);
                        doWaitedHalfDump = true;
                    } else {
                        continue;
                    }
                } else {
                    // something is overdue!
                    blockedCheckers = getCheckersWithStateLocked(OVERDUE);
                    subject = describeCheckersLocked(blockedCheckers);
                    allowRestart = mAllowRestart;
                    pids = new ArrayList<>(mInterestingJavaPids);
                }
            } // END synchronized (mLock)

            // If we got here, that means that the system is most likely hung.
            //
            // First collect stack traces from all threads of the system process.
            //
            // Then, if we reached the full timeout, kill this process so that the system will
            // restart. If we reached half of the timeout, just log some information and continue.
            logWatchog(doWaitedHalfDump, subject, pids);//打日志世曾,觸發(fā)生成trace.txt

            if (doWaitedHalfDump) {
                // We have waited for only half of the timeout, we continue to wait for the duration
                // of the full timeout before killing the process.
                continue;
            }

            IActivityController controller;
            synchronized (mLock) {
                controller = mController;
            }
            if (controller != null) {
                Slog.i(TAG, "Reporting stuck state to activity controller");
                try {
                    Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                    // 1 = keep waiting, -1 = kill system
                    int res = controller.systemNotResponding(subject);
                    if (res >= 0) {
                        Slog.i(TAG, "Activity controller requested to coninue to wait");
                        waitedHalf = false;
                        continue;
                    }
                } catch (RemoteException e) {
                }
            }

            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) {//判斷當(dāng)前是否處于debugger調(diào)試裕循,排除debugger調(diào)試引起的超時(shí)
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
                Slog.w(TAG, "*** GOODBYE!");
                if (!Build.IS_USER && isCrashLoopFound()
                        && !WatchdogProperties.should_ignore_fatal_count().orElse(false)) {
                    breakCrashLoop();
                }
                Process.killProcess(Process.myPid());//重啟system_server
                System.exit(10);
            }

            waitedHalf = false;
        }
    }
public final class HandlerChecker implements Runnable {
        public void scheduleCheckLocked(long handlerCheckerTimeoutMillis) {
            mWaitMax = handlerCheckerTimeoutMillis;
            if (mCompleted) {
                // Safe to update monitors in queue, Handler is not in the middle of work
                mMonitors.addAll(mMonitorQueue);
                mMonitorQueue.clear();
            }
            if ((mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling())
                    || (mPauseCount > 0)) {
                // Don't schedule until after resume OR
                // If the target looper has recently been polling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked.  This avoid having
                // to do a context switch to check the thread. Note that we
                // only do this if we have no monitors since those would need to
                // be executed at this point.
                mCompleted = true;
                return;
            }
            if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
            }

            mCompleted = false;
            mCurrentMonitor = null;
            mStartTime = SystemClock.uptimeMillis();
            mHandler.postAtFrontOfQueue(this);//發(fā)消息
        }

        public int getCompletionStateLocked() {//判斷是否阻塞
            if (mCompleted) {
                return COMPLETED;
            } else {
                long latency = SystemClock.uptimeMillis() - mStartTime;
                if (latency < mWaitMax/2) {
                    return WAITING;
                } else if (latency < mWaitMax) {
                    return WAITED_HALF;
                }
            }
            return OVERDUE;
        }

        public void run() {
            // Once we get here, we ensure that mMonitors does not change even if we call
            // #addMonitorLocked because we first add the new monitors to mMonitorQueue and
            // move them to mMonitors on the next schedule when mCompleted is true, at which
            // point we have completed execution of this method.
            final int size = mMonitors.size();
            for (int i = 0 ; i < size ; i++) {
                synchronized (mLock) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor();//判斷是否死鎖
            }

            synchronized (mLock) {
                mCompleted = true;
                mCurrentMonitor = null;
            }
        }
}

總結(jié):
Watchdog的主要流程是:開啟一個(gè)死循環(huán)时迫,不斷給指定線程發(fā)送一條消息椎扬,然后休眠30秒惫搏,休眠結(jié)束后判斷是否收到消息的回調(diào),如果有蚕涤,則正常進(jìn)行下次循環(huán)筐赔,如果沒(méi)收到,判斷從發(fā)消息到現(xiàn)在的時(shí)機(jī)小于30秒不處理揖铜,大于30秒小于60秒收集信息茴丰,大于60秒收集信息并重啟。

問(wèn)題分析解決思路

跟anr解決的思路一致天吓,看log贿肩,看trace.txt,注意cpu龄寞、io汰规、鎖、binder調(diào)用甚至是Thread.sleep等常見的因素物邑×锵可以參考https://blog.csdn.net/wd229047557/article/details/108059481

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末滔金,一起剝皮案震驚了整個(gè)濱河市,隨后出現(xiàn)的幾起案子茬射,更是在濱河造成了極大的恐慌鹦蠕,老刑警劉巖,帶你破解...
    沈念sama閱讀 217,509評(píng)論 6 504
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件在抛,死亡現(xiàn)場(chǎng)離奇詭異钟病,居然都是意外死亡,警方通過(guò)查閱死者的電腦和手機(jī)刚梭,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 92,806評(píng)論 3 394
  • 文/潘曉璐 我一進(jìn)店門肠阱,熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái),“玉大人朴读,你說(shuō)我怎么就攤上這事屹徘。” “怎么了衅金?”我有些...
    開封第一講書人閱讀 163,875評(píng)論 0 354
  • 文/不壞的土叔 我叫張陵噪伊,是天一觀的道長(zhǎng)。 經(jīng)常有香客問(wèn)我氮唯,道長(zhǎng)鉴吹,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 58,441評(píng)論 1 293
  • 正文 為了忘掉前任惩琉,我火速辦了婚禮豆励,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘瞒渠。我一直安慰自己良蒸,他們只是感情好,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,488評(píng)論 6 392
  • 文/花漫 我一把揭開白布伍玖。 她就那樣靜靜地躺著嫩痰,像睡著了一般。 火紅的嫁衣襯著肌膚如雪窍箍。 梳的紋絲不亂的頭發(fā)上始赎,一...
    開封第一講書人閱讀 51,365評(píng)論 1 302
  • 那天,我揣著相機(jī)與錄音仔燕,去河邊找鬼。 笑死魔招,一個(gè)胖子當(dāng)著我的面吹牛晰搀,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播办斑,決...
    沈念sama閱讀 40,190評(píng)論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼外恕,長(zhǎng)吁一口氣:“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼杆逗!你這毒婦竟也來(lái)了?” 一聲冷哼從身側(cè)響起鳞疲,我...
    開封第一講書人閱讀 39,062評(píng)論 0 276
  • 序言:老撾萬(wàn)榮一對(duì)情侶失蹤罪郊,失蹤者是張志新(化名)和其女友劉穎,沒(méi)想到半個(gè)月后尚洽,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體悔橄,經(jīng)...
    沈念sama閱讀 45,500評(píng)論 1 314
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,706評(píng)論 3 335
  • 正文 我和宋清朗相戀三年腺毫,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了癣疟。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點(diǎn)故事閱讀 39,834評(píng)論 1 347
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡潮酒,死狀恐怖睛挚,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情急黎,我是刑警寧澤扎狱,帶...
    沈念sama閱讀 35,559評(píng)論 5 345
  • 正文 年R本政府宣布,位于F島的核電站勃教,受9級(jí)特大地震影響淤击,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜荣回,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,167評(píng)論 3 328
  • 文/蒙蒙 一遭贸、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧心软,春花似錦壕吹、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,779評(píng)論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽(yáng)。三九已至猎唁,卻和暖如春咒劲,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背诫隅。 一陣腳步聲響...
    開封第一講書人閱讀 32,912評(píng)論 1 269
  • 我被黑心中介騙來(lái)泰國(guó)打工腐魂, 沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留,地道東北人逐纬。 一個(gè)月前我還...
    沈念sama閱讀 47,958評(píng)論 2 370
  • 正文 我出身青樓蛔屹,卻偏偏與公主長(zhǎng)得像,于是被迫代替她去往敵國(guó)和親豁生。 傳聞我的和親對(duì)象是個(gè)殘疾皇子兔毒,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 44,779評(píng)論 2 354