Watchdog analysis

What is Watchdog

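Watchdog is a thread inside SystemServer (mThread = new Thread(this::run, "watchdog");). It checks whether key locks in system_server (through registered Monitors) and the Handler message queues of key system_server threads are blocked. If something stays blocked for 60 s, the watchdog timeout flow is triggered and Android restarts (system_server is killed) to bring the system back to a working state. After 30 s of blocking, the thread stacks are printed to the log and a stack trace file is generated under /data/anr/ (traces.txt).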

How to use Watchdog

Taking AMS (ActivityManagerService) as an example:

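public class ActivityManagerService extends IActivityManager.Stub
        implements Watchdog.Monitor, BatteryStatsImpl.BatteryCallback, ActivityManagerGlobalLock {
    public ActivityManagerService(Context systemContext, ActivityTaskManagerService atm) {
        // ... other initialization omitted ...
        Watchdog.getInstance().addMonitor(this);    // Watchdog will call monitor() to check the AMS lock
        Watchdog.getInstance().addThread(mHandler); // Watchdog will check that this Handler's thread keeps processing messages
    }

    // Called by Watchdog on the foreground thread. It can only return once the AMS lock
    // (this) is free, so a deadlock or a very long hold on that lock makes it block.
    public void monitor() {
        synchronized (this) { }
    }

    // ... rest of the class omitted ...
}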
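To see why an empty synchronized (this) { } block is enough to detect a stuck lock, here is a minimal plain-Java sketch (not Android code). Monitor below is a hypothetical stand-in for Watchdog.Monitor, and FakeService and MonitorProbe.probe are hypothetical names: probe runs monitor() on a separate thread and reports whether it returned within a timeout. If another thread holds the service's lock, monitor() cannot enter the block and the probe fails.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for Watchdog.Monitor (not the Android interface).
interface Monitor {
    void monitor();
}

// A service that guards its state with its own intrinsic lock, as AMS does.
class FakeService implements Monitor {
    @Override
    public void monitor() {
        // Returns at once unless another thread currently holds this object's lock.
        synchronized (this) { }
    }
}

// Hypothetical helper: runs monitor() on a separate thread and waits a bounded time.
final class MonitorProbe {
    private MonitorProbe() {}

    // Returns true if m.monitor() finished within timeoutMillis, false otherwise.
    static boolean probe(Monitor m, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            m.monitor();
            done.countDown();
        }, "monitor-probe");
        t.setDaemon(true); // a probe stuck on the lock must not keep the JVM alive
        t.start();
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class MonitorProbeDemo {
    public static void main(String[] args) throws InterruptedException {
        FakeService service = new FakeService();
        System.out.println("lock free: " + MonitorProbe.probe(service, 500));

        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (service) { // hold the lock until told to release it
                holding.countDown();
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.start();
        holding.await();
        System.out.println("lock held: " + MonitorProbe.probe(service, 500));
        release.countDown();
        holder.join();
    }
}
```

In the real Watchdog nothing times out the monitor() call itself: the foreground thread simply stays stuck inside it, the HandlerChecker never marks itself completed, and the elapsed time eventually reaches the WAITED_HALF and OVERDUE thresholds.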

Watchdog source code analysis

com/android/server/Watchdog.java



    private final HandlerChecker mMonitorChecker;  // dedicated to running Monitors, i.e. checking for deadlocks
    public void addMonitor(Monitor monitor) {
        synchronized (mLock) {
            mMonitorChecker.addMonitorLocked(monitor);
        }
    }

    /* This handler will be used to post message back onto the main thread */
    private final ArrayList<HandlerCheckerAndTimeout> mHandlerCheckers = new ArrayList<>(); // the Looper threads to monitor
    public void addThread(Handler thread) {
        synchronized (mLock) {
            final String name = thread.getLooper().getThread().getName();
            mHandlerCheckers.add(withDefaultTimeout(new HandlerChecker(thread, name)));
        }
    }

    private Watchdog() {
        mThread = new Thread(this::run, "watchdog"); // at its core, just a thread

        // Initialize handler checkers for each common thread we want to check.  Note
        // that we are not currently checking the background thread, since it can
        // potentially hold longer running operations with no guarantees about the timeliness
        // of operations there.
        //
        // The shared foreground thread is the main checker.  It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread");
        mHandlerCheckers.add(withDefaultTimeout(mMonitorChecker));
        // Add checker for main thread.  We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(new Handler(Looper.getMainLooper()), "main thread")));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(UiThread.getHandler(), "ui thread")));
        // And also check IO thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(IoThread.getHandler(), "i/o thread")));
        // And the display thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(DisplayThread.getHandler(), "display thread")));
        // And the animation thread.
        mHandlerCheckers.add(withDefaultTimeout(
                 new HandlerChecker(AnimationThread.getHandler(), "animation thread")));
        // And the surface animation thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(SurfaceAnimationThread.getHandler(),
                    "surface animation thread")));
        // Initialize monitor for Binder threads.
        addMonitor(new BinderThreadMonitor()); // detects binder thread pool exhaustion

        mInterestingJavaPids.add(Process.myPid());

        // See the notes on DEFAULT_TIMEOUT.
        assert DB ||
                DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;

        mTraceErrorLogger = new TraceErrorLogger();
    }
    public void start() {
        mThread.start();
    }

    private void run() {
        boolean waitedHalf = false;

        while (true) {
            List<HandlerChecker> blockedCheckers = Collections.emptyList();
            String subject = "";
            boolean allowRestart = true;
            int debuggerWasConnected = 0;
            boolean doWaitedHalfDump = false;
            // The value of mWatchdogTimeoutMillis might change while we are executing the loop.
            // We store the current value to use a consistent value for all handlers.
            final long watchdogTimeoutMillis = mWatchdogTimeoutMillis;
            final long checkIntervalMillis = watchdogTimeoutMillis / 2;
            final ArrayList<Integer> pids;
            synchronized (mLock) {
                long timeout = checkIntervalMillis;
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    HandlerCheckerAndTimeout hc = mHandlerCheckers.get(i);
                    // We pick the watchdog to apply every time we reschedule the checkers. The
                    // default timeout might have changed since the last run.
                    hc.checker().scheduleCheckLocked(hc.customTimeoutMillis()
                            .orElse(watchdogTimeoutMillis * Build.HW_TIMEOUT_MULTIPLIER)); // post a check message to each monitored Looper thread
                }

                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }

                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        mLock.wait(timeout); // wait checkIntervalMillis (30 s by default)
                        // Note: mHandlerCheckers and mMonitorChecker may have changed after waiting
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = checkIntervalMillis - (SystemClock.uptimeMillis() - start);
                }

                final int waitState = evaluateCheckerCompletionLocked(); // check whether anything is blocked
                if (waitState == COMPLETED) {
                    // The monitors have returned; reset
                    waitedHalf = false;
                    continue;
                } else if (waitState == WAITING) {
                    // still waiting but within their configured intervals; back off and recheck
                    continue;
                } else if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        Slog.i(TAG, "WAITED_HALF");
                        waitedHalf = true;
                        // We've waited half, but we'd need to do the stack trace dump w/o the lock.
                        blockedCheckers = getCheckersWithStateLocked(WAITED_HALF); // find the blocked Loopers
                        subject = describeCheckersLocked(blockedCheckers);
                        pids = new ArrayList<>(mInterestingJavaPids);
                        doWaitedHalfDump = true;
                    } else {
                        continue;
                    }
                } else {
                    // something is overdue!
                    blockedCheckers = getCheckersWithStateLocked(OVERDUE);
                    subject = describeCheckersLocked(blockedCheckers);
                    allowRestart = mAllowRestart;
                    pids = new ArrayList<>(mInterestingJavaPids);
                }
            } // END synchronized (mLock)

            // If we got here, that means that the system is most likely hung.
            //
            // First collect stack traces from all threads of the system process.
            //
            // Then, if we reached the full timeout, kill this process so that the system will
            // restart. If we reached half of the timeout, just log some information and continue.
            logWatchog(doWaitedHalfDump, subject, pids); // write logs and trigger the stack trace dump (traces.txt)

            if (doWaitedHalfDump) {
                // We have waited for only half of the timeout, we continue to wait for the duration
                // of the full timeout before killing the process.
                continue;
            }

            IActivityController controller;
            synchronized (mLock) {
                controller = mController;
            }
            if (controller != null) {
                Slog.i(TAG, "Reporting stuck state to activity controller");
                try {
                    Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                    // 1 = keep waiting, -1 = kill system
                    int res = controller.systemNotResponding(subject);
                    if (res >= 0) {
                        Slog.i(TAG, "Activity controller requested to coninue to wait");
                        waitedHalf = false;
                        continue;
                    }
                } catch (RemoteException e) {
                }
            }

            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) { // a debugger is attached: don't treat stalls caused by debugging as a hang
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
                Slog.w(TAG, "*** GOODBYE!");
                if (!Build.IS_USER && isCrashLoopFound()
                        && !WatchdogProperties.should_ignore_fatal_count().orElse(false)) {
                    breakCrashLoop();
                }
                Process.killProcess(Process.myPid()); // kill system_server so that it gets restarted
                System.exit(10);
            }

            waitedHalf = false;
        }
    }
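Stripped of locking and logging, each iteration of run() boils down to a small decision. The following is a simplified plain-Java model, not the AOSP code: WaitState, LoopAction and WatchdogLoopModel are hypothetical names, the activity-controller branch is left out, and it only reproduces the order of checks above (wait state, one half-timeout dump per stall, then debugger and allowRestart, then kill).

```java
// Hypothetical mirror of the COMPLETED / WAITING / WAITED_HALF / OVERDUE states.
enum WaitState { COMPLETED, WAITING, WAITED_HALF, OVERDUE }

// What a single loop iteration ends up doing.
enum LoopAction { CONTINUE, DUMP_HALF, SKIP_KILL, KILL }

final class WatchdogLoopModel {
    private boolean waitedHalf = false; // same role as the local waitedHalf in run()

    // debuggerConnected: a debugger is (or recently was) attached.
    // allowRestart: corresponds to mAllowRestart.
    LoopAction step(WaitState state, boolean debuggerConnected, boolean allowRestart) {
        switch (state) {
            case COMPLETED:
                waitedHalf = false; // monitors returned; reset
                return LoopAction.CONTINUE;
            case WAITING:
                return LoopAction.CONTINUE; // still within the first half of the timeout
            case WAITED_HALF:
                if (waitedHalf) {
                    return LoopAction.CONTINUE; // already dumped once for this stall
                }
                waitedHalf = true;
                return LoopAction.DUMP_HALF; // log stacks, keep waiting
            default: // OVERDUE
                waitedHalf = false; // run() also resets this at the end of the iteration
                if (debuggerConnected || !allowRestart) {
                    return LoopAction.SKIP_KILL;
                }
                return LoopAction.KILL;
        }
    }
}

class WatchdogLoopDemo {
    public static void main(String[] args) {
        WatchdogLoopModel model = new WatchdogLoopModel();
        WaitState[] stall = {
            WaitState.WAITING, WaitState.WAITED_HALF, WaitState.WAITED_HALF, WaitState.OVERDUE
        };
        for (WaitState s : stall) {
            System.out.println(s + " -> " + model.step(s, false, true));
        }
    }
}
```

For this sequence the demo prints WAITING -> CONTINUE, WAITED_HALF -> DUMP_HALF, WAITED_HALF -> CONTINUE and OVERDUE -> KILL, which matches the comments in run(): dump once at half the timeout, kill at the full timeout.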

public final class HandlerChecker implements Runnable {
        public void scheduleCheckLocked(long handlerCheckerTimeoutMillis) {
            mWaitMax = handlerCheckerTimeoutMillis;
            if (mCompleted) {
                // Safe to update monitors in queue, Handler is not in the middle of work
                mMonitors.addAll(mMonitorQueue);
                mMonitorQueue.clear();
            }
            if ((mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling())
                    || (mPauseCount > 0)) {
                // Don't schedule until after resume OR
                // If the target looper has recently been polling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked.  This avoid having
                // to do a context switch to check the thread. Note that we
                // only do this if we have no monitors since those would need to
                // be executed at this point.
                mCompleted = true;
                return;
            }
            if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
            }

            mCompleted = false;
            mCurrentMonitor = null;
            mStartTime = SystemClock.uptimeMillis();
            mHandler.postAtFrontOfQueue(this); // post this checker to the front of the target thread's message queue
        }

        public int getCompletionStateLocked() { // determine whether the handler is blocked
            if (mCompleted) {
                return COMPLETED;
            } else {
                long latency = SystemClock.uptimeMillis() - mStartTime;
                if (latency < mWaitMax/2) {
                    return WAITING;
                } else if (latency < mWaitMax) {
                    return WAITED_HALF;
                }
            }
            return OVERDUE;
        }

        public void run() {
            // Once we get here, we ensure that mMonitors does not change even if we call
            // #addMonitorLocked because we first add the new monitors to mMonitorQueue and
            // move them to mMonitors on the next schedule when mCompleted is true, at which
            // point we have completed execution of this method.
            final int size = mMonitors.size();
            for (int i = 0 ; i < size ; i++) {
                synchronized (mLock) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor(); // blocks here if the monitored lock is stuck (e.g. deadlocked)
            }

            synchronized (mLock) {
                mCompleted = true;
                mCurrentMonitor = null;
            }
        }
}
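getCompletionStateLocked() splits the elapsed time into three bands relative to mWaitMax. Below is a stand-alone sketch of that arithmetic, assuming Build.HW_TIMEOUT_MULTIPLIER is 1 so that mWaitMax equals the 60 s default. The worst() helper assumes that evaluateCheckerCompletionLocked() combines checkers by taking the largest (worst) state; treat that part and the class names as illustrative assumptions.

```java
// Hypothetical stand-alone copy of the completion-state logic.
final class CompletionState {
    // Same ordering as the Watchdog constants: a larger value means "more stuck".
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    private CompletionState() {}

    // Mirrors HandlerChecker.getCompletionStateLocked(); latency = now - mStartTime.
    static int of(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return WAITING;
        } else if (latencyMillis < waitMaxMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // Overall state = the worst state among all checkers (assumed behavior).
    static int worst(int... states) {
        int result = COMPLETED;
        for (int s : states) {
            result = Math.max(result, s);
        }
        return result;
    }
}

class CompletionStateDemo {
    public static void main(String[] args) {
        long waitMax = 60_000L; // 60 s default timeout, multiplier assumed to be 1
        System.out.println(CompletionState.of(false, 10_000L, waitMax)); // 1 (WAITING)
        System.out.println(CompletionState.of(false, 30_000L, waitMax)); // 2 (WAITED_HALF)
        System.out.println(CompletionState.of(false, 60_000L, waitMax)); // 3 (OVERDUE)
        System.out.println(CompletionState.worst(
                CompletionState.COMPLETED, CompletionState.WAITED_HALF)); // 2
    }
}
```

Because both comparisons use a strict <, a latency of exactly 30 s already counts as WAITED_HALF and exactly 60 s as OVERDUE.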

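Summary:
The main flow of Watchdog is: run an infinite loop that keeps posting a message to each monitored thread and then sleeps for 30 seconds. When it wakes up, it checks whether each message's callback has run. If it has, the next iteration proceeds normally. If it has not, Watchdog looks at how long it has been since the message was posted: under 30 seconds, nothing is done; between 30 and 60 seconds, diagnostic information is collected; over 60 seconds, information is collected and the system is restarted.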
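The loop described in the summary can be tried out without Android. The following is a toy model in plain Java under several assumptions: a single-thread ExecutorService stands in for a Looper thread, the ping is appended to the end of the queue instead of being posted at the front, and CountDownLatch.await with a timeout replaces the fixed 30 s wait, so a responsive thread returns early. MiniWatchdog and its Result values are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model: a single-thread executor plays the role of a Looper thread.
final class MiniWatchdog {
    enum Result { OK, SLOW, HUNG }

    private MiniWatchdog() {}

    // Posts a ping to the worker and waits for it in two halves of timeoutMillis.
    static Result check(ExecutorService worker, long timeoutMillis) throws InterruptedException {
        CountDownLatch ran = new CountDownLatch(1);
        // Unlike postAtFrontOfQueue(), this ping queues behind any pending tasks.
        worker.execute(ran::countDown);
        long half = timeoutMillis / 2;
        if (ran.await(half, TimeUnit.MILLISECONDS)) {
            return Result.OK; // answered within the first half
        }
        // Half the timeout has passed: the real Watchdog dumps stacks here and keeps waiting.
        if (ran.await(timeoutMillis - half, TimeUnit.MILLISECONDS)) {
            return Result.SLOW; // answered late, but before the full timeout
        }
        // Full timeout: this is where the real Watchdog would kill system_server.
        return Result.HUNG;
    }
}

class MiniWatchdogDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        System.out.println(MiniWatchdog.check(worker, 1_000)); // idle worker
        worker.execute(() -> sleepQuietly(700));   // busy longer than half the timeout
        System.out.println(MiniWatchdog.check(worker, 1_000));
        worker.execute(() -> sleepQuietly(5_000)); // busy longer than the whole timeout
        System.out.println(MiniWatchdog.check(worker, 1_000));
        worker.shutdownNow(); // interrupts the sleeping task so the JVM can exit
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

On an unloaded machine the demo should print OK, SLOW and HUNG in that order; the exact boundaries depend on thread scheduling.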

Approach to analyzing and fixing the problem

The approach is the same as for ANRs: read the logs and traces.txt, and look for the usual culprits such as CPU load, I/O, lock contention, binder calls, and even Thread.sleep. See also https://blog.csdn.net/wd229047557/article/details/108059481

Last edited: 2024.03.15 09:49:39

© Copyright belongs to the author. For reprints or content cooperation, please contact the author.
