Watchdog Analysis

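The watchdog concept originally arose because programs on early embedded devices often ran away (for example, because of electromagnetic interference). A dedicated hardware watchdog was therefore added: the running program must refresh ("kick") it periodically, and if the watchdog finds that it has not been refreshed within its timeout period, it concludes that the system has failed and forces a reboot.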
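To make the kick-or-reset idea concrete, here is a minimal plain-Java model of a hardware-style watchdog. It is only an illustration: the class name KickWatchdog, the injected clock and the millisecond timeout are assumptions, not part of any real device or Android API.

```java
import java.util.function.LongSupplier;

// Hypothetical model of a hardware watchdog timer: the program must "kick"
// (refresh) it at least once per timeout period, otherwise it fires.
final class KickWatchdog {
    private final long timeoutMillis;
    private final LongSupplier clock; // injected clock keeps the example deterministic
    private long lastKick;

    KickWatchdog(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.lastKick = clock.getAsLong();
    }

    // Called regularly by a healthy program.
    void kick() {
        lastKick = clock.getAsLong();
    }

    // Polled by the watchdog: true means "assume the program ran away, reset".
    boolean expired() {
        return clock.getAsLong() - lastKick > timeoutMillis;
    }
}
```

With a 1000 ms timeout, a kick at 800 ms keeps the watchdog quiet at 1500 ms; with no further kicks, it has expired by 2000 ms.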
Watchdog是Android用于對(duì)SystemServer的參數(shù)設(shè)置進(jìn)行監(jiān)聽的看門狗澡腾。那它看的是哪幾個(gè)門呢，主要是幾個(gè)重要的service的門动分。

ActivityManagerService
PowerManagerService
WindowManagerService

一旦發(fā)現(xiàn)service出了問題，就會(huì)殺掉system_server,而這也會(huì)使zygote隨其一起自殺澜公，最后導(dǎo)致重啟java世界。

So how does system_server put Watchdog to work for itself?

The interaction between system_server and Watchdog can be summarized in three steps:

Watchdog.getInstance().init()
Watchdog.getInstance().start()
Watchdog.getInstance().addMonitor()

這三個(gè)步驟都非常簡單。先看第一步

創(chuàng)建和初始化Watchdog

getInstance用于創(chuàng)建Watchdog

    public static Watchdog getInstance() {
        if (sWatchdog == null) {
            sWatchdog = new Watchdog();
        }

        return sWatchdog;
    }

    private Watchdog() {
        super("watchdog");
        // Initialize handler checkers for each common thread we want to check.  Note
        // that we are not currently checking the background thread, since it can
        // potentially hold longer running operations with no guarantees about the timeliness
        // of operations there.

        // The shared foreground thread is the main checker.  It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        // Add checker for main thread.  We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                "main thread", DEFAULT_TIMEOUT));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                "ui thread", DEFAULT_TIMEOUT));
        // And also check IO thread.
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                "i/o thread", DEFAULT_TIMEOUT));
        // And the display thread.
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                "display thread", DEFAULT_TIMEOUT));

        // Initialize monitor for Binder threads.
        addMonitor(new BinderThreadMonitor());
    }
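Each HandlerChecker created above watches one thread. The idea can be sketched in plain Java by letting a single-thread executor play the Looper thread: the checker posts a marker task, and that task can only run once the thread has finished whatever it was doing. ThreadChecker and ThreadCheckerDemo are hypothetical names, and an executor only appends to its queue, which is a simplification of how a real Handler message queue is used.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for Watchdog.HandlerChecker. A single-thread executor
// plays the Looper thread, and the posted task plays the check the real
// checker sends through that thread's Handler.
final class ThreadChecker {
    final String name;
    private final ExecutorService looper;
    private final AtomicBoolean completed = new AtomicBoolean(true);

    ThreadChecker(String name, ExecutorService looper) {
        this.name = name;
        this.looper = looper;
    }

    // Post a marker task; it can only run after the thread's earlier work is done.
    void scheduleCheck() {
        if (!completed.compareAndSet(true, false)) {
            return; // previous check still pending: do not queue another one
        }
        looper.execute(() -> completed.set(true));
    }

    boolean isCompleted() {
        return completed.get();
    }
}

class ThreadCheckerDemo {
    // Checks one idle thread and one stuck thread; returns {idleOk, stuckOk}.
    static boolean[] run() {
        ExecutorService idle = Executors.newSingleThreadExecutor();
        ExecutorService stuck = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        stuck.execute(() -> awaitQuietly(release)); // simulate a long-running task

        ThreadChecker io = new ThreadChecker("i/o thread", idle);
        ThreadChecker ui = new ThreadChecker("ui thread", stuck);
        io.scheduleCheck();
        ui.scheduleCheck();
        sleepQuietly(200); // give the checks time to run

        boolean[] result = {io.isCompleted(), ui.isCompleted()};
        release.countDown(); // unblock so both executors can shut down
        idle.shutdown();
        stuck.shutdown();
        return result;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

ThreadCheckerDemo.run() should report the idle thread as completed and the stuck one as not completed, because the stuck thread never reaches the marker task during the 200 ms window.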

接著看看Init函數(shù)做了些什么

    public void init(Context context, ActivityManagerService activity) {
        mResolver = context.getContentResolver();
        mActivity = activity;

        context.registerReceiver(new RebootRequestReceiver(),
                new IntentFilter(Intent.ACTION_REBOOT),
                android.Manifest.permission.REBOOT, null);
    }

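2. Getting Watchdog running
SystemServer calls Watchdog's start() method. Since Watchdog is a Thread subclass (note the super("watchdog") call in its constructor), this makes Watchdog's run() execute on a separate thread.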

    public void run() {
        boolean waitedHalf = false;
        while (true) {
            final ArrayList<HandlerChecker> blockedCheckers;
            final String subject;
            final boolean allowRestart;
            int debuggerWasConnected = 0;
            synchronized (this) {
                long timeout = CHECK_INTERVAL;
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked();
                }

                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }

                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        wait(timeout);
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }

                final int waitState = evaluateCheckerCompletionLocked();
                if (waitState == COMPLETED) {
                    // The monitors have returned; reset
                    waitedHalf = false;
                    continue;
                } else if (waitState == WAITING) {
                    // still waiting but within their configured intervals; back off and recheck
                    continue;
                } else if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        // We've waited half the deadlock-detection interval.  Pull a stack
                        // trace and wait another half.
                        ArrayList<Integer> pids = new ArrayList<Integer>();
                        pids.add(Process.myPid());
                        ActivityManagerService.dumpStackTraces(true, pids, null, null,
                                NATIVE_STACKS_OF_INTEREST);
                        waitedHalf = true;
                    }
                    continue;
                }

                // something is overdue!
                blockedCheckers = getBlockedCheckersLocked();
                subject = describeCheckersLocked(blockedCheckers);
                allowRestart = mAllowRestart;
            }

            // If we got here, that means that the system is most likely hung.
            // First collect stack traces from all threads of the system process.
            // Then kill this process so that the system will restart.
            EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

            ArrayList<Integer> pids = new ArrayList<Integer>();
            pids.add(Process.myPid());
            if (mPhonePid > 0) pids.add(mPhonePid);
            // Pass !waitedHalf so that just in case we somehow wind up here without having
            // dumped the halfway stacks, we properly re-initialize the trace file.
            final File stack = ActivityManagerService.dumpStackTraces(
                    !waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);

            // Give some extra time to make sure the stack traces get written.
            // The system's been hanging for a minute, another second or two won't hurt much.
            SystemClock.sleep(2000);

            // Pull our own kernel thread stacks as well if we're configured for that
            if (RECORD_KERNEL_THREADS) {
                dumpKernelStackTraces();
            }

            // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
            doSysRq('w');
            doSysRq('l');

            // Try to add the error to the dropbox, but assuming that the ActivityManager
            // itself may be deadlocked.  (which has happened, causing this statement to
            // deadlock and the watchdog as a whole to be ineffective)
            Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                    public void run() {
                        mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null,
                                subject, null, stack, null);
                    }
                };
            dropboxThread.start();
            try {
                dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
            } catch (InterruptedException ignored) {}

            IActivityController controller;
            synchronized (this) {
                controller = mController;
            }
            if (controller != null) {
                Slog.i(TAG, "Reporting stuck state to activity controller");
                try {
                    Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                    // 1 = keep waiting, -1 = kill system
                    int res = controller.systemNotResponding(subject);
                    if (res >= 0) {
                        Slog.i(TAG, "Activity controller requested to coninue to wait");
                        waitedHalf = false;
                        continue;
                    }
                } catch (RemoteException e) {
                }
            }

            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) {
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                for (int i=0; i<blockedCheckers.size(); i++) {
                    Slog.w(TAG, blockedCheckers.get(i).getName() + " stack trace:");
                    StackTraceElement[] stackTrace
                            = blockedCheckers.get(i).getThread().getStackTrace();
                    for (StackTraceElement element: stackTrace) {
                        Slog.w(TAG, "    at " + element);
                    }
                }
                Slog.w(TAG, "*** GOODBYE!");
                // Something is really wrong this time, so kill ourselves.
                Process.killProcess(Process.myPid());
                System.exit(10);
            }

            waitedHalf = false;
        }
    }

隔一段時(shí)間給另外一個(gè)線程發(fā)送一條monitor消息躏鱼，那個(gè)線程將檢查各個(gè)service的健康情況氮采。而看門狗會(huì)等待檢查結(jié)果，如果最后沒有返回結(jié)果染苛，那么它會(huì)殺掉systemServer.

3.列隊(duì)檢查
要想支持看門狗的檢查鹊漠，就需要讓這些Service實(shí)現(xiàn)Monitor接口，

public interface Monitor {
    void monitor();
}

For example, WindowManagerService:

public class WindowManagerService extends IWindowManager.Stub
        implements Watchdog.Monitor, WindowManagerPolicy.WindowManagerFuncs

然后Watchdog就會(huì)調(diào)用它們的monitor函數(shù)進(jìn)行檢查了茶行。

So how is a service's health judged? Taking WindowManagerService as the example, first look at how it hands itself over to the watchdog:

// Add ourself to the Watchdog monitors.
// Called in the constructor: the service adds itself to Watchdog's check queue.
Watchdog.getInstance().addMonitor(this);
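The registration side can be sketched without Android classes. Monitor mirrors the interface above; MonitorRegistry and its runAll() are hypothetical stand-ins for Watchdog.addMonitor() and for the foreground-thread checker that, per the constructor comment, dispatches the monitor checks. Tracking the monitor currently running is an illustrative choice so that a hung pass can name its culprit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the monitor side. Services implement Monitor and
// register through addMonitor(); one checker then calls monitor() on each.
interface Monitor {
    void monitor();
}

final class MonitorRegistry {
    private final List<Monitor> monitors = new ArrayList<>();
    // The monitor currently running; if runAll() hangs, this names the culprit.
    private volatile Monitor current;

    synchronized void addMonitor(Monitor m) {
        monitors.add(m);
    }

    // Calls every registered monitor in order. A monitor that blocks makes
    // this call block too, which is exactly what the watchdog detects.
    void runAll() {
        List<Monitor> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(monitors);
        }
        for (Monitor m : snapshot) {
            current = m;
            m.monitor();
        }
        current = null;
    }

    Monitor current() {
        return current;
    }
}
```

A service would call addMonitor(this) from its constructor, as WindowManagerService does above; if any monitor() blocks, runAll() blocks with it and current() still points at the offending monitor.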

而Watchdog調(diào)用各個(gè)monitor函數(shù)到底又檢查了什么呢娶靡？再看看它實(shí)現(xiàn)的monitor函數(shù)吧。
WindowManagerServer-->

@Override
public void monitor() {
    // So what monitor() checks is whether the service has deadlocked.
    synchronized (mWindowMap) { }
}

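So what the watchdog fears most is a deadlocked system service. monitor() merely acquires and releases the service's main lock (mWindowMap here); if another thread holds that lock indefinitely, monitor() never returns and the check never completes. In that situation the only remedy is to kill the system process.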

Note: I have only run into this once. A function held the lock and did not return for a long time, because it had to interact with hardware and the hardware did not respond in time.
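The scenario from this note can be reproduced in plain Java. LockMonitorDemo is hypothetical: mWindowMap is just an Object standing in for the service's lock, and a background thread that waits on a latch while holding it plays the function stuck on slow hardware. Running monitor() on a helper thread with a join timeout is a simplification of how Watchdog notices that a check has not completed.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo of why an empty synchronized block works as a health check.
final class LockMonitorDemo {
    private final Object mWindowMap = new Object(); // stands in for the service's main lock

    // Same shape as WindowManagerService.monitor() in the article.
    void monitor() {
        synchronized (mWindowMap) { }
    }

    // Runs monitor() on a helper thread and reports whether it returned in time.
    boolean monitorReturnsWithin(long timeoutMillis) {
        Thread checker = new Thread(this::monitor, "monitor-check");
        checker.setDaemon(true); // a stuck checker must not keep the JVM alive
        checker.start();
        try {
            checker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !checker.isAlive();
    }

    // Simulates a call that takes the lock and then waits on slow "hardware".
    // Returns a latch; counting it down lets the holder finish and release the lock.
    CountDownLatch holdLockInBackground() {
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (mWindowMap) {
                acquired.countDown();
                awaitQuietly(release);
            }
        }, "slow-hardware-call");
        holder.setDaemon(true);
        holder.start();
        awaitQuietly(acquired); // return only once the lock is really held
        return release;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

monitorReturnsWithin() is true while the lock is free, false while the simulated hardware call holds it, and true again once that call's latch is released.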

最后編輯于：2017.12.04 02:25:00

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者