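Android Watchdog Source Code Analysis

The watchdog mainly monitors the threads of services running in system_server, such as AMS. Once it detects that a thread is blocked, it dumps the call stacks and may even restart the system_server process.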

1. Watchdog framework diagram

[Figure: watchdog.png]

a. Watchdog extends Thread.
b. HandlerChecker is an inner class of Watchdog that implements Runnable. It checks the state of the Handler thread it holds and calls back the monitor methods.
c. RebootRequestReceiver listens for the reboot broadcast (Intent.ACTION_REBOOT); when the broadcast arrives, it calls PowerMS to reboot the system.
d. BinderThreadMonitor checks whether a binder thread is available.
The monitor blocks until a binder thread is free to handle incoming IPC requests, which ensures that other processes can still communicate with the service.

2. Watchdog initialization

2.1 Starting the watchdog

#SystemServer.java
private void startOtherServices() {
        ...
        mSystemServiceManager.startBootPhase(SystemService.PHASE_WAIT_FOR_DEFAULT_DISPLAY);
        ...
        final Watchdog watchdog = Watchdog.getInstance();
        watchdog.init(context, mActivityManagerService); // registers RebootRequestReceiver
        ...
        mActivityManagerService.systemReady(new Runnable() {
            @Override
            public void run() {
                Slog.i(TAG, "Making services ready");
                mSystemServiceManager.startBootPhase(
                        SystemService.PHASE_ACTIVITY_MANAGER_READY);
                ...
                startSystemUi(context); // start SystemUI
                ...
                Watchdog.getInstance().start(); // start the watchdog thread
                ...
                mSystemServiceManager.startBootPhase(
                        SystemService.PHASE_THIRD_PARTY_APPS_CAN_START);
                ...
            }
        });
}

2.2 Watchdog constructor

static final long DEFAULT_TIMEOUT = DB ? 10*1000 : 60*1000; // default blocking threshold: 60s (10s when DB is true)

private Watchdog() {
    // The shared foreground thread is the main checker.  It is where we
    // will also dispatch monitor checks and do other work.
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread.  We only do a quick check since there
    // can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread.
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread.
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread.
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));
    // Initialize monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());
}

This stage registers the threads to be monitored in mHandlerCheckers: the foreground thread, main thread, UI thread, I/O thread, and display thread. The Binder monitor (BinderThreadMonitor) is registered through addMonitor().

3. Watchdog monitoring mechanism

The watchdog thread runs inside SystemServer.

public void run() {
    boolean waitedHalf = false;
    boolean mSFHang = false;
    while (true) {
        final ArrayList<HandlerChecker> blockedCheckers;
        String subject;
        mSFHang = false;
        final boolean allowRestart;
        synchronized (this) {
            long timeout = CHECK_INTERVAL; // (DB ? 10*1000 : 60*1000) /2
            long SFHangTime;
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked(); // 3.1 post a message to the Handler of each monitored thread
            }
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) { // wait 30s before continuing
                try {
                    wait(timeout); 
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start); // time left in the 30s interval
            }
            
            final int waitState = evaluateCheckerCompletionLocked(); // 3.3 compute the overall HandlerChecker state
            if (waitState == COMPLETED) {
                // The monitors have returned; reset
                continue;
            } else if (waitState == WAITING) {
                // still waiting but within their configured intervals; back off and recheck
                continue;
            } else if (waitState == WAITED_HALF) {
                if (!waitedHalf) { // first time a checker has been blocked for more than 30s
                    // We've waited half the deadlock-detection interval.  Pull a stack
                    // trace and wait another half.
                    ArrayList<Integer> pids = new ArrayList<Integer>();
                    pids.add(Process.myPid());
                    ActivityManagerService.dumpStackTraces(true, pids, null, null,
                            NATIVE_STACKS_OF_INTEREST); // first stack dump
                    waitedHalf = true;
                }
                continue;
            }
            blockedCheckers = getBlockedCheckersLocked(); // 3.5 collect the checkers that timed out (debug ? 10s : 60s)
            subject = describeCheckersLocked(blockedCheckers); // 3.7 build the description string
            allowRestart = mAllowRestart;
        }

        // If we got here, that means that the system is most likely hung.
        // First collect stack traces from all threads of the system process.
        // Then kill this process so that the system will restart.
        // The system is probably hung: collect the stack traces, then restart system_server.
        ArrayList<Integer> pids = new ArrayList<Integer>();
        pids.add(Process.myPid());
        if (mPhonePid > 0) pids.add(mPhonePid);
        // Pass !waitedHalf so that just in case we somehow wind up here without having
        // dumped the halfway stacks, we properly re-initialize the trace file.
        final File stack = ActivityManagerService.dumpStackTraces(
                !waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);
        // Also dumps the stacks of the native processes listed in NATIVE_STACKS_OF_INTEREST
       /*
    // Which native processes to dump into dropbox's stack traces
    public static final String[] NATIVE_STACKS_OF_INTEREST = new String[] {
        "/system/bin/audioserver",
        "/system/bin/cameraserver",
        "/system/bin/drmserver",
        "/system/bin/mediadrmserver",
        "/system/bin/mediaserver",
        "/system/bin/sdcard",
        "/system/bin/surfaceflinger",
        "media.codec",     // system/bin/mediacodec
        "media.extractor", // system/bin/mediaextractor
        "com.android.bluetooth",  // Bluetooth service
        };
        */        

        // Give some extra time to make sure the stack traces get written.
        // The system's been hanging for a minute, another second or two won't hurt much.
        SystemClock.sleep(2000); // make sure the stack traces have been written


        // Pull our own kernel thread stacks as well if we're configured for that
        if (RECORD_KERNEL_THREADS) {
            dumpKernelStackTraces();
        }

        // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
        // dump information about blocked threads in the kernel
        doSysRq('w');
        doSysRq('l');

        // Try to add the error to the dropbox, but assuming that the ActivityManager
        // itself may be deadlocked.  (which has happened, causing this statement to
        // deadlock and the watchdog as a whole to be ineffective)
        Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                public void run() {
                    mActivity.addErrorToDropBox(
                            "watchdog", null, "system_server", null, null,
                            subject, null, stack, null);
                }
            };
        dropboxThread.start();
        
        Slog.v(TAG, "** save all info before killnig system server **");
        mActivity.addErrorToDropBox("watchdog", null, "system_server", null, null, subject, null, null, null);
        
        
            if ((mSFHang == false) && (controller != null)) {
                Slog.i(TAG, "Reporting stuck state to activity controller");
                try {
                    Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                    Slog.i(TAG, "Binder.setDumpDisabled");
                    // 1 = keep waiting, -1 = kill system
                    int res = controller.systemNotResponding(subject);
                    if (res >= 0) {
                        Slog.i(TAG, "Activity controller requested to coninue to wait");
                        waitedHalf = false;
                        continue;
                    }
                    Slog.i(TAG, "Activity controller requested to reboot");
                } catch (RemoteException e) {
                }
            }

            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) {
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                // print the stack trace of every timed-out thread
                for (int i=0; i<blockedCheckers.size(); i++) {
                    Slog.w(TAG, blockedCheckers.get(i).getName() + " stack trace:");
                    StackTraceElement[] stackTrace
                            = blockedCheckers.get(i).getThread().getStackTrace();
                    for (StackTraceElement element: stackTrace) {
                        Slog.w(TAG, "    at " + element);
                    }
                }
                Slog.w(TAG, "*** GOODBYE!");
                Process.killProcess(Process.myPid()); // kill system_server
                System.exit(10);
            }
    }
}
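
The inner `while (timeout > 0)` loop in run() is a common idiom: wait() may return early (notification, interruption, spurious wakeup), so the remaining time is recomputed after every return. Below is a minimal plain-Java sketch of that idiom. `TimedWaitSketch` and `waitFullInterval` are hypothetical names, and `System.nanoTime()` stands in for `SystemClock.uptimeMillis()`, which is only available on Android.

```java
final class TimedWaitSketch {
    /**
     * Blocks on lock until at least intervalMs milliseconds have passed,
     * re-arming the wait whenever wait() returns early.
     * Returns the number of milliseconds actually spent waiting.
     */
    public static long waitFullInterval(Object lock, long intervalMs) {
        long start = System.nanoTime();
        long remaining = intervalMs;
        synchronized (lock) {
            while (remaining > 0) {
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    // Like the watchdog, note the interruption and keep waiting.
                }
                long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
                remaining = intervalMs - elapsedMs;
            }
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        long waited = waitFullInterval(new Object(), 200);
        System.out.println("waited the full interval: " + (waited >= 200));
    }
}
```

Because the loop only exits once the measured elapsed time reaches the interval, an early wake-up shortens a single wait() call but never the overall 30s check period.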

3.1 HandlerChecker::scheduleCheckLocked

public void scheduleCheckLocked() {
    if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
        // If the target looper has recently been polling, then
        // there is no reason to enqueue our checker on it since that
        // is as good as it not being deadlocked.  This avoid having
        // to do a context switch to check the thread.  Note that we
        // only do this if mCheckReboot is false and we have no
        // monitors, since those would need to be executed at this point.
        mCompleted = true;
        return;
    }

    if (!mCompleted) {
        // we already have a check in flight, so no need
        // only one check may be in flight at a time
        return;
    }

    mCompleted = false; // a check is now in flight
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis(); // record the start time
    mHandler.postAtFrontOfQueue(this); // post to the front of the monitored thread's MessageQueue; that thread then runs HandlerChecker.run()
}
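
The "one check in flight" rule matters because the start time is kept while a check is still pending: each 30s pass of the watchdog loop then sees a growing latency for a stuck thread. The following is a minimal plain-Java sketch of that behaviour, not the Android implementation. `InFlightCheckSketch` is a hypothetical class, a `Deque` stands in for the monitored thread's MessageQueue, and the current time is passed in explicitly.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class InFlightCheckSketch implements Runnable {
    private final Deque<Runnable> queue; // stand-in for the monitored thread's MessageQueue
    private boolean completed = true;    // true when no check is pending
    private long startTimeMs;

    public InFlightCheckSketch(Deque<Runnable> queue) {
        this.queue = queue;
    }

    public synchronized void scheduleCheck(long nowMs) {
        if (!completed) {
            return; // a check is already in flight; keep its original start time
        }
        completed = false;
        startTimeMs = nowMs;
        queue.addFirst(this); // analogous to postAtFrontOfQueue
    }

    @Override
    public synchronized void run() {
        completed = true; // the monitored thread got around to running the check
    }

    public synchronized boolean isCompleted() { return completed; }

    public synchronized long startTimeMs() { return startTimeMs; }

    public static void main(String[] args) {
        Deque<Runnable> queue = new ArrayDeque<>();
        InFlightCheckSketch checker = new InFlightCheckSketch(queue);
        checker.scheduleCheck(0);
        checker.scheduleCheck(30_000);              // ignored, first check still pending
        System.out.println(queue.size());           // 1
        System.out.println(checker.startTimeMs());  // 0
        queue.pollFirst().run();                    // the monitored thread drains its queue
        System.out.println(checker.isCompleted());  // true
    }
}
```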

3.2 HandlerChecker::run

public void run() {
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor(); // call the monitor() method of the concrete service
    }

    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}

3.2.1 AMS::monitor

/** In this method we try to acquire our lock to make sure that we have not deadlocked */
public void monitor() {
    synchronized (this) { }
}

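A deadlock is detected by trying to acquire the lock: if another thread never releases the AMS lock, monitor() never returns and the checker stays incomplete.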
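
The lock-probe idea can be tried outside Android. The sketch below is an illustration under assumptions: `LockProbeSketch`, `Service` and `monitorReturns` are hypothetical names, and a second thread holding the service's intrinsic lock plays the role of the stuck code path. The probe thread calls monitor() and the caller checks whether it came back within a timeout.

```java
import java.util.concurrent.CountDownLatch;

final class LockProbeSketch {
    /** Hypothetical service guarded by its own intrinsic lock, like AMS in the text. */
    public static final class Service {
        public void monitor() {
            synchronized (this) { } // returns only once the lock can be acquired
        }
    }

    /**
     * Returns true if monitor() came back within timeoutMs (must be > 0).
     * When lockHeldElsewhere is true, another thread holds the lock for the
     * whole duration of the probe.
     */
    public static boolean monitorReturns(boolean lockHeldElsewhere, long timeoutMs) {
        final Service service = new Service();
        final CountDownLatch acquired = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (service) {
                acquired.countDown();
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                    // fall through and release the lock
                }
            }
        });
        Thread probe = new Thread(service::monitor);
        try {
            if (lockHeldElsewhere) {
                holder.start();
                acquired.await(); // make sure the lock is really taken first
            }
            probe.start();
            probe.join(timeoutMs);
            return !probe.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let the holder exit so the probe can finish
        }
    }

    public static void main(String[] args) {
        System.out.println(monitorReturns(true, 200));  // false
        System.out.println(monitorReturns(false, 200)); // true
    }
}
```

The empty synchronized block does no work of its own; its only purpose is to prove that the lock is obtainable, which is exactly what the watchdog needs to know.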

3.3 Watchdog::evaluateCheckerCompletionLocked

private int evaluateCheckerCompletionLocked() {
    int state = COMPLETED;
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        state = Math.max(state, hc.getCompletionStateLocked()); // state of this checker
        // state: the worst (largest) state among all threads monitored by the watchdog
    }
    return state;
}

a. COMPLETED or WAITING: nothing to do;
b. WAITED_HALF (blocked for more than 30s) for the first time: dump the system_server stacks for the first time;
c. OVERDUE: output more information (AMS::dumpStackTraces, Kernel::dumpKernelStackTraces, dropbox entry)
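
The aggregation and the three cases above can be condensed into a small sketch. The numeric values of the constants are an assumption (the excerpt only implies their ordering through Math.max), and `WatchdogStateSketch`, `evaluate` and `action` are hypothetical names; the returned strings merely label the branch taken in run().

```java
final class WatchdogStateSketch {
    // Assumed values: a larger number means a worse state, as Math.max requires.
    public static final int COMPLETED = 0;
    public static final int WAITING = 1;
    public static final int WAITED_HALF = 2;
    public static final int OVERDUE = 3;

    /** Worst state among all checkers; COMPLETED when there are none. */
    public static int evaluate(int[] checkerStates) {
        int state = COMPLETED;
        for (int s : checkerStates) {
            state = Math.max(state, s);
        }
        return state;
    }

    /** Names the branch the watchdog loop takes for a given overall state. */
    public static String action(int state, boolean waitedHalf) {
        if (state == COMPLETED || state == WAITING) {
            return "continue";
        }
        if (state == WAITED_HALF) {
            return waitedHalf ? "continue" : "dump stacks, then continue";
        }
        return "dump stacks, write dropbox entry, kill system_server";
    }

    public static void main(String[] args) {
        int state = evaluate(new int[] {COMPLETED, WAITING, WAITED_HALF, COMPLETED});
        System.out.println(state);                // 2
        System.out.println(action(state, false)); // dump stacks, then continue
    }
}
```

A single slow checker is therefore enough to move the whole watchdog into WAITED_HALF or OVERDUE, regardless of how healthy the other threads are.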

3.4 HandlerChecker::getCompletionStateLocked

public int getCompletionStateLocked() {
    if (mCompleted) {
        return COMPLETED;
    } else {
        long latency = SystemClock.uptimeMillis() - mStartTime; // mStartTime: when the check message was posted to the Handler
        if (latency < mWaitMax/2) { // 0~30s
            return WAITING;
        } else if (latency < mWaitMax) { // 30~60s
            return WAITED_HALF;
        }
    }
    return OVERDUE; // > 60s
}
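
To make the thresholds concrete, here is a worked sketch of the same classification with the clock passed in as a parameter, so the boundaries can be checked with plain numbers. `CompletionStateSketch` and `completionState` are hypothetical names and the constant values are assumed; the branch structure follows the method above.

```java
final class CompletionStateSketch {
    // Assumed values; only their ordering matters.
    public static final int COMPLETED = 0;
    public static final int WAITING = 1;
    public static final int WAITED_HALF = 2;
    public static final int OVERDUE = 3;

    public static int completionState(boolean completed, long nowMs,
                                      long startMs, long waitMaxMs) {
        if (completed) {
            return COMPLETED;
        }
        long latency = nowMs - startMs;
        if (latency < waitMaxMs / 2) {
            return WAITING;     // first half of the timeout
        } else if (latency < waitMaxMs) {
            return WAITED_HALF; // second half of the timeout
        }
        return OVERDUE;         // timeout reached
    }

    public static void main(String[] args) {
        long waitMax = 60_000;
        System.out.println(completionState(false, 29_999, 0, waitMax)); // 1
        System.out.println(completionState(false, 30_000, 0, waitMax)); // 2
        System.out.println(completionState(false, 59_999, 0, waitMax)); // 2
        System.out.println(completionState(false, 60_000, 0, waitMax)); // 3
        System.out.println(completionState(true, 90_000, 0, waitMax));  // 0
    }
}
```

One detail visible in the excerpt: at a latency of exactly mWaitMax this method reports OVERDUE, while isOverdueLocked() in 3.6 uses a strict `>` comparison, so the two checks differ by one millisecond at the boundary.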

3.5 Getting the blocked checkers

private ArrayList<HandlerChecker> getBlockedCheckersLocked() {
    ArrayList<HandlerChecker> checkers = new ArrayList<HandlerChecker>();
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        if (hc.isOverdueLocked()) {
            checkers.add(hc);
        }
    }
    return checkers;
}

3.6 HandlerChecker::isOverdueLocked (timeout check)

public boolean isOverdueLocked() {
    // mWaitMax = (DEFAULT_TIMEOUT = DB ? 10*1000 : 60*1000)
    return (!mCompleted) && (SystemClock.uptimeMillis() > mStartTime + mWaitMax);
}

3.7 Watchdog::describeCheckersLocked

private String describeCheckersLocked(ArrayList<HandlerChecker> checkers) {
    StringBuilder builder = new StringBuilder(128);
    for (int i=0; i<checkers.size(); i++) {
        if (builder.length() > 0) {
            builder.append(", ");
        }
        builder.append(checkers.get(i).describeBlockedStateLocked()); // description of every blocked HandlerChecker
    }
    return builder.toString();
}

3.8 HandlerChecker::describeBlockedStateLocked

public String describeBlockedStateLocked() {
    if (mCurrentMonitor == null) { // no monitor is running: the handler itself is blocked
        return "Blocked in handler on " + mName + " (" + getThread().getName() + ")";
    } else { // blocked inside a monitor of mMonitorChecker (foreground thread)
        return "Blocked in monitor " + mCurrentMonitor.getClass().getName()
                + " on " + mName + " (" + getThread().getName() + ")";
    }
}
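
When reading a watchdog log it helps to know what the resulting subject string looks like. The sketch below rebuilds the string format of 3.7 and 3.8 in plain Java. `BlockedSubjectSketch` and its `Blocked` record-like class are hypothetical, and the thread names in the example ("android.fg", "main") are illustrative values rather than something taken from the excerpt.

```java
import java.util.Arrays;
import java.util.List;

final class BlockedSubjectSketch {
    /** Hypothetical snapshot of one blocked checker. */
    public static final class Blocked {
        final String checkerName;  // e.g. "main thread"
        final String threadName;   // name of the underlying Java thread
        final String monitorClass; // null when the handler itself is stuck

        public Blocked(String checkerName, String threadName, String monitorClass) {
            this.checkerName = checkerName;
            this.threadName = threadName;
            this.monitorClass = monitorClass;
        }
    }

    public static String describe(Blocked b) {
        if (b.monitorClass == null) {
            return "Blocked in handler on " + b.checkerName + " (" + b.threadName + ")";
        }
        return "Blocked in monitor " + b.monitorClass
                + " on " + b.checkerName + " (" + b.threadName + ")";
    }

    /** Joins the descriptions of all blocked checkers with ", ". */
    public static String subject(List<Blocked> blocked) {
        StringBuilder builder = new StringBuilder(128);
        for (Blocked b : blocked) {
            if (builder.length() > 0) {
                builder.append(", ");
            }
            builder.append(describe(b));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(subject(Arrays.asList(
                new Blocked("foreground thread", "android.fg",
                        "com.android.server.am.ActivityManagerService"),
                new Blocked("main thread", "main", null))));
    }
}
```

The "Blocked in monitor ..." form points at a lock that could not be acquired, while "Blocked in handler ..." means the thread never got to run the check at all.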

4. Customizing the watchdog to keep processes alive

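The lifecycle of the custom watchdog depends on the stock watchdog
a. When systemReady is called, the watchdog starts its thread, which then sits in wait()
b. The intents used to start the kept-alive processes are configured in the custom watchdog ahead of time
Monitoring the process lifecycle
a. ams::startProcessLocked
b. ams::handleAppDiedLocked
Restart
a. When AMS reports through handleAppDiedLocked that a process has died, a task is added to the queue of the thread started in the first step, and the watchdog is woken up (notifyAll) to restart the process.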
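
The wait/notifyAll hand-off described above can be sketched in plain Java. This is an illustration under assumptions, not the author's implementation: `KeepAliveSketch`, `onProcessDied`, `shutdown` and `restartedSnapshot` are hypothetical names, and the actual restart (sending the preconfigured intent) is replaced by recording the process name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class KeepAliveSketch extends Thread {
    private final Queue<String> pending = new ArrayDeque<>(); // processes waiting for a restart
    private final List<String> restarted = new ArrayList<>();
    private boolean running = true;

    /** Would be called from a hook in handleAppDiedLocked (assumption). */
    public synchronized void onProcessDied(String processName) {
        pending.add(processName);
        notifyAll(); // wake the waiting keep-alive thread
    }

    /** Lets the thread exit once the pending queue is drained. */
    public synchronized void shutdown() {
        running = false;
        notifyAll();
    }

    public synchronized List<String> restartedSnapshot() {
        return new ArrayList<>(restarted);
    }

    @Override
    public void run() {
        while (true) {
            String next;
            synchronized (this) {
                while (running && pending.isEmpty()) {
                    try {
                        wait(); // idle until a process death is reported
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                if (pending.isEmpty()) {
                    return; // shutdown requested and nothing left to restart
                }
                next = pending.poll();
            }
            restart(next);
        }
    }

    private synchronized void restart(String processName) {
        // Placeholder: a real implementation would send the preconfigured intent here.
        restarted.add(processName);
    }

    public static void main(String[] args) throws InterruptedException {
        KeepAliveSketch keeper = new KeepAliveSketch();
        keeper.start();
        keeper.onProcessDied("com.example.app");
        keeper.shutdown();
        keeper.join();
        System.out.println(keeper.restartedSnapshot()); // [com.example.app]
    }
}
```

The restart itself runs outside the lock that guards the queue, so reporting a new process death never has to wait for a restart in progress to finish dequeuing.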

Last edited: 2017.12.10 21:43:30
