The Watchdog monitors whether system services have hung; if one has, it restarts the system services.
1. Starting the Watchdog
SystemServer creates the Watchdog and gets its run() loop going:
final Watchdog watchdog = Watchdog.getInstance();
watchdog.init(context, mActivityManagerService);
...
Watchdog.getInstance().start();
2. Monitoring services
2.1 Registering a service
A system service first registers itself with the Watchdog:
Watchdog.getInstance().addMonitor(this);
Watchdog.getInstance().addThread(mHandler);
What gets registered is stored in the Watchdog:
public void addMonitor(Monitor monitor) {
    // Add the monitor object to the Monitor Checker.
    // Watchdog's initialization shows that the Monitor Checker is itself a HandlerChecker object
    mMonitors.add(monitor);
}
public void addThread(Handler thread, long timeoutMillis) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Threads can't be added once the Watchdog is running");
        }
        final String name = thread.getLooper().getThread().getName();
        // Build a HandlerChecker for the Handler; this is effectively a "Looper Checker"
        mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
    }
}
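The registration rule above, where threads can only be added before the Watchdog thread starts, can be sketched in plain Java without Android. This is a minimal sketch, not the AOSP code: `MiniWatchdog`, its `Monitor` interface, and the use of a thread name instead of a `Handler` are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class RegistrationSketch {
    // Hypothetical stand-in for Watchdog.Monitor.
    interface Monitor {
        void monitor();
    }

    // Hypothetical, simplified Watchdog: like the real one, it is itself a Thread.
    static class MiniWatchdog extends Thread {
        private final List<Monitor> monitors = new ArrayList<>();
        private final List<String> checkedThreads = new ArrayList<>();
        private final CountDownLatch stop = new CountDownLatch(1);

        void addMonitor(Monitor monitor) {
            synchronized (this) {
                monitors.add(monitor);
            }
        }

        // Same guard as addThread(): the checker list is frozen once the thread is running.
        void addThread(String threadName) {
            synchronized (this) {
                if (isAlive()) {
                    throw new RuntimeException("Threads can't be added once the Watchdog is running");
                }
                checkedThreads.add(threadName);
            }
        }

        synchronized int monitorCount() { return monitors.size(); }

        synchronized int threadCount() { return checkedThreads.size(); }

        void shutdown() { stop.countDown(); }

        @Override
        public void run() {
            // A real watchdog would loop over its checkers; this sketch just waits to be stopped.
            try {
                stop.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    static String demo() {
        MiniWatchdog watchdog = new MiniWatchdog();
        watchdog.addMonitor(() -> { });  // a service registering itself
        watchdog.addThread("ui");        // registered before start(): accepted
        watchdog.start();
        boolean lateRejected = false;
        try {
            watchdog.addThread("late");  // registered after start(): rejected
        } catch (RuntimeException e) {
            lateRejected = true;
        }
        watchdog.shutdown();
        try {
            watchdog.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "monitors=" + watchdog.monitorCount()
                + " threads=" + watchdog.threadCount()
                + " lateRejected=" + lateRejected;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running it prints `monitors=1 threads=1 lateRejected=true`: the call made before `start()` is accepted and the one made after it is rejected by the `isAlive()` guard.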
The HandlerChecker calls the Monitors to check the state of the service, and what happens next depends on the result of that check.
2.2 How the Watchdog monitors
The Watchdog keeps checking, round after round, in its own run() method:
public void run() {
    while (true) {
        final List<HandlerChecker> blockedCheckers;
        final String subject;
        final boolean allowRestart;
        synchronized (this) {
            long timeout = CHECK_INTERVAL;
            // 1. Start a round of checks
            // Make sure we (re)spin the checkers that have become idle within
            // this wait-and-check interval
            for (int i = 0; i < mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }
            // 2. Give the monitored threads some time (CHECK_INTERVAL, 30s)
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) {
                ...
                try {
                    wait(timeout);
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                ...
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
            }
            // 3. Evaluate the completion state of all HandlerCheckers
            final int waitState = evaluateCheckerCompletionLocked();
            if (waitState == COMPLETED) {
                ...
                continue;
            } else if (waitState == WAITING) {
                ...
                continue;
            } else if (waitState == WAITED_HALF) {
                ...
                continue;
            }
            // 4. At least one HandlerChecker is overdue
            blockedCheckers = getBlockedCheckersLocked();
            subject = describeCheckersLocked(blockedCheckers);
            allowRestart = mAllowRestart;
        }
        ...
        // 5. Save logs, then decide whether to kill the system process
        Slog.w(TAG, "*** GOODBYE!");
        Process.killProcess(Process.myPid());
    }
}
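Step 2 calls `wait(timeout)` in a loop and recomputes the remaining time because `wait()` can return before the timeout expires (after a notify, an interrupt, or a spurious wakeup). Below is a minimal plain-Java sketch of that pattern, with `System.nanoTime()` standing in for `SystemClock.uptimeMillis()`; `IntervalWaitSketch`, `waitFullInterval` and `poke` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

public class IntervalWaitSketch {
    private final Object lock = new Object();

    // Hypothetical helper mirroring step 2 of Watchdog.run(): keep waiting until the
    // whole interval has elapsed, even if wait() returns early.
    long waitFullInterval(long intervalMillis) {
        synchronized (lock) {
            long start = System.nanoTime();
            long timeout = intervalMillis;
            while (timeout > 0) {
                try {
                    lock.wait(timeout);
                } catch (InterruptedException e) {
                    // The Watchdog only logs this (Log.wtf) and keeps looping; the sketch ignores it.
                }
                long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                timeout = intervalMillis - elapsed;  // remaining time; <= 0 ends the loop
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }
    }

    // Wakes the waiter early, the way an unrelated notifyAll() or a spurious wakeup could.
    void poke() {
        synchronized (lock) {
            lock.notifyAll();
        }
    }

    static long demo() {
        IntervalWaitSketch sketch = new IntervalWaitSketch();
        long[] waited = new long[1];
        Thread waiter = new Thread(() -> waited[0] = sketch.waitFullInterval(300));
        waiter.start();
        try {
            Thread.sleep(50);
            sketch.poke();  // early wakeup: the waiter must go back to waiting
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return waited[0];
    }

    public static void main(String[] args) {
        System.out.println("waited at least 300 ms: " + (demo() >= 300));
    }
}
```

Even though `poke()` wakes the waiter after roughly 50 ms, the returned value is at least 300, because each early return only shortens the next `wait()`.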
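2.3 What happens during a check
HandlerChecker.scheduleCheckLocked() posts the HandlerChecker itself as a task, placing it at the front of the monitored Handler's message queue so that its run() executes on that Handler's thread: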
public void scheduleCheckLocked() {
    ...
    mStartTime = SystemClock.uptimeMillis();
    mHandler.postAtFrontOfQueue(this);
}
That run() then calls monitor() on each monitored service:
public void run() {
    final int size = mMonitors.size();
    for (int i = 0; i < size; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }
    // mCompleted is set to true here. If a deadlock or a long wait keeps execution
    // from getting here in time, the check treats the service as having a problem
    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}
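scheduleCheckLocked() and run() together test two things: that the Handler's thread gets around to running the posted check at all, and that every Monitor returns. Below is a minimal plain-Java sketch of the first part, assuming a single-thread `ExecutorService` as a stand-in for the Handler's Looper thread; `MiniHandlerChecker`, `scheduleCheck` and the 200 ms deadline are hypothetical, and since an executor has no `postAtFrontOfQueue()`, the check is simply queued behind whatever is already running.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HandlerCheckerSketch {
    // Hypothetical stand-in for Watchdog.Monitor.
    interface Monitor {
        void monitor();
    }

    // Hypothetical, simplified HandlerChecker that posts itself to the monitored thread.
    static class MiniHandlerChecker implements Runnable {
        private final ExecutorService thread;  // stand-in for the Handler's Looper thread
        private final List<Monitor> monitors;
        private boolean completed = true;

        MiniHandlerChecker(ExecutorService thread, List<Monitor> monitors) {
            this.thread = thread;
            this.monitors = monitors;
        }

        synchronized void scheduleCheck() {
            if (!completed) {
                return;  // sketch assumption: a check is already in flight, so don't post another
            }
            completed = false;
            thread.execute(this);  // the real code uses mHandler.postAtFrontOfQueue(this)
        }

        @Override
        public void run() {
            for (Monitor m : monitors) {
                m.monitor();  // a lock-based monitor blocks here while the service lock is held
            }
            synchronized (this) {
                completed = true;  // reached only if the thread ran this task and every monitor returned
            }
        }

        synchronized boolean isCompleted() {
            return completed;
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static String demo() {
        ExecutorService idleThread = Executors.newSingleThreadExecutor();
        ExecutorService busyThread = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        // Occupy the busy thread with a task that does not finish until released.
        busyThread.execute(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Monitor noop = () -> { };
        MiniHandlerChecker healthy = new MiniHandlerChecker(idleThread, List.of(noop));
        MiniHandlerChecker stuck = new MiniHandlerChecker(busyThread, List.of(noop));
        healthy.scheduleCheck();
        stuck.scheduleCheck();
        sleepQuietly(200);  // hypothetical deadline for both checks
        String result = "healthy=" + healthy.isCompleted() + " stuck=" + stuck.isCompleted();
        release.countDown();
        idleThread.shutdown();
        busyThread.shutdown();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

It should print `healthy=true stuck=false`: the check posted to the idle thread runs and completes, while the one queued behind the blocked task never gets to run, which is the situation the "Looper Checker" side of the Watchdog is meant to catch.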
A service's monitor() simply tries to acquire the service's lock:
public void monitor() {
    synchronized (this) { }
}
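Why does acquiring a lock reveal the state of a service? A system service is called by many clients, so it has to handle multithreading, and that inevitably means locks. If a deadlock occurs, or some call gets stuck while holding the lock, the Watchdog cannot acquire the lock, and the service can be considered to have gone wrong.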
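The lock argument can be demonstrated in plain Java: while one thread is stuck inside a synchronized method of a service, a monitor() that does `synchronized (this) { }` on the same object cannot return. In this sketch `FakeService`, `stuckCall()`, `monitorReturnsWithin()` and the 200 ms timeout are hypothetical; `Thread.join(timeout)` plays the role of the Watchdog's deadline.

```java
import java.util.concurrent.CountDownLatch;

public class LockMonitorSketch {
    // Hypothetical service whose public methods all synchronize on `this`.
    static class FakeService {
        private final CountDownLatch entered = new CountDownLatch(1);
        private final CountDownLatch release = new CountDownLatch(1);

        // Simulates a call that gets stuck while holding the service lock.
        synchronized void stuckCall() {
            entered.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Same shape as the monitor() above: returns only once the lock is free.
        void monitor() {
            synchronized (this) { }
        }
    }

    // Returns true if monitor() finished within timeoutMillis.
    static boolean monitorReturnsWithin(FakeService service, long timeoutMillis) {
        Thread checker = new Thread(service::monitor);
        checker.start();
        try {
            checker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !checker.isAlive();
    }

    static String demo() {
        FakeService service = new FakeService();
        boolean idleOk = monitorReturnsWithin(service, 200);  // nobody holds the lock yet
        Thread caller = new Thread(service::stuckCall);
        caller.start();
        try {
            service.entered.await();  // wait until the caller really holds the lock
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean blockedDetected = !monitorReturnsWithin(service, 200);
        service.release.countDown();  // let the stuck call, and then the blocked monitor, finish
        return "idleOk=" + idleOk + " blockedDetected=" + blockedDetected;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

It should print `idleOk=true blockedDetected=true`: the same empty synchronized block returns at once when the lock is free and hangs for as long as the stuck call holds it.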
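2.4 Checking the state
The state of each service is judged through the Handler it registered (stored in the Watchdog's mHandlerCheckers). The main question is whether the monitor() calls completed; if they have not, the elapsed time decides how serious the delay is.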
private int evaluateCheckerCompletionLocked() {
    int state = COMPLETED;
    for (int i = 0; i < mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        // Keep the most severe state seen across all checkers
        state = Math.max(state, hc.getCompletionStateLocked());
    }
    return state;
}

public int getCompletionStateLocked() {
    if (mCompleted) {
        return COMPLETED;
    } else {
        // Time elapsed since scheduleCheckLocked() posted this check
        long latency = SystemClock.uptimeMillis() - mStartTime;
        if (latency < mWaitMax / 2) {
            return WAITING;
        } else if (latency < mWaitMax) {
            return WAITED_HALF;
        }
    }
    return OVERDUE;
}
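The classification can be checked in isolation. The sketch below copies the thresholds from getCompletionStateLocked() into a pure function; the integer values (COMPLETED = 0 up to OVERDUE = 3) are chosen to follow the ordering that `Math.max` in evaluateCheckerCompletionLocked() relies on, and `CompletionStateSketch`, `stateOf`, `worst` and the 60 s timeout are hypothetical.

```java
public class CompletionStateSketch {
    // Ordered from best to worst so that Math.max() picks the most severe state.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // Same thresholds as getCompletionStateLocked(), with the inputs passed explicitly.
    static int stateOf(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return WAITING;
        }
        if (latencyMillis < waitMaxMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // Same aggregation as evaluateCheckerCompletionLocked(): the worst checker wins.
    static int worst(int... states) {
        int state = COMPLETED;
        for (int s : states) {
            state = Math.max(state, s);
        }
        return state;
    }

    public static void main(String[] args) {
        long waitMax = 60_000;                    // hypothetical 60 s timeout
        int a = stateOf(true, 90_000, waitMax);   // finished: COMPLETED regardless of latency
        int b = stateOf(false, 10_000, waitMax);  // 10 s < 30 s: WAITING
        int c = stateOf(false, 45_000, waitMax);  // 30 s <= 45 s < 60 s: WAITED_HALF
        int d = stateOf(false, 61_000, waitMax);  // past 60 s: OVERDUE
        System.out.println(a + " " + b + " " + c + " " + d + " -> worst " + worst(a, b, c, d));
    }
}
```

Running it prints `0 1 2 3 -> worst 3`: a single overdue checker is enough to make the whole round OVERDUE, which is what sends run() past the `continue` branches to step 4.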
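2.5 Handling a hang
When a problem is detected, it is handled in one of two ways:
1. Kill the abnormal process, which is the system process itself (Process.myPid() in run()):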
Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: XXX
Watchdog: XXX
Watchdog: *** GOODBYE!
2. Reboot:
Rebooting system because: xxx