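1. Overview
Google added this feature in Android 8.0 and named it Rescue Party.
It watches for crash loops in core system processes. When a loop is detected, the rescue procedure runs a series of actions that depend on the current rescue level, to see whether the device can be recovered. At the most severe level it reboots into recovery and prompts the user to wipe user data and restore factory settings.
Code: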
frameworks\base\services\core\java\com\android\server\RescueParty.java
1. Levels
private static final int LEVEL_NONE = 0;
private static final int LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS = 1;
private static final int LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES = 2;
private static final int LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS = 3;
private static final int LEVEL_FACTORY_RESET = 4;
2. Trigger scenarios:
(1) system_server restarts 5 or more times within 5 minutes: the level is raised by one.
(2) A persistent system app crashes 5 or more times within 30 seconds: the level is raised by one.
2. Analysis
Threshold
The Threshold class implements the crash-counting logic for a monitored process. One instance is created per monitored process, and the process is identified by its uid.
Main fields:
private final int uid; // uid of the monitored process
private final int triggerCount; // number of crashes that triggers a rescue
private final long triggerWindow; // length of the counting window for the monitored process, in milliseconds
Main methods:
public abstract int getCount(); // get the crash count
public abstract void setCount(int count); // store the updated crash count
public abstract long getStart(); // get the start time of the current counting window
public abstract void setStart(long start); // set the start time of the current counting window
public void reset() { // reset the crash count and the start time
    setCount(0);
    setStart(0);
}
public boolean incrementAndTest() { // updates the crash count and checks whether the limit was reached inside the current window
    final long now = SystemClock.elapsedRealtime(); // milliseconds since boot
    final long window = now - getStart(); // getStart() is 0 on the first call, so window equals the uptime, which usually exceeds triggerWindow; afterwards window tells whether the process is still inside the current counting window
    if (window > triggerWindow) { // window elapsed: start a new counting window
        setCount(1);
        setStart(now);
        return false;
    } else {
        int count = getCount() + 1; // one more crash in this window
        setCount(count);
        EventLogTags.writeRescueNote(uid, count, window);
        Slog.w(TAG, "Noticed " + count + " events for UID " + uid + " in last "
                + (window / 1000) + " sec");
        return (count >= triggerCount); // true once the count reaches 5 or more
    }
}
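The windowed counting above can be tried outside Android. The following is a minimal sketch, not the AOSP class: WindowCounter and WindowCounterDemo are hypothetical names, the state lives in plain fields, and the clock value is passed in as a parameter instead of being read from SystemClock.

```java
// Simplified stand-in for RescueParty.Threshold (hypothetical class).
// The current time is passed in so no Android classes are needed.
class WindowCounter {
    private final int triggerCount;
    private final long triggerWindowMs;
    private int count;
    private long startMs;

    WindowCounter(int triggerCount, long triggerWindowMs) {
        this.triggerCount = triggerCount;
        this.triggerWindowMs = triggerWindowMs;
    }

    // Same branches as incrementAndTest(): start a new window when the old
    // one has elapsed, otherwise count the event and compare with the limit.
    boolean incrementAndTest(long nowMs) {
        final long window = nowMs - startMs;
        if (window > triggerWindowMs) {
            count = 1;
            startMs = nowMs;
            return false;
        }
        count++;
        return count >= triggerCount;
    }

    int getCount() {
        return count;
    }
}

class WindowCounterDemo {
    public static void main(String[] args) {
        WindowCounter c = new WindowCounter(5, 30_000);
        long t = 100_000; // pretend uptime in milliseconds
        for (int i = 1; i <= 5; i++) {
            System.out.println("event " + i + " -> " + c.incrementAndTest(t));
            t += 1_000;
        }
    }
}
```

With events one second apart, the first four calls return false and the fifth returns true. An event that arrives after the window has elapsed restarts the count at 1 instead.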
As mentioned earlier, the rescue mechanism monitors system_server and persistent processes. The two cases are analyzed separately below.
Monitoring the system_server process
First, the BootThreshold class extends Threshold.
A few points to note:
(1) The monitored uid is android.os.Process.ROOT_UID = 0, i.e. the zygote process, because a system_server restart necessarily causes zygote to restart.
    triggerCount = 5
    triggerWindow = 300 * DateUtils.SECOND_IN_MILLIS
Constructor:
public BootThreshold() {
    super(android.os.Process.ROOT_UID, 5, 300 * DateUtils.SECOND_IN_MILLIS);
}
In summary: the counting window is 300 s (5 minutes) and the limit is 5 restarts.
The system_server restart count and the window start time are written to system properties.
Key for the count: private static final String PROP_RESCUE_BOOT_COUNT = "sys.rescue_boot_count";
Key for the window start time: private static final String PROP_RESCUE_BOOT_START = "sys.rescue_boot_start";
A BootThreshold is instantiated and assigned to the static field sBoot when the class is loaded.
private static final Threshold sBoot = new BootThreshold();
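Keeping the counter outside the process is what lets it survive a system_server restart. Here is a small sketch of that idea, assuming a plain map as a stand-in for the property store. PropertyBackedCounter, PROPS, getInt and noteStart are hypothetical names, not AOSP APIs; only the key string comes from the source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: the map plays the role of the system property
// store, which is kept outside system_server and so outlives its restarts.
class PropertyBackedCounter {
    static final Map<String, String> PROPS = new HashMap<>();

    // Read an integer property, falling back to a default when it is unset.
    static int getInt(String key, int def) {
        String v = PROPS.get(key);
        return v == null ? def : Integer.parseInt(v);
    }

    // Called once per simulated system_server start; returns the new count.
    static int noteStart() {
        int count = getInt("sys.rescue_boot_count", 0) + 1;
        PROPS.put("sys.rescue_boot_count", Integer.toString(count));
        return count;
    }
}
```

Each simulated start reads the previous value back from the store, so the count keeps growing even though no object from the previous run is reused.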
Monitoring entry point: the following call is made every time system_server starts
SystemServer.startBootstrapServices
  ==> RescueParty.noteBoot(mSystemContext);
public static void noteBoot(Context context) {
    if (isDisabled()) return;
    if (sBoot.incrementAndTest()) { // true once 5 starts have been counted within 5 minutes
        sBoot.reset(); // first reset the counters
        incrementRescueLevel(sBoot.uid); // raise the rescue level for system_server
        executeRescueLevel(context); // run the rescue action
    }
}
private static void incrementRescueLevel(int triggerUid) {
    // Each call raises the rescue level by one. The level is stored in the "sys.rescue_level" system property; it defaults to LEVEL_NONE and is capped at LEVEL_FACTORY_RESET.
    final int level = MathUtils.constrain(
            SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE) + 1,
            LEVEL_NONE, LEVEL_FACTORY_RESET);
    SystemProperties.set(PROP_RESCUE_LEVEL, Integer.toString(level));
    EventLogTags.writeRescueLevel(level, triggerUid);
    // Call the PKMS logCriticalInfo interface to record the level change in the PKMS log file, /data/system/uiderrors.txt
    PackageManagerService.logCriticalInfo(Log.WARN, "Incremented rescue level to "
            + levelToString(level) + " triggered by UID " + triggerUid);
}
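The clamping step can be checked in isolation. The sketch below assumes a local constrain helper with the same contract as MathUtils.constrain(int, int, int). RescueLevelSketch and nextLevel are hypothetical names used only for illustration.

```java
// Hypothetical, Android-free illustration of how the level is advanced.
class RescueLevelSketch {
    static final int LEVEL_NONE = 0;
    static final int LEVEL_FACTORY_RESET = 4;

    // Local helper: clamp amount into the range [low, high].
    static int constrain(int amount, int low, int high) {
        return amount < low ? low : (amount > high ? high : amount);
    }

    // One escalation step; the result never exceeds LEVEL_FACTORY_RESET.
    static int nextLevel(int current) {
        return constrain(current + 1, LEVEL_NONE, LEVEL_FACTORY_RESET);
    }
}
```

Starting from LEVEL_NONE, four consecutive triggers walk through levels 1 to 4. Further triggers stay at LEVEL_FACTORY_RESET.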
private static void executeRescueLevel(Context context) {
    final int level = SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE); // read the current rescue level
    if (level == LEVEL_NONE) return;
    Slog.w(TAG, "Attempting rescue level " + levelToString(level));
    try {
        executeRescueLevelInternal(context, level); // run the rescue action for this level
        EventLogTags.writeRescueSuccess(level);
        PackageManagerService.logCriticalInfo(Log.DEBUG,
                "Finished rescue level " + levelToString(level)); // logged to uiderrors.txt
    } catch (Throwable t) {
        final String msg = ExceptionUtils.getCompleteMessage(t);
        EventLogTags.writeRescueFailure(level, msg);
        PackageManagerService.logCriticalInfo(Log.ERROR,
                "Failed rescue level " + levelToString(level) + ": " + msg);
    }
}
private static void executeRescueLevelInternal(Context context, int level) throws Exception {
    switch (level) {
        // Levels 1-3 reset Settings values with increasing depth; level 4, the highest, reboots into recovery and lets the user wipe the data partition.
        case LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS:
            resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_DEFAULTS); // resets settings that were set by non-system processes
            break;
        case LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES:
            resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_CHANGES); // for settings from non-system processes: restore those with a system default, delete the rest
            break;
        case LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS:
            resetAllSettings(context, Settings.RESET_MODE_TRUSTED_DEFAULTS); // for all processes: restore settings with a system default, delete the rest
            break;
        case LEVEL_FACTORY_RESET:
            RecoverySystem.rebootPromptAndWipeUserData(context, TAG); // reboot into recovery
            break;
    }
}
private static void resetAllSettings(Context context, int mode) throws Exception {
    Exception res = null;
    final ContentResolver resolver = context.getContentResolver();
    try { // reset the global (system-wide) settings
        Settings.Global.resetToDefaultsAsUser(resolver, null, mode, UserHandle.USER_SYSTEM);
    } catch (Throwable t) {
        res = new RuntimeException("Failed to reset global settings", t);
    }
    for (int userId : getAllUserIds()) { // with multiple users, reset the secure settings of every user
        try {
            Settings.Secure.resetToDefaultsAsUser(resolver, null, mode, userId);
        } catch (Throwable t) {
            res = new RuntimeException("Failed to reset secure settings for " + userId, t);
        }
    }
    if (res != null) {
        throw res;
    }
}
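resetAllSettings keeps going after a failure and reports an error only at the end, so one broken reset does not block the others. The pattern can be sketched generically. ResetAllSketch, Step and runAll are hypothetical names, not part of the original code.

```java
// Hypothetical illustration of the "attempt every step, throw afterwards"
// pattern used by resetAllSettings.
class ResetAllSketch {
    interface Step {
        void run() throws Exception;
    }

    // Runs every step even if an earlier one fails; the most recent failure
    // is rethrown once all steps have been attempted.
    static void runAll(Step... steps) throws Exception {
        Exception res = null;
        for (Step s : steps) {
            try {
                s.run();
            } catch (Throwable t) {
                res = new RuntimeException("Step failed", t);
            }
        }
        if (res != null) {
            throw res;
        }
    }
}
```

As in the original, only the last failure is kept. Earlier failures are overwritten, which is acceptable here because the caller only needs to know that the rescue level did not complete cleanly.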
Persistent process crashes
AppThreshold extends Threshold and monitors persistent app processes.
A few points to note:
(1) The monitored uid is the uid of the crashing app, which is passed in.
    triggerCount = 5
    triggerWindow = 30 * DateUtils.SECOND_IN_MILLIS
    In summary: the counting window is 30 s and the limit is 5 crashes.
public AppThreshold(int uid) {
    super(uid, 5, 30 * DateUtils.SECOND_IN_MILLIS);
}
The count and the window start time are kept in the object's own fields, count and start.
Unlike the system_server restart monitoring, there are many app processes, so a sparse array maps each uid to its AppThreshold object.
private static SparseArray<Threshold> sApps = new SparseArray<>();
When an app process crashes, the crash is reported back to AMS, which calls appErrors.crashApplicationInner. That method contains the following logic:
ProcessRecord r
if (r != null && r.persistent) { // if the crashing process is persistent, pass its uid to RescueParty
    RescueParty.notePersistentAppCrash(mContext, r.uid);
}
public static void notePersistentAppCrash(Context context, int uid) {
    if (isDisabled()) return;
    // Create an AppThreshold for every persistent process that has crashed and keep it in sApps
    Threshold t = sApps.get(uid);
    if (t == null) {
        t = new AppThreshold(uid);
        sApps.put(uid, t);
    }
    // The AppThreshold found for this uid then does the counting, exactly as described above for system_server
    if (t.incrementAndTest()) {
        t.reset();
        incrementRescueLevel(t.uid);
        executeRescueLevel(context);
    }
}
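The per-uid bookkeeping can be sketched without Android classes. This assumes a HashMap in place of SparseArray and a two-element array in place of an AppThreshold object. PerUidCrashTracker, noteCrash and trackedUids are hypothetical names, and the clock is passed in as a parameter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of one independent counter per crashing uid.
class PerUidCrashTracker {
    private static final int TRIGGER_COUNT = 5;
    private static final long TRIGGER_WINDOW_MS = 30_000;

    // uid -> {count, windowStartMs}; an entry is created on the first crash.
    private final Map<Integer, long[]> state = new HashMap<>();

    // Returns true when this uid reaches the limit inside its own window.
    boolean noteCrash(int uid, long nowMs) {
        long[] s = state.computeIfAbsent(uid, k -> new long[2]);
        if (nowMs - s[1] > TRIGGER_WINDOW_MS) {
            s[0] = 1;
            s[1] = nowMs;
            return false;
        }
        s[0]++;
        if (s[0] >= TRIGGER_COUNT) {
            s[0] = 0; // same effect as t.reset() in notePersistentAppCrash
            s[1] = 0;
            return true;
        }
        return false;
    }

    int trackedUids() {
        return state.size();
    }
}
```

Crashes of different uids never add up: two apps that each crash four times inside the window do not trigger a rescue, while a fifth crash of either one does.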
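Disabled scenarios
(1) PROP_ENABLE_RESCUE is false and PROP_DISABLE_RESCUE is true
(2) eng builds
(3) userdebug builds with an active USB connection (the check is Build.IS_USERDEBUG && isUsbActive())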
private static boolean isDisabled() {
    // an explicit enable property overrides every check below
    if (SystemProperties.getBoolean(PROP_ENABLE_RESCUE, false)) {
        return false;
    }
    // eng build?
    if (Build.IS_ENG) {
        Slog.v(TAG, "Disabled because of eng build");
        return true;
    }
    // userdebug build with USB connected?
    if (Build.IS_USERDEBUG && isUsbActive()) {
        Slog.v(TAG, "Disabled because of active USB connection");
        return true;
    }
    if (SystemProperties.getBoolean(PROP_DISABLE_RESCUE, false)) {
        Slog.v(TAG, "Disabled because of manual property");
        return true;
    }
    return false;
}
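The order of the checks matters, and it is easier to see with the inputs spelled out as plain booleans. The sketch below mirrors the branches of isDisabled() under that assumption. RescueDisabledSketch and its parameter names are hypothetical.

```java
// Hypothetical, Android-free restatement of the isDisabled() decision order.
class RescueDisabledSketch {
    static boolean isDisabled(boolean enableProp, boolean isEng,
            boolean isUserdebug, boolean usbActive, boolean disableProp) {
        if (enableProp) return false;                // explicit enable wins over everything
        if (isEng) return true;                      // eng builds never rescue
        if (isUserdebug && usbActive) return true;   // userdebug with active USB
        if (disableProp) return true;                // manual disable property
        return false;
    }
}
```

Two consequences follow from the order: setting the enable property keeps Rescue Party active even on an eng build, and an active USB connection disables it only on userdebug builds, not on user builds.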
Other scenarios
When SettingsProvider is published, the current rescue level is executed once more
/frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
installSystemProviders() -> RescueParty.onSettingsProviderPublished(mContext);
public static void onSettingsProviderPublished(Context context) {
    executeRescueLevel(context);
}
Service initialization
void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,
        int callingPid, int callingUid) {
    ...
    // If a persistent app is stuck in a crash loop, the device isn't very
    // usable, so we want to consider sending out a rescue party.
    if (r != null && r.persistent) {
        RescueParty.notePersistentAppCrash(mContext, r.uid);
    }
    AppErrorResult result = new AppErrorResult();
    TaskRecord task;
}
How to handle it:
The code path is:
    /frameworks/base/services/core/java/com/android/server/RescueParty.java
    To turn the feature off, isDisabled() can simply return true (the remaining statements then have to be removed or commented out, because Java rejects unreachable code after an unconditional return):
        private static boolean isDisabled() {
            return true;
            ....
        }
    The call that enters recovery:
        private static void executeRescueLevelInternal(Context context, int level) throws Exception {
            ....
            case LEVEL_FACTORY_RESET:
                RecoverySystem.rebootPromptAndWipeUserData(context, TAG);
                break;
        }