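An In-Depth Look at Jank Optimization

Preface

Jank is a problem we run into often, and it tends to be hard to reproduce and hard to fix, because diagnosing it depends heavily on capturing the state of the app at the moment it happens. This article takes a closer look at how to analyze and optimize jank.

Jank analysis methods and tools

Checking CPU performance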

We can read /proc/stat to get CPU usage for the whole system, and /proc/[pid]/stat to get the CPU usage of a particular process.
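As a minimal sketch of how /proc/stat can be turned into a usage figure: its first line holds cumulative CPU time in the order user, nice, system, idle, iowait, irq, softirq, steal (newer kernels append guest fields), so usage over an interval is the non-idle share of the difference between two samples. The class and method names here are made up for illustration, iowait is counted as idle by assumption, and the two sample lines are invented values rather than real measurements.

```java
class CpuUsageSketch {

    /** Parses the aggregate "cpu ..." line of /proc/stat into {total, idle} ticks. */
    static long[] parseTotalAndIdle(String cpuLine) {
        String[] fields = cpuLine.trim().split("\\s+");
        if (fields.length < 5 || !fields[0].equals("cpu")) {
            throw new IllegalArgumentException("not an aggregate cpu line: " + cpuLine);
        }
        long total = 0;
        // Sum user..steal (fields 1..8); guest time is already included in user/nice.
        for (int i = 1; i < fields.length && i <= 8; i++) {
            total += Long.parseLong(fields[i]);
        }
        long idle = Long.parseLong(fields[4]);      // idle
        if (fields.length > 5) {
            idle += Long.parseLong(fields[5]);      // iowait, treated as idle in this sketch
        }
        return new long[] {total, idle};
    }

    /** Returns the busy percentage (0..100) between two samples of the cpu line. */
    static double usagePercent(String earlier, String later) {
        long[] a = parseTotalAndIdle(earlier);
        long[] b = parseTotalAndIdle(later);
        long totalDelta = b[0] - a[0];
        long idleDelta = b[1] - a[1];
        if (totalDelta <= 0) {
            return 0.0;
        }
        return 100.0 * (totalDelta - idleDelta) / totalDelta;
    }

    public static void main(String[] args) {
        // Invented sample lines in /proc/stat format, not real measurements.
        String first = "cpu  100 0 50 800 50 0 0 0 0 0";
        String second = "cpu  160 0 70 860 60 0 0 0 0 0";
        System.out.println(usagePercent(first, second) + "% busy");
    }
}
```

On a device the two lines would come from reading the file twice with a short pause in between; a single read only gives totals since boot.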

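Jank troubleshooting tools

TraceView

TraceView gives an intuitive view of how long each method takes, which makes it easy to spot calls that do not behave as expected. However, TraceView itself can add significant overhead, which may distort our conclusions.

Systrace

We already covered how to use Systrace in the layout optimization article. Its advantages are that it is lightweight and that the system itself is heavily instrumented with Systrace trace points. However, we need to filter out most short-lived functions.

CPU Profile

Android Studio provides the CPU Profiler, which gives an intuitive view of CPU usage. It offers these recording modes: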
- Sample Java Methods works like Traceview's sample mode.
- Trace Java Methods works like Traceview's instrument mode.
- Trace System Calls works like systrace.
- SampleNative (API Level 26+) works like Simpleperf.

StrictMode

if (BuildConfig.DEBUG) {
        StrictMode.setThreadPolicy(new StrictMode.ThreadPolicy.Builder()
                .detectCustomSlowCalls()
                .detectDiskReads()
                .detectDiskWrites()
                .detectNetwork()// or .detectAll() for all detectable problems
                .penaltyLog()
                .build());
        StrictMode.setVmPolicy(new StrictMode.VmPolicy.Builder()
                .detectLeakedSqlLiteObjects()
                .setClassInstanceLimit(NewsItem.class, 1)
                .detectLeakedClosableObjects() // API level 11
                .penaltyLog()
                .build());
    }

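We can enable StrictMode in debug builds. The system then automatically detects certain abnormal or unexpected behaviors. StrictMode has two kinds of detection policies:

Thread policy: detects custom slow calls, disk I/O, network I/O, and so on
VM policy: detects leaked database objects, memory leaks, and instance counts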

Profilo

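Profilo is an open-source library from Facebook for collecting jank information.
It has the following advantages:
1. It integrates atrace.
2. It captures Java stacks quickly (its capture approach is also worth borrowing).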

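Automated jank analysis and detection in production

This section explains in detail how to set up automated jank analysis in production.

Why do we need jank detection in production?

We may receive feedback such as "the app feels too laggy" or "it froze for a few seconds during a flash sale", yet be unable to reproduce the problem. Because the user's on-site context is essential for diagnosing jank, we need automated jank analysis in production.
The tools covered above make offline analysis convenient. Next, we will use a few approaches that help analyze jank in production.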

AndroidPerformanceMonitor

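With the AndroidPerformanceMonitor library (BlockCanary, as the package names below show) we can detect jank easily, and it can post a Notification that lets us view the stack of the blocked call.

Here is the configuration: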

package com.dsg.androidperformance.block;

import android.content.Context;
import android.util.Log;

import com.github.moduth.blockcanary.BlockCanaryContext;
import com.github.moduth.blockcanary.internal.BlockInfo;

import java.io.File;
import java.util.LinkedList;
import java.util.List;

/**
 * @author DSG
 * @Project AndroidPerformance
 * @date 2020/7/18
 * @describe
 */
public class AppBlockCanaryContext extends BlockCanaryContext {

    /**
     * Implement in your project.
     *
     * @return Qualifier which can specify this installation, like version + flavor.
     */
    public String provideQualifier() {
        return "unknown";
    }

    /**
     * Implement in your project.
     *
     * @return user id
     */
    public String provideUid() {
        return "uid";
    }

    /**
     * Network type
     *
     * @return {@link String} like 2G, 3G, 4G, wifi, etc.
     */
    public String provideNetworkType() {
        return "unknown";
    }

    /**
     * Config monitor duration, after this time BlockCanary will stop, use
     * with {@code BlockCanary}'s isMonitorDurationEnd
     *
     * @return monitor last duration (in hour)
     */
    public int provideMonitorDuration() {
        return -1;
    }

    /**
     * Config block threshold (in millis), dispatch over this duration is regarded as a BLOCK. You may set it
     * from performance of device.
     *
     * @return threshold in mills
     */
    public int provideBlockThreshold() {
        return 500;
    }

    /**
     * Thread stack dump interval, use when block happens, BlockCanary will dump on main thread
     * stack according to current sample cycle.
     * <p>
     * Because the implementation mechanism of Looper, real dump interval would be longer than
     * the period specified here (especially when cpu is busier).
     * </p>
     *
     * @return dump interval (in millis)
     */
    public int provideDumpInterval() {
        return provideBlockThreshold();
    }

    /**
     * Path to save log, like "/blockcanary/", will save to sdcard if can.
     *
     * @return path of log files
     */
    public String providePath() {
        return "/blockcanary/";
    }

    /**
     * If need notification to notice block.
     *
     * @return true if need, else if not need.
     */
    public boolean displayNotification() {
        return true;
    }

    /**
     * Implement in your project, bundle files into a zip file.
     *
     * @param src  files before compress
     * @param dest files compressed
     * @return true if compression is successful
     */
    public boolean zip(File[] src, File dest) {
        return false;
    }

    /**
     * Implement in your project, bundled log files.
     *
     * @param zippedFile zipped file
     */
    public void upload(File zippedFile) {
        throw new UnsupportedOperationException();
    }


    /**
     * Packages that developer concern, by default it uses process name,
     * put high priority one in pre-order.
     *
     * @return null if simply concern only package with process name.
     */
    public List<String> concernPackages() {
        return null;
    }

    /**
     * Filter stack without any in concern package, used with @{code concernPackages}.
     *
     * @return true if filter, false it not.
     */
    public boolean filterNonConcernStack() {
        return false;
    }

    /**
     * Provide white list, entry in white list will not be shown in ui list.
     *
     * @return return null if you don't need white-list filter.
     */
    public List<String> provideWhiteList() {
        LinkedList<String> whiteList = new LinkedList<>();
        whiteList.add("org.chromium");
        return whiteList;
    }

    /**
     * Whether to delete files whose stack is in white list, used with white-list.
     *
     * @return true if delete, false it not.
     */
    public boolean deleteFilesInWhiteList() {
        return true;
    }

    /**
     * Block interceptor, developer may provide their own actions.
     */
    public void onBlock(Context context, BlockInfo blockInfo) {
        Log.i("main1","blockInfo "+blockInfo.toString());
    }
}

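As we can see, there are many configurable options: a whitelist of packages excluded from detection, the block threshold, and so on.

Then call BlockCanary.install(this, new AppBlockCanaryContext()).start(); in the Application class to complete the integration.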

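How it works

The principle behind AndroidPerformanceMonitor is simple: it installs a custom Printer on the Looper. The Looper calls this Printer before and after msg.target.dispatchMessage(msg);, so a delayed task can be scheduled before the dispatch and cancelled after it. If dispatchMessage finishes within the delay, we consider that no jank occurred; otherwise the delayed task runs on a worker thread and captures the current stack.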
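The idea can be sketched in plain Java without the Android classes. Looper logs a line beginning with ">>>>> Dispatching to" before dispatching a message and one beginning with "<<<<< Finished to" after it, and a monitor measures the time between the two. BlockDetectorSketch, its BlockListener callback, and the injected clock are hypothetical names for illustration only; this is not BlockCanary's actual code, and the stack sampling on a worker thread is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

class BlockDetectorSketch {

    /** Hypothetical callback, invoked when one dispatch exceeds the threshold. */
    interface BlockListener {
        void onBlock(long costMillis);
    }

    private final long thresholdMillis;
    private final LongSupplier clock;     // injected so the sketch runs without sleeping
    private final BlockListener listener;
    private boolean dispatching;
    private long startMillis;

    BlockDetectorSketch(long thresholdMillis, LongSupplier clock, BlockListener listener) {
        this.thresholdMillis = thresholdMillis;
        this.clock = clock;
        this.listener = listener;
    }

    /** Same shape as android.util.Printer#println, which Looper calls around each dispatch. */
    void println(String line) {
        if (line.startsWith(">>>>> Dispatching to")) {
            dispatching = true;
            startMillis = clock.getAsLong();
        } else if (line.startsWith("<<<<< Finished to") && dispatching) {
            dispatching = false;
            long cost = clock.getAsLong() - startMillis;
            if (cost > thresholdMillis) {
                listener.onBlock(cost);
            }
        }
    }

    public static void main(String[] args) {
        long[] now = {0};
        List<Long> blocks = new ArrayList<>();
        BlockDetectorSketch detector =
                new BlockDetectorSketch(500, () -> now[0], cost -> blocks.add(cost));

        detector.println(">>>>> Dispatching to Handler (fast) null: 0");
        now[0] += 16;     // a message that fits within one frame
        detector.println("<<<<< Finished to Handler (fast) null");

        detector.println(">>>>> Dispatching to Handler (slow) null: 0");
        now[0] += 800;    // a message that runs well past the threshold
        detector.println("<<<<< Finished to Handler (slow) null");

        System.out.println("blocked dispatches (ms): " + blocks);
    }
}
```

On Android, an object with this println method would implement android.util.Printer and be passed to Looper.getMainLooper().setMessageLogging(...).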

AndroidPerformanceMonitor source code analysis

The integration entry point is BlockCanary.install(this, new AppBlockCanaryContext()).start();.
Let's look at the start method:

 public void start() {
        if (!mMonitorStarted) {
            mMonitorStarted = true;
            Looper.getMainLooper().setMessageLogging(mBlockCanaryCore.monitor);
        }
    }

As described above, it relies on a custom Printer. Let's look at the println method of the monitor object:

@Override
    public void println(String x) {
        if (mStopWhenDebugging && Debug.isDebuggerConnected()) {
            return;
        }
        if (!mPrintingStarted) {
            mStartTimestamp = System.currentTimeMillis();
            mStartThreadTimestamp = SystemClock.currentThreadTimeMillis();
            mPrintingStarted = true;
            // Schedule the delayed sampling tasks
            startDump();
        } else {
            final long endTime = System.currentTimeMillis();
            mPrintingStarted = false;
            // Did this dispatch exceed the block threshold? (3000 ms by default)
            if (isBlock(endTime)) {
                notifyBlockEvent(endTime);
            }
            // Stop sampling
            stopDump();
        }
    }

startDump starts the stack sampler and the CPU sampler. Taking the CPU sampler as an example, the code below shows that it schedules a task to do the sampling:

 public void start() {
        if (mShouldSample.get()) {
            return;
        }
        mShouldSample.set(true);

        HandlerThreadFactory.getTimerThreadHandler().removeCallbacks(mRunnable);
        HandlerThreadFactory.getTimerThreadHandler().postDelayed(mRunnable,
                BlockCanaryInternals.getInstance().getSampleDelay());
    }
    
long getSampleDelay() {
        return (long) (BlockCanaryInternals.getContext().provideBlockThreshold() * 0.8f);
    }

Here is how the CPU information is collected:

@Override
    protected void doSample() {
        BufferedReader cpuReader = null;
        BufferedReader pidReader = null;

        try {
            cpuReader = new BufferedReader(new InputStreamReader(
                    new FileInputStream("/proc/stat")), BUFFER_SIZE);
            String cpuRate = cpuReader.readLine();
            if (cpuRate == null) {
                cpuRate = "";
            }
              
            if (mPid == 0) {
                mPid = android.os.Process.myPid();
            }
            // CPU stats of our own process (/proc/[pid]/stat, mentioned at the start of the article)
            pidReader = new BufferedReader(new InputStreamReader(
                    new FileInputStream("/proc/" + mPid + "/stat")), BUFFER_SIZE);
            String pidCpuRate = pidReader.readLine();
            if (pidCpuRate == null) {
                pidCpuRate = "";
            }
            // Parse the CPU stats
            parse(cpuRate, pidCpuRate);
        } catch (Throwable throwable) {
            Log.e(TAG, "doSample: ", throwable);
        } finally {
            try {
                if (cpuReader != null) {
                    cpuReader.close();
                }
                if (pidReader != null) {
                    pidReader.close();
                }
            } catch (IOException exception) {
                Log.e(TAG, "doSample: ", exception);
            }
        }
    }

As we can see, it reads the file "/proc/" + mPid + "/stat", but on newer Android versions the app may not have permission to read it.
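The library's own parse method is not shown in this article. As a rough, independent sketch of what can be extracted from one line of /proc/[pid]/stat: field 14 (utime) and field 15 (stime) hold the CPU time the process has spent in user and kernel mode, in clock ticks. ProcessStatSketch and cpuTicks are hypothetical names, and the sample line is invented.

```java
class ProcessStatSketch {

    /** Returns utime + stime (in clock ticks) from one line of /proc/[pid]/stat. */
    static long cpuTicks(String statLine) {
        // Field 2 (comm) is wrapped in parentheses and may itself contain spaces
        // or parentheses, so only split what follows the last ')'.
        int end = statLine.lastIndexOf(')');
        if (end < 0) {
            throw new IllegalArgumentException("no comm field in: " + statLine);
        }
        String[] rest = statLine.substring(end + 1).trim().split("\\s+");
        // rest[0] is field 3 (state), so field 14 (utime) is rest[11]
        // and field 15 (stime) is rest[12].
        if (rest.length < 13) {
            throw new IllegalArgumentException("too few fields in: " + statLine);
        }
        return Long.parseLong(rest[11]) + Long.parseLong(rest[12]);
    }

    public static void main(String[] args) {
        // Invented line in /proc/[pid]/stat format; note the space inside the comm field.
        String line = "1234 (my app) S 1 1234 1234 0 -1 4194560 500 0 0 0 120 30 0 0";
        System.out.println("process CPU ticks: " + cpuTicks(line));
    }
}
```

Dividing the difference between two such readings by the difference in total system ticks over the same interval gives the process's share of CPU time.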

If a block occurs, the block log is analyzed:

setMonitor(new LooperMonitor(new LooperMonitor.BlockListener() {

            @Override
            public void onBlockEvent(long realTimeStart, long realTimeEnd,
                                     long threadTimeStart, long threadTimeEnd) {
                // Get recent thread-stack entries and cpu usage
                ArrayList<String> threadStackEntries = stackSampler
                        .getThreadStackEntries(realTimeStart, realTimeEnd);
                if (!threadStackEntries.isEmpty()) {
                    BlockInfo blockInfo = BlockInfo.newInstance()
                            .setMainThreadTimeCost(realTimeStart, realTimeEnd, threadTimeStart, threadTimeEnd)
                            .setCpuBusyFlag(cpuSampler.isCpuBusy(realTimeStart, realTimeEnd))
                            .setRecentCpuRate(cpuSampler.getCpuRateInfo())
                            .setThreadStackEntries(threadStackEntries)
                            .flushString();
                    LogWriter.save(blockInfo.toString());

                    if (mInterceptorChain.size() != 0) {
                        // Iterate over all interceptors and call onBlock on each: this logs, posts the Notification, and runs our own custom logic for collecting jank data
                        for (BlockInterceptor interceptor : mInterceptorChain) {
                            interceptor.onBlock(getContext().provideContext(), blockInfo);
                        }
                    }
                }
            }
        }, getContext().provideBlockThreshold(), getContext().stopWhenDebugging()));

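AndroidPerformanceMonitor usage summary

Monitoring through mLogging has a blind-spot problem, so AndroidPerformanceMonitor analyzes jank by sampling at a high frequency (collecting the stack once every 1s).

While using this library we still ran into a few problems that we had to fix ourselves:

On Android 8.0 and above, a Notification must specify a channel id
On newer versions, the /proc/[pid]/stat file can no longer be read for lack of permission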

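ANR analysis

ANRs can occur in several situations:

A key event is not handled within 5s (KEY_DISPATCHING_TIMEOUT_MS)
A foreground broadcast does not finish within 10s, or a background broadcast within 60s
A foreground service does not finish within 20s, or a background service within 200s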

//AMS
static final int BROADCAST_FG_TIMEOUT = 10*1000;
static final int BROADCAST_BG_TIMEOUT = 60*1000;

//ATMS
KEY_DISPATCHING_TIMEOUT_MS

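WatchDog source code analysis

When an ANR occurs, the system receives the abnormal-termination signal and writes the ANR information for the process, including its stacks, CPU, and I/O state at that moment, into the /data/anr directory. We could watch that directory with FileObserver to find out whether an ANR happened, but on newer versions reading it requires root.

So we can use the ANR-WatchDog library to help analyze ANRs on the device.

The principle behind this library is also simple:

Get a Handler for the main thread and post a runnable whose only job is to increment a counter
Wait 5s, then check whether the counter was incremented; if it was not, treat it as an ANR
If an ANR occurred, collect the current stack traces and log them, or run a user-defined action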
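The three steps above can be sketched in plain Java, with a single-thread executor standing in for the Android main thread. WatchdogSketch and looksBlocked are hypothetical names, the timeouts are shortened so the example finishes quickly, and the real library posts to a Handler on the main Looper instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class WatchdogSketch {

    /**
     * Posts a tick to the watched thread, waits timeoutMillis, and reports
     * whether the tick was NOT executed in time (the thread looks stuck).
     */
    static boolean looksBlocked(ExecutorService watchedThread, long timeoutMillis) {
        AtomicInteger tick = new AtomicInteger();
        int lastTick = tick.get();
        watchedThread.execute(tick::incrementAndGet);   // step 1: post the increment
        try {
            Thread.sleep(timeoutMillis);                // step 2: wait
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;                               // interrupted: make no claim
        }
        return tick.get() == lastTick;                  // step 3: unchanged means blocked
    }

    public static void main(String[] args) {
        // A single-thread executor stands in for the Android main thread.
        ExecutorService fakeMainThread = Executors.newSingleThreadExecutor();
        try {
            System.out.println("idle thread blocked? " + looksBlocked(fakeMainThread, 200));

            // Simulate a long task occupying the thread.
            fakeMainThread.execute(() -> {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            System.out.println("busy thread blocked? " + looksBlocked(fakeMainThread, 200));
        } finally {
            fakeMainThread.shutdownNow();
        }
    }
}
```

Because the check only runs once per timeout, a stall that starts and ends between two checks goes unnoticed, which is one reason this approach suits ANR-length freezes rather than short jank.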

Let's look at the source.
ANRWatchDog extends Thread, so we look at its run method:

@Override
    public void run() {
         // Rename the thread
        setName("|ANR-WatchDog|");

        int lastTick;
        int lastIgnored = -1;
        while (!isInterrupted()) {
            lastTick = _tick;
            // Post a task to the main thread
            _uiHandler.post(_ticker);
            try {
                // Sleep for 5s (the default)
                Thread.sleep(_timeoutInterval);
            }
            catch (InterruptedException e) {
                // Handle the interruption
                _interruptionListener.onInterrupted(e);
                return ;
            }

            // If the main thread has not handled _ticker, it is blocked. ANR.
            // If the tick has not changed, an ANR has occurred
            if (_tick == lastTick) {
                if (!_ignoreDebugger && Debug.isDebuggerConnected()) {
                    if (_tick != lastIgnored)
                        Log.w("ANRWatchdog", "An ANR was detected but ignored because the debugger is connected (you can prevent this with setIgnoreDebugger(true))");
                    lastIgnored = _tick;
                    continue ;
                }

                ANRError error;
                if (_namePrefix != null)
                    error = ANRError.New(_namePrefix, _logThreadsWithoutStackTrace);
                else
                    error = ANRError.NewMainOnly(); // Capture the main thread's stack trace
                // Hand the error to the listener (the default listener throws it)
                _anrListener.onAppNotResponding(error);
                return;
            }
        }
    }
    
  // Default ANR handler: it throws the error directly, so an ANR crashes the app right away
  private static final ANRListener DEFAULT_ANR_LISTENER = new ANRListener() {
        @Override public void onAppNotResponding(ANRError error) {
            throw error;
        }
    };

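Monitoring blind spots

First, what is a monitoring blind spot? Here is an example.
Suppose the jank threshold is 2s and method A calls methods B and C, where B takes 1.5s and C takes 0.5s. When the jank is detected and we collect information, the current stack shows method C rather than method B, the real culprit. That is a monitoring blind spot.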
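The example can be replayed as a toy simulation. It uses the invented timings from the paragraph above (B runs for the first 1.5s, C for the last 0.5s) and is not a real profiler; BlindSpotSketch and its methods are hypothetical names. It compares a single stack dump taken at the threshold with periodic sampling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BlindSpotSketch {

    /** Simulated timeline of method A: B runs during [0, 1500) ms, C during [1500, 2000) ms. */
    static String methodRunningAt(long millis) {
        return millis < 1500 ? "B" : "C";
    }

    /** Counts which method each periodic sample lands in over the 2000 ms message. */
    static Map<String, Integer> sampleEvery(long intervalMillis) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (long t = intervalMillis; t < 2000; t += intervalMillis) {
            hits.merge(methodRunningAt(t), 1, Integer::sum);
        }
        return hits;
    }

    public static void main(String[] args) {
        // One dump taken only when the 2s threshold is about to be reached sees C.
        System.out.println("single dump at 1999 ms: " + methodRunningAt(1999));
        // Periodic samples land mostly in B, which points at the real culprit.
        System.out.println("samples every 250 ms: " + sampleEvery(250));
    }
}
```

With a 250 ms interval, five of the seven samples fall inside B, so the sampled stacks point at the slow method even though the final dump does not.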

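Offline solution for monitoring blind spots

Offline we can simply use TraceView. It shows the time spent in every method clearly, so the slow one can be located quickly.

Online solution for monitoring blind spots

As described above, AndroidPerformanceMonitor monitors via mLogging, but that only tells us the stack of the task currently running, not who posted the Message.

So we can use a single unified Handler, which lets us hook the sendMessageAtTime and dispatchMessage methods.

Here is the code: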

package com.optimize.performance.handler;

import android.os.Handler;
import android.os.Looper;
import android.os.Message;
import android.util.Log;

import com.optimize.performance.utils.LogUtils;

import org.json.JSONObject;

public class SuperHandler extends Handler {

    private long mStartTime = System.currentTimeMillis();

    public SuperHandler() {
        super(Looper.myLooper(), null);
    }

    public SuperHandler(Callback callback) {
        super(Looper.myLooper(), callback);
    }

    public SuperHandler(Looper looper, Callback callback) {
        super(looper, callback);
    }

    public SuperHandler(Looper looper) {
        super(looper);
    }

    @Override
    public boolean sendMessageAtTime(Message msg, long uptimeMillis) {
        boolean send = super.sendMessageAtTime(msg, uptimeMillis);
        if (send) {
            // Record the stack trace of the code that sent this message
            GetDetailHandlerHelper.getMsgDetail().put(msg, Log.getStackTraceString(new Throwable()).replace("java.lang.Throwable", ""));
        }
        return send;
    }

    @Override
    public void dispatchMessage(Message msg) {
        mStartTime = System.currentTimeMillis();
        super.dispatchMessage(msg);

        if (GetDetailHandlerHelper.getMsgDetail().containsKey(msg)
                && Looper.myLooper() == Looper.getMainLooper()) {
            JSONObject jsonObject = new JSONObject();
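            try {
                // Record how long the dispatch took
                jsonObject.put("Msg_Cost", System.currentTimeMillis() - mStartTime);
                // Record the stack trace of the code that sent the message
                jsonObject.put("MsgTrace", msg.getTarget() + " " + GetDetailHandlerHelper.getMsgDetail().get(msg));
                // Custom handling can go here
                LogUtils.i("MsgDetail " + jsonObject.toString());
                GetDetailHandlerHelper.getMsgDetail().remove(msg);
            } catch (Exception e) {
                // Monitoring code must not crash the app, but the failure should not be swallowed silently
                Log.w("SuperHandler", "Failed to record message detail", e);
            }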
        }
    }

}

We also use a helper class to store the stack trace associated with each msg:

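import android.os.Message;

import java.util.concurrent.ConcurrentHashMap;

public class GetDetailHandlerHelper {

    // Maps each in-flight Message to the stack trace captured when it was sent
    private static final ConcurrentHashMap<Message, String> sMsgDetail = new ConcurrentHashMap<>();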

    public static ConcurrentHashMap<Message, String> getMsgDetail() {
        return sMsgDetail;
    }

}

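This way we can collect how long each msg takes and the stack trace of whoever posted it.

To replace Handler globally we can use AOP, for example with DroidAssist, an open-source library from DiDi.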

Through this replacement, every Handler in the app can be rewritten to use our SuperHandler.

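Summary

Jank analysis touches on many topics, so it can be hard going at first, but persistence pays off.
When analyzing jank, we need to pay attention to both offline and online work. Offline, use ARTHook, third-party libraries, and TraceView to expose jank in the lab as much as possible. Online, use SuperHandler and ANRWatchDog to collect jank and ANR information.

We can also apply what we covered earlier on startup optimization and layout optimization: defer slow operations or run them asynchronously, and use asynchronous inflation, X2C, data preloading, reduced I/O waits, and similar techniques.

But always optimize the code elegantly.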