Background
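I ran into a very interesting ANR problem. The trace showed the Launcher app calling queueBuffer again and again, but the buffer count of its layer in SurfaceFlinger never increased, which means the buffers were never consumed. Once all 3 buffers were used up, the app could no longer dequeue a buffer. The dequeueBuffer timeout (4 s) is shorter than the ANR threshold (5 s), but an ANR can still be triggered.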
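To make the symptom concrete, here is a minimal Java sketch of it (not of BufferQueue itself): a producer limited to three buffers, a consumer that never takes any, and a dequeue that gives up after a timeout. `TinyBufferPool` and its methods are hypothetical, and real BufferQueue slot handling is considerably more involved.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the symptom, not of BufferQueue: each free slot is a permit,
// and a slot only comes back when the consumer (SurfaceFlinger) takes a queued buffer.
class TinyBufferPool {
    private final Semaphore freeSlots;
    private int queued = 0;   // buffers handed over by the app (queueBuffer)
    private int consumed = 0; // buffers the consumer actually took

    TinyBufferPool(int maxBuffers) {
        freeSlots = new Semaphore(maxBuffers);
    }

    // Like dequeueBuffer with a timeout: false means no slot became free in time.
    boolean dequeue(long timeoutMs) {
        try {
            return freeSlots.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // The app queues a filled buffer; nothing is freed until the consumer takes it.
    synchronized void queue() {
        queued++;
    }

    // Consumer side (simplified: the slot is freed as soon as the buffer is taken).
    synchronized void consumeOne() {
        if (consumed < queued) {
            consumed++;
            freeSlots.release();
        }
    }

    synchronized int queuedCount() { return queued; }

    synchronized int consumedCount() { return consumed; }
}
```

In the stuck state nobody calls `consumeOne()`, so every further `dequeue()` waits out its full timeout, which matches the repeated dequeue timeouts seen in the trace.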
1. The app keeps calling queueBuffer, but the buffer count in SurfaceFlinger does not increase
Since BLASTBufferQueue was introduced, queueBuffer alone is not enough: the Transaction must also be applied before the buffer count on the SurfaceFlinger side increases and the buffer can be consumed.
void BLASTBufferQueue::acquireNextBufferLocked(
        const std::optional<SurfaceComposerClient::Transaction*> transaction) {
    ....
    t->setBuffer(mSurfaceControl, buffer, fence, bufferItem.mFrameNumber, releaseBufferCallback);
    ....
    if (applyTransaction) { // Case 1: the normal case
        // All transactions on our apply token are one-way. See comment on mAppliedLastTransaction
        t->setApplyToken(mApplyToken).apply(false, true); // SF updates its buffer count right away
        mAppliedLastTransaction = true;
        mLastAppliedFrameNumber = bufferItem.mFrameNumber;
    } else { // Case 2: the special case
        // SF does not update right away; the sync mechanism applies the Transaction
        // only after all participating windows are ready
        t->setBufferHasBarrier(mSurfaceControl, mLastAppliedFrameNumber);
        mAppliedLastTransaction = false;
    }
}
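The two branches can be modelled in a few lines of Java. This is a hypothetical sketch: `FakeFlinger`, `FakeTransaction` and `FakeBbq` are made-up stand-ins, and the model ignores that a later setBuffer on the same layer replaces an earlier one. It only shows that in case 2 the buffer sits inside someone else's transaction, and SurfaceFlinger sees nothing until that transaction is applied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SurfaceFlinger: counts the buffers it has received.
class FakeFlinger {
    int buffersReceived = 0;
}

// Hypothetical stand-in for SurfaceComposerClient::Transaction.
class FakeTransaction {
    private final List<Long> frames = new ArrayList<>();

    void setBuffer(long frameNumber) { frames.add(frameNumber); }

    int pendingFrames() { return frames.size(); }

    void apply(FakeFlinger sf) {
        sf.buffersReceived += frames.size();
        frames.clear();
    }
}

// Mirrors the two branches of acquireNextBufferLocked().
class FakeBbq {
    private final FakeFlinger sf;
    private long nextFrame = 1;

    FakeBbq(FakeFlinger sf) { this.sf = sf; }

    void acquireNextBuffer(FakeTransaction syncTransaction) {
        boolean applyTransaction = syncTransaction == null;
        FakeTransaction t = applyTransaction ? new FakeTransaction() : syncTransaction;
        t.setBuffer(nextFrame++);
        if (applyTransaction) {
            t.apply(sf); // case 1: SurfaceFlinger gets the buffer immediately
        }
        // case 2: the buffer waits inside syncTransaction until its owner applies it
    }
}
```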
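The trace clearly shows that case 2 in the code above was taken. At first I suspected the BLASTSyncEngine mechanism I described in my earlier article [076]SHELL TRANSITIONS, so I tried enabling its log:
adb shell wm logging enable-text WM_DEBUG_SYNC_ENGINE
Things did not go as hoped: the WindowManager: SyncGroup log I expected never appeared. So I kept following the code to find out whether some other mechanism could also lead to case 2.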
2. Tracing the code backwards
2.1 When is applyTransaction false?
2.1.1 In acquireNextBufferLocked, transaction is non-null; keep tracing backwards.
2.1.2 In onFrameAvailable, syncTransactionSet is true, i.e. mTransactionReadyCallback is non-null.
2.1.3 Someone called syncNextTransaction, which is what set mTransactionReadyCallback.
frameworks/native/libs/gui/BLASTBufferQueue.cpp
//2.1.1
void BLASTBufferQueue::acquireNextBufferLocked(
        const std::optional<SurfaceComposerClient::Transaction*> transaction) {
    SurfaceComposerClient::Transaction localTransaction;
    bool applyTransaction = true;
    SurfaceComposerClient::Transaction* t = &localTransaction;
    if (transaction) { // transaction is non-null
        t = *transaction;
        applyTransaction = false;
    }
    ....
}
//2.1.2
void BLASTBufferQueue::onFrameAvailable(const BufferItem& item) {
    std::function<void(SurfaceComposerClient::Transaction*)> prevCallback = nullptr;
    SurfaceComposerClient::Transaction* prevTransaction = nullptr;
    {
        BBQ_TRACE();
        std::unique_lock _lock{mMutex};
        const bool syncTransactionSet = mTransactionReadyCallback != nullptr; // mTransactionReadyCallback is non-null
        if (syncTransactionSet) { // so syncTransactionSet is true
            acquireNextBufferLocked(mSyncTransaction);
            ....
        } else if (!mWaitForTransactionCallback) {
            acquireNextBufferLocked(std::nullopt);
        }
    }
    if (prevCallback) {
        prevCallback(prevTransaction);
    }
}
//2.1.3
void BLASTBufferQueue::syncNextTransaction(
        std::function<void(SurfaceComposerClient::Transaction*)> callback,
        bool acquireSingleBuffer) {
    BBQ_TRACE();
    ....
    mTransactionReadyCallback = callback; // the callback is set here
    ....
}
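Put together, 2.1.1 to 2.1.3 form a one-shot switch: syncNextTransaction arms a callback, and the next onFrameAvailable diverts that frame into the sync transaction and hands it to whoever armed the callback. The Java sketch below models only the single-buffer (acquireSingleBuffer = true) path. `HeldTransaction` and `SyncAwareBbq` are hypothetical, and the handover through a prevCallback/prevTransaction pair follows the shape of the excerpt rather than reproducing the real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the sync transaction (mSyncTransaction).
class HeldTransaction {
    final List<Long> frames = new ArrayList<>();
}

// Hypothetical model of the native BLASTBufferQueue sync path, single-buffer mode only.
class SyncAwareBbq {
    private Consumer<HeldTransaction> transactionReadyCallback; // mTransactionReadyCallback
    private HeldTransaction syncTransaction;                     // mSyncTransaction
    private int appliedDirectly = 0;

    // 2.1.3: arms the callback; the *next* frame will be diverted.
    synchronized void syncNextTransaction(Consumer<HeldTransaction> callback) {
        transactionReadyCallback = callback;
        syncTransaction = callback != null ? new HeldTransaction() : null;
    }

    // 2.1.2: chooses between case 1 and case 2 of acquireNextBufferLocked.
    void onFrameAvailable(long frameNumber) {
        Consumer<HeldTransaction> prevCallback = null;
        HeldTransaction prevTransaction = null;
        synchronized (this) {
            boolean syncTransactionSet = transactionReadyCallback != null;
            if (syncTransactionSet) {
                syncTransaction.frames.add(frameNumber); // case 2: not applied here
                prevCallback = transactionReadyCallback;  // hand it over to the sync owner
                prevTransaction = syncTransaction;
                transactionReadyCallback = null;
                syncTransaction = null;
            } else {
                appliedDirectly++;                        // case 1: applied immediately
            }
        }
        if (prevCallback != null) {
            prevCallback.accept(prevTransaction); // invoked outside the lock
        }
    }

    synchronized int appliedDirectlyCount() { return appliedDirectly; }
}
```

If the receiver of the handed-over transaction never applies it, the buffer inside it never reaches SurfaceFlinger.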
2.2 Who calls syncNextTransaction?
Since the problem is in an app, I doubted it would call syncNextTransaction in BLASTBufferQueue.cpp directly; it must go through syncNextTransaction in BLASTBufferQueue.java.
The call chain is:
BLASTBufferQueue.java syncNextTransaction
-> android_graphics_BLASTBufferQueue.cpp nativeSyncNextTransaction
-> BLASTBufferQueue.cpp syncNextTransaction
frameworks/base/graphics/java/android/graphics/BLASTBufferQueue.java
public void syncNextTransaction(boolean acquireSingleBuffer,
        Consumer<SurfaceControl.Transaction> callback) {
    nativeSyncNextTransaction(mNativeObject, callback, acquireSingleBuffer); // continues in 2.2.1
}
public void syncNextTransaction(Consumer<SurfaceControl.Transaction> callback) {
    syncNextTransaction(true /* acquireSingleBuffer */, callback);
}
frameworks/base/core/jni/android_graphics_BLASTBufferQueue.cpp
//2.2.1
static void nativeSyncNextTransaction(JNIEnv* env, jclass clazz, jlong ptr, jobject callback,
                                      jboolean acquireSingleBuffer) {
    sp<BLASTBufferQueue> queue = reinterpret_cast<BLASTBufferQueue*>(ptr);
    JavaVM* vm = nullptr;
    LOG_ALWAYS_FATAL_IF(env->GetJavaVM(&vm) != JNI_OK, "Unable to get Java VM");
    if (!callback) {
        queue->syncNextTransaction(nullptr, acquireSingleBuffer);
    } else {
        auto globalCallbackRef =
                std::make_shared<JGlobalRefHolder>(vm, env->NewGlobalRef(callback));
        queue->syncNextTransaction(
                [globalCallbackRef](SurfaceComposerClient::Transaction* t) {
                    JNIEnv* env = getenv(globalCallbackRef->vm());
                    env->CallVoidMethod(globalCallbackRef->object(), gTransactionConsumer.accept,
                                        env->NewObject(gTransactionClassInfo.clazz,
                                                       gTransactionClassInfo.ctor,
                                                       reinterpret_cast<jlong>(t)));
                },
                acquireSingleBuffer);
    }
}
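Next, I searched the source for every caller of syncNextTransaction in BLASTBufferQueue.java. Fortunately there were only a few, and relying on my ten years of experience I went on to follow registerCallbacksForSync.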
2.3 onReadyToSync ultimately triggers syncNextTransaction
繼續(xù)反推代碼
2.3.1
中syncBuffer
和syncBufferCallback
不為空
2.3.2
中mSyncBufferCallback
是不為空的
2.3.3和2.3.4
中SurfaceSyncer.SyncTarget
將會調用onReadyToSync
然后設置mSyncBufferCallback
吻育,接下來就看誰調用了onReadyToSync
。
Tracing backwards any further was getting too tiring, so I decided to track it with this magic log line:
Log.v("kobewang", "onReadyToSync", new Exception("kobewang"));
frameworks/base/core/java/android/view/ViewRootImpl.java
//2.3.1
private void registerCallbacksForSync(boolean syncBuffer,
        final SurfaceSyncer.SyncBufferCallback syncBufferCallback) {
    mAttachInfo.mThreadedRenderer.registerRtFrameCallback(new FrameDrawingCallback() {
        @Override
        public void onFrameDraw(long frame) {
        }

        @Override
        public HardwareRenderer.FrameCommitCallback onFrameDraw(int syncResult, long frame) {
            // when the frame starts drawing, hand syncBufferCallback to the BBQ
            if (syncBuffer) {
                mBlastBufferQueue.syncNextTransaction(syncBufferCallback::onBufferReady);
            }
            ....
        }
    });
}
//2.3.2
private boolean performDraw() {
    ....
    boolean usingAsyncReport = isHardwareEnabled() && mSyncBufferCallback != null; // mSyncBufferCallback is non-null
    if (usingAsyncReport) {
        registerCallbacksForSync(mSyncBuffer, mSyncBufferCallback);
    } else if (mHasPendingTransactions) {
        ....
    }
    ....
}
//2.3.3
private void readyToSync(SurfaceSyncer.SyncBufferCallback syncBufferCallback) {
    mSyncBufferCallback = syncBufferCallback;
}
//2.3.4
public final SurfaceSyncer.SyncTarget mSyncTarget = new SurfaceSyncer.SyncTarget() {
    @Override
    public void onReadyToSync(SurfaceSyncer.SyncBufferCallback syncBufferCallback) {
        Log.v("kobewang", "onReadyToSync", new Exception("kobewang")); // the log I added
        readyToSync(syncBufferCallback); // this is what finally sets syncBufferCallback
    }

    @Override
    public void onSyncComplete() {
        mHandler.postAtFrontOfQueue(() -> {
            if (--mNumSyncsInProgress == 0 && mAttachInfo.mThreadedRenderer != null) {
                HardwareRenderer.setRtAnimationsEnabled(true);
            }
        });
    }
};
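The Java side is another arm-then-fire pattern: onReadyToSync only stores the callback, and nothing is delivered until the window actually draws its next frame. Below is a hypothetical sketch; `MiniViewRoot` is made up and collapses readyToSync, performDraw, registerCallbacksForSync and onFrameDraw into two methods.

```java
import java.util.function.Consumer;

// Hypothetical, heavily simplified ViewRootImpl: only the sync-related state.
class MiniViewRoot {
    private Consumer<String> syncBufferCallback; // plays the role of mSyncBufferCallback
    private boolean attached = true;
    private int frame = 0;

    // onReadyToSync -> readyToSync: only *arms* the callback.
    void onReadyToSync(Consumer<String> callback) {
        syncBufferCallback = callback;
    }

    void detach() {
        attached = false;
    }

    // performDraw -> registerCallbacksForSync -> onFrameDraw -> syncNextTransaction.
    // The callback fires only if this window actually draws another frame.
    void performDraw() {
        if (!attached) {
            return; // a destroyed window never draws, so the armed callback never fires
        }
        frame++;
        Consumer<String> cb = syncBufferCallback;
        syncBufferCallback = null; // one-shot
        if (cb != null) {
            cb.accept("buffer for frame " + frame);
        }
    }
}
```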
2.4 The magic log works its magic
The line numbers in the stack may not match, because I removed some of our company's code; I use the AOSP code to convey the idea.
12-07 17:57:29.435 8956 8956 V kobewang: onReadyToSync
12-07 17:57:29.435 8956 8956 V kobewang: java.lang.Exception: kobewang
12-07 17:57:29.435 8956 8956 V kobewang: at android.view.ViewRootImpl$9.onReadyToSync(ViewRootImpl.java:11501)
12-07 17:57:29.435 8956 8956 V kobewang: at android.window.SurfaceSyncer$SyncSet.addSyncableSurface(SurfaceSyncer.java:352)
12-07 17:57:29.435 8956 8956 V kobewang: at android.window.SurfaceSyncer.addToSync(SurfaceSyncer.java:231)
12-07 17:57:29.435 8956 8956 V kobewang: at android.window.SurfaceSyncer.addToSync(SurfaceSyncer.java:210)
12-07 17:57:29.435 8956 8956 V kobewang: at com.android.systemui.animation.ViewRootSync.synchronizeNextDraw(ViewRootSync.kt:7)
12-07 17:57:29.435 8956 8956 V kobewang: at com.android.systemui.animation.ViewRootSync.synchronizeNextDraw(ViewRootSync.kt:11)
12-07 17:57:29.435 8956 8956 V kobewang: at com.android.launcher3.taskbar.TaskbarLauncherStateController.onIconAlignmentRatioChanged(TaskbarLauncherStateController.java:88)
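Why this works: constructing an Exception (without throwing it) records the current call stack, and Log.v(tag, msg, throwable) prints that stack, which reveals the caller chain above. The same idea in plain Java outside Android is sketched below; `CallerTrace` and `CallerTraceDemo` are hypothetical helpers. Capturing a stack trace is relatively expensive, so this is a debugging aid to remove afterwards.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class CallerTrace {
    // Same idea as Log.v(tag, msg, new Exception(tag)): creating (not throwing) an
    // exception records the current call stack, which we then render as text.
    static String captureStack(String tag) {
        StringWriter out = new StringWriter();
        new Exception(tag).printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }

    // Just the method names on the stack, innermost first.
    static String[] methodNames() {
        StackTraceElement[] frames = new Exception().getStackTrace();
        String[] names = new String[frames.length];
        for (int i = 0; i < frames.length; i++) {
            names[i] = frames[i].getMethodName();
        }
        return names;
    }
}

class CallerTraceDemo {
    static String[] onReadyToSync() {
        return CallerTrace.methodNames();
    }

    static String[] addToSync() {
        return onReadyToSync(); // the caller we want to discover
    }
}
```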
The key code is onIconAlignmentRatioChanged. Clearly, Launcher wants the two views mLauncher.getHotseat() and mControllers.taskbarActivityContext.getDragLayer() to show up together in the next frame, and the mechanism it uses for that is SurfaceSyncer.
private void onIconAlignmentRatioChanged(Supplier<Float> alignmentSupplier) {
    // Sync the first frame where we swap taskbar and hotseat.
    if (firstFrameVisChanged && mCanSyncViews && !Utilities.IS_RUNNING_IN_TEST_HARNESS) {
        ViewRootSync.synchronizeNextDraw(mLauncher.getHotseat(),
                mControllers.taskbarActivityContext.getDragLayer(),
                () -> {});
    }
}
frameworks/base/packages/SystemUI/animation/src/com/android/systemui/animation/ViewRootSync.kt
object ViewRootSync {
    private var surfaceSyncer: SurfaceSyncer? = null

    /**
     * Synchronize the next draw between the view roots of [view] and [otherView], then run [then].
     *
     * Note that in some cases, the synchronization might not be possible (e.g. WM consumed the
     * next transactions) or disabled (temporarily, on low ram devices). In this case, [then] will
     * be called without synchronizing.
     */
    fun synchronizeNextDraw(
        view: View,
        otherView: View,
        then: () -> Unit
    ) {
        if (!view.isAttachedToWindow || view.viewRootImpl == null ||
            !otherView.isAttachedToWindow || otherView.viewRootImpl == null ||
            view.viewRootImpl == otherView.viewRootImpl) {
            // No need to synchronize if either the touch surface or dialog view is not attached
            // to a window.
            then()
            return
        }
        surfaceSyncer = SurfaceSyncer().apply {
            val syncId = setupSync(Runnable { then() })
            addToSync(syncId, view)
            addToSync(syncId, otherView)
            markSyncReady(syncId)
        } // use SurfaceSyncer so both views show up in the same frame
    }

    /**
     * A Java-friendly API for [synchronizeNextDraw].
     */
    @JvmStatic
    fun synchronizeNextDraw(view: View, otherView: View, then: Runnable) {
        synchronizeNextDraw(view, otherView, then::run)
    }
}
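The contract synchronizeNextDraw relies on can be sketched as follows. This is not SurfaceSyncer's code: `MiniSyncGroup` is a hypothetical model, assuming that the merged transaction is applied only after markSyncReady() has been called and every added target has reported its buffer. Note that nothing here bounds how long the group waits.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the SurfaceSyncer contract, not its implementation.
class MiniSyncGroup {
    private final Set<String> pending = new HashSet<>();
    private final List<String> merged = new ArrayList<>();
    private final Runnable then;
    private boolean syncReady = false;
    private boolean finished = false;
    final List<String> appliedToFlinger = new ArrayList<>();

    MiniSyncGroup(Runnable then) {
        this.then = then;
    }

    synchronized void addToSync(String target) {
        pending.add(target);
    }

    // Called by a target once its next frame's buffer is ready (like onBufferReady).
    synchronized void onBufferReady(String target, String buffer) {
        if (pending.remove(target)) {
            merged.add(buffer);
            checkIfComplete();
        }
    }

    synchronized void markSyncReady() {
        syncReady = true;
        checkIfComplete();
    }

    synchronized boolean isFinished() {
        return finished;
    }

    private void checkIfComplete() {
        if (finished || !syncReady || !pending.isEmpty()) {
            return; // keeps waiting, with no upper bound
        }
        appliedToFlinger.addAll(merged); // one apply for all buffers => same frame
        finished = true;
        then.run();
    }
}
```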
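mLauncher.getHotseat() is the window whose dequeue timed out in the trace from the Background section, and mControllers.taskbarActivityContext.getDragLayer() corresponds to the Taskbar. So the question now is: why did the Taskbar never finish drawing?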
3. Why didn't the Taskbar finish drawing?
Once the Taskbar and Launcher's main thread were also captured in the trace, the truth came out: a service being stopped caused the original Taskbar to be destroyed.
Reconstructing what happened
First, the following code is called so that mLauncher.getHotseat() (which is QuickstepLauncher in the trace) and the Taskbar are displayed in sync using SurfaceSyncer.
ViewRootSync.synchronizeNextDraw(mLauncher.getHotseat(),
        mControllers.taskbarActivityContext.getDragLayer(),
        () -> {});
There is a check for windows that have been destroyed:
if (!view.isAttachedToWindow || view.viewRootImpl == null ||
    !otherView.isAttachedToWindow || otherView.viewRootImpl == null ||
    view.viewRootImpl == otherView.viewRootImpl) {
    // No need to synchronize if either the touch surface or dialog view is not attached
    // to a window.
    then()
    return
}
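But the window destruction happened just after this check. So once mLauncher.getHotseat() had finished drawing, the Taskbar, whose window had been destroyed, never finished drawing, and mLauncher.getHotseat() was left waiting forever for the old Taskbar to finish drawing. That could never happen.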
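The race is a classic check-then-act problem. The hypothetical replay below injects the Taskbar's destruction right after the attachment check to make it deterministic (in reality it only needs to land after the check and before the Taskbar's next draw); `RaceWindow` and `RaceReplay` are made-up names.

```java
// Hypothetical window: a destroyed window never draws again.
class RaceWindow {
    boolean attached = true;
    Runnable onNextDraw; // armed by the sync, fired by the next draw

    void draw() {
        if (!attached) {
            return;
        }
        Runnable report = onNextDraw;
        onNextDraw = null;
        if (report != null) {
            report.run();
        }
    }
}

// Hypothetical replay of the check-then-arm sequence in ViewRootSync.synchronizeNextDraw.
class RaceReplay {
    int pending;
    boolean applied;

    boolean synchronizeNextDraw(RaceWindow a, RaceWindow b, Runnable afterCheck) {
        if (!a.attached || !b.attached) {
            return false; // the then() fallback: no sync at all
        }
        afterCheck.run(); // injected event, e.g. a service stop destroying the Taskbar
        pending = 2;
        Runnable report = () -> {
            if (--pending == 0) {
                applied = true; // both windows drew: apply together
            }
        };
        a.onNextDraw = report;
        b.onNextDraw = report;
        return true;
    }
}
```

The check itself is correct; it is simply no longer true by the time the result matters, and nothing downstream re-checks it.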
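4. Did SurfaceSyncer really not account for this case?
At first I thought Google's engineers couldn't possibly have missed this, but after reading the SurfaceSyncer code I found that it indeed does not handle this case.
When discussing it with a colleague, we agreed there should be a timeout mechanism: for example, if one of the surfaces that should be shown together still hasn't finished drawing after 1 s, the transactions of the remaining surfaces should be applied anyway.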
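4.1 The same app on Android 14 stutters, but does not ANR
This was a new clue. Looking at the Android 14 code, I found that SurfaceSyncer had been replaced by SurfaceSyncGroup. On a hunch I searched for "timeout", and sure enough there was a hit.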
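Once the timeout fires, the runnable in 4.1.1 runs: it calls mPendingSyncs.clear() and then markSyncReady in 4.1.2, which calls checkIfSyncIsComplete in 4.1.3. Since no syncs are pending any more, the transaction is passed to mTransactionReadyConsumer in 4.1.4 to be applied (it falls back to transaction.apply() directly if the ready callback throws a RemoteException). This resolves exactly the situation in this problem.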
frameworks/base/core/java/android/window/SurfaceSyncGroup.java
public static final int TRANSACTION_READY_TIMEOUT = 1000 * Build.HW_TIMEOUT_MULTIPLIER;

//4.1.1
private void addTimeout() {
    Looper looper = null;
    synchronized (sHandlerThreadLock) {
        if (sHandlerThread == null) {
            sHandlerThread = new HandlerThread("SurfaceSyncGroupTimer");
            sHandlerThread.start();
        }
        looper = sHandlerThread.getLooper();
    }
    synchronized (mLock) {
        if (mTimeoutAdded || mTimeoutDisabled || looper == null) {
            // We only need one timeout for the entire SurfaceSyncGroup since we just want to
            // ensure it doesn't stay stuck forever.
            return;
        }
        if (mHandler == null) {
            mHandler = new Handler(looper);
        }
        mTimeoutAdded = true;
    }
    Runnable runnable = () -> {
        Log.e(TAG, "Failed to receive transaction ready in " + TRANSACTION_READY_TIMEOUT
                + "ms. Marking SurfaceSyncGroup(" + mName + ") as ready");
        // Clear out any pending syncs in case the other syncs can't complete or timeout due to
        // a crash.
        synchronized (mLock) {
            mPendingSyncs.clear(); // the timeout expired: drop every sync still pending
        }
        markSyncReady(); // re-run the readiness check; mPendingSyncs is now empty, so it can complete
    };
    mHandler.postDelayed(runnable, this, TRANSACTION_READY_TIMEOUT);
}

//4.1.2
public void markSyncReady() {
    if (DEBUG) {
        Log.d(TAG, "markSyncReady " + mName);
    }
    if (Trace.isTagEnabled(Trace.TRACE_TAG_VIEW)) {
        Trace.instantForTrack(Trace.TRACE_TAG_VIEW, mTrackName, "markSyncReady");
    }
    synchronized (mLock) {
        if (mHasWMSync) {
            try {
                WindowManagerGlobal.getWindowManagerService().markSurfaceSyncGroupReady(mToken);
            } catch (RemoteException e) {
            }
        }
        mSyncReady = true;
        checkIfSyncIsComplete(); // check whether the sync can complete now
    }
}

//4.1.3
private void checkIfSyncIsComplete() {
    if (mFinished) {
        if (DEBUG) {
            Log.d(TAG, "SurfaceSyncGroup=" + mName + " is already complete");
        }
        mTransaction.apply();
        return;
    }
    if (Trace.isTagEnabled(Trace.TRACE_TAG_VIEW)) {
        Trace.instantForTrack(Trace.TRACE_TAG_VIEW, mTrackName,
                "checkIfSyncIsComplete mSyncReady=" + mSyncReady
                        + " mPendingSyncs=" + mPendingSyncs.size());
    }
    if (!mSyncReady || !mPendingSyncs.isEmpty()) { // after the timeout, mPendingSyncs.isEmpty() is true
        if (DEBUG) {
            Log.d(TAG, "SurfaceSyncGroup=" + mName + " is not complete. mSyncReady="
                    + mSyncReady + " mPendingSyncs=" + mPendingSyncs.size());
        }
        return;
    }
    if (DEBUG) {
        Log.d(TAG, "Successfully finished sync id=" + mName);
    }
    mTransactionReadyConsumer.accept(mTransaction); // hands the transaction on to be applied
    mFinished = true;
    if (mTimeoutAdded) {
        mHandler.removeCallbacksAndMessages(this);
    }
}

//4.1.4
mTransactionReadyConsumer = (transaction) -> {
    if (Trace.isTagEnabled(Trace.TRACE_TAG_VIEW)) {
        Trace.asyncTraceForTrackBegin(Trace.TRACE_TAG_VIEW, mTrackName,
                "Invoke transactionReadyCallback="
                        + transactionReadyCallback.hashCode(), hashCode());
    }
    lastCallback.accept(null);
    try {
        transactionReadyCallback.onTransactionReady(transaction);
    } catch (RemoteException e) {
        transaction.apply(); // if the callback can't be reached, apply the transaction directly
    }
    if (Trace.isTagEnabled(Trace.TRACE_TAG_VIEW)) {
        Trace.asyncTraceForTrackEnd(Trace.TRACE_TAG_VIEW, mTrackName, hashCode());
    }
};
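The timeout idea from 4.1.1 transplants easily. The sketch below is a hypothetical Java model, not SurfaceSyncGroup itself: it uses a daemon ScheduledExecutorService instead of a HandlerThread, a caller-supplied timeout instead of TRANSACTION_READY_TIMEOUT, and a set of target names instead of a real transaction. When the timeout fires it clears the pending targets and re-runs markSyncReady(), so whatever did arrive still gets "applied".

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical model of a sync group with a completion timeout.
class TimeoutSyncGroup {
    private final Object lock = new Object();
    private final Set<String> pendingSyncs = new HashSet<>(); // like mPendingSyncs
    private final Set<String> readyTargets = new HashSet<>(); // stands in for mTransaction
    private final Consumer<Set<String>> transactionReadyConsumer;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "SurfaceSyncGroupTimer");
                t.setDaemon(true); // never keeps the process alive on its own
                return t;
            });
    private ScheduledFuture<?> timeout;
    private boolean syncReady;
    private boolean finished;

    TimeoutSyncGroup(Consumer<Set<String>> consumer) {
        transactionReadyConsumer = consumer;
    }

    void add(String target) {
        synchronized (lock) {
            pendingSyncs.add(target);
        }
    }

    // A target reports that its frame is ready.
    void onTransactionReady(String target) {
        synchronized (lock) {
            if (pendingSyncs.remove(target)) {
                readyTargets.add(target);
            }
            checkIfSyncIsComplete();
        }
    }

    // Counterpart of addTimeout(): one timeout for the whole group.
    void addTimeout(long timeoutMs) {
        synchronized (lock) {
            if (timeout != null || finished) {
                return;
            }
            timeout = timer.schedule(() -> {
                synchronized (lock) {
                    pendingSyncs.clear(); // stop waiting for targets that never reported
                }
                markSyncReady();          // re-check; pendingSyncs is now empty
            }, timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    void markSyncReady() {
        synchronized (lock) {
            syncReady = true;
            checkIfSyncIsComplete();
        }
    }

    // Must be called with lock held.
    private void checkIfSyncIsComplete() {
        if (finished || !syncReady || !pendingSyncs.isEmpty()) {
            return;
        }
        finished = true;
        transactionReadyConsumer.accept(new HashSet<>(readyTargets)); // the "apply"
        if (timeout != null) {
            timeout.cancel(false);
        }
        timer.shutdown();
    }
}
```

Applying a partial result after a timeout trades visual consistency (one frame may show only the Hotseat) for liveness, which fits the observation that Android 14 stutters instead of hitting an ANR.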
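5. Summary
Back to where we started: do you think this bug is a system problem or an app problem? In most cases, problems like this end in finger-pointing between the app team and the system team.
App team: Why does it work on 14 but not on 13?
System team: Why is your Launcher the only one with this problem when other apps are fine?
Nobody wants to analyze it carefully. With luck, some other change shifts the timing of the service stop so the problem can no longer be reproduced, and it is quietly dropped.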
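If the issue in [011] "Solving an app problem that looked like a system problem" reflected my years of studying Binder, this one reflects my years of studying the whole Android display framework. Throughout, I never had the failing device in hand; I could only ask colleagues to add logs and capture traces while I followed the code myself. The analysis was not as smooth as this article makes it look, and I went down plenty of wrong paths. In fact, another project had reported this issue a year earlier, but because nobody found a reliable reproduction path and it was not assigned to me, the root cause was never found. This time I finally found it, which made me very happy, and I learned a lot along the way.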
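Afterword
Finally, a colleague asked whether I had an overall diagram of the rendering pipeline, a tutorial covering everything from input events to display, and tips for reading traces. I found that really hard to answer. I could only point him to my Bilibili videos and the trace tips at https://www.androidperformance.com/. But even after going through all of that, when you tackle a real problem, any missing piece of knowledge is something you have to go back and fill in. As the saying goes, you train an army for a thousand days to use it for a single hour, and learning works the same way: keep accumulating knowledge day to day, keep deepening and consolidating it through real work, and only then will you keep improving.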