Apple Metal 2 6.CPU和GPU 同步

原文https://developer.apple.com/documentation/metal/fundamental_lessons/cpu_and_gpu_synchronization

CPU and GPU Synchronization

Demonstrates how to update buffer data and synchronize access between the CPU and GPU.

Overview

In this sample you'll learn how to properly update and render animated resources that are shared between the CPU and the graphics processing unit (GPU). In particular, you'll learn how to modify data each frame, avoid data access hazards, and execute CPU and GPU work in parallel.

CPU/GPU Parallelism and Shared Resource Access

The CPU and GPU are separate, asynchronous processors. In a Metal app or game, the CPU encodes commands and the GPU executes commands. This sequence is repeated in every frame, and a full frame's work is completed when both the CPU and the GPU have finished their work. The CPU and the GPU can work in parallel and don't need to wait for each other to finish their work. For example, the GPU can execute commands for frame 1 while the CPU encodes commands for frame 2.

This CPU/GPU parallelism has great advantages for your Metal app or game, effectively enabling you to run on two processors at once. However, these processors still work together and usually access the same shared resources, such as vertex buffers or fragment textures. Shared resource access must be handled with care; otherwise, the CPU and GPU might access a shared resource at the same time, resulting in a race condition and corrupted data.

This sample, like most Metal apps or games, renders animated content by updating vertex data in each frame. Consider the following sequence:

  1. The render loop starts a new frame.
  2. The CPU writes new vertex data into the vertex buffer.
  3. The CPU encodes render commands and commits a command buffer.
  4. The GPU begins executing the command buffer.
  5. The GPU reads vertex data from the vertex buffer.
  6. The GPU renders pixels to a drawable.
  7. The render loop completes the frame.

In this sequence, the CPU and GPU both share a single vertex buffer. If the processors wait for each other to finish their work before beginning their own work, there are no access conflicts for the shared vertex buffer. This model avoids access conflicts, but wastes valuable processing time: when one processor is working, the other is idle.

Metal is designed to maximize CPU and GPU parallelism, so these processors should be kept busy and should be working simultaneously. Ideally, the GPU should be reading vertex data for frame 1 while the CPU is writing vertex data for frame 2. However, sharing a single vertex buffer means that the CPU could overwrite the previous frame's vertex data before the GPU has read it, resulting in unsightly rendering artifacts.

To reduce processor idle time and avoid access conflicts, vertex data can be shared by using multiple buffers instead of a single buffer. For example, the CPU and GPU can share vertex buffer 1 for frame 1, vertex buffer 2 for frame 2, vertex buffer 3 for frame 3, and so on. In this model, shared vertex data stays consistent throughout each frame and the processors access different vertex buffers simultaneously.

Implement a Triple Buffer Model

This sample renders hundreds of small quads, also known as sprites. To animate the sprites, the sample updates their positions at the start of each frame and writes them into a vertex buffer. When a frame is complete, the CPU and GPU no longer need the vertex buffer for that frame. Discarding a used vertex buffer and creating a new one for each frame is wasteful. Instead, a more sustainable model can be implemented with a FIFO queue of reusable vertex buffers.

The maximum number of buffers in the queue is defined by the value of MaxBuffersInFlight, set to 3. This constant value defines the maximum number of frames that can be worked on simultaneously by any part of the device; this includes the app, the driver, or the display. The app only works on a frame in the CPU or the GPU, but the OS itself may work on a frame at the driver or display level. Using three buffers gives the device enough leeway to work efficiently and effectively. Too few buffers can result in processor stalls and resource contention, whereas too many buffers can result in increased memory overhead and frame latency.

for(NSUInteger bufferIndex = 0; bufferIndex < MaxBuffersInFlight; bufferIndex++)
{
    _vertexBuffers[bufferIndex] = [_device newBufferWithLength:spriteVertexBufferSize
                                                       options:MTLResourceStorageModeShared];
}

At the start of the drawInMTKView: render loop, the sample iterates through each of the buffers in the _vertexBuffer array, updating only one buffer per frame. At the end of every third frame, after all three buffers have been used, the sample cycles back to the start of the array and updates the contents of the _vertexBuffer[0] buffer.

Manage the Rate of Work

To avoid overwriting data prematurely, the sample must ensure that the GPU has processed the contents of a buffer before reusing it. Otherwise, the CPU could overwrite the vertex data that was written three frames earlier but has not been read by the GPU yet. This condition occurs when the CPU produces work for the GPU faster than the GPU can complete it.

This sample uses semaphores to wait for full frame completions in case the CPU is running too far ahead of the GPU.

_inFlightSemaphore = dispatch_semaphore_create(MaxBuffersInFlight);

At the start of the render loop, the semaphore checks for a proceed or wait signal. If a buffer can be used or reused, the CPU work proceeds; otherwise, it waits until a buffer is available.

dispatch_semaphore_wait(_inFlightSemaphore, DISPATCH_TIME_FOREVER);

The GPU can't signal the semaphore directly, but it can issue a completion callback to the CPU.

[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
{
    dispatch_semaphore_signal(block_sema);
}];

The addCompletedHandler: method registers a block of code that that is called immediately after the GPU has finished executing a command buffer. This command buffer is the same one that committed a vertex buffer for a frame, so receiving the completion callback indicates that the vertex buffer can be safely reused.

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末催蝗,一起剝皮案震驚了整個(gè)濱河市,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌,老刑警劉巖,帶你破解...
    沈念sama閱讀 217,542評(píng)論 6 504
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件豌鸡,死亡現(xiàn)場(chǎng)離奇詭異炭剪,居然都是意外死亡每币,警方通過(guò)查閱死者的電腦和手機(jī)创倔,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 92,822評(píng)論 3 394
  • 文/潘曉璐 我一進(jìn)店門(mén)嗡害,熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái),“玉大人畦攘,你說(shuō)我怎么就攤上這事霸妹。” “怎么了知押?”我有些...
    開(kāi)封第一講書(shū)人閱讀 163,912評(píng)論 0 354
  • 文/不壞的土叔 我叫張陵叹螟,是天一觀的道長(zhǎng)。 經(jīng)常有香客問(wèn)我台盯,道長(zhǎng)罢绽,這世上最難降的妖魔是什么? 我笑而不...
    開(kāi)封第一講書(shū)人閱讀 58,449評(píng)論 1 293
  • 正文 為了忘掉前任静盅,我火速辦了婚禮良价,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘蒿叠。我一直安慰自己明垢,他們只是感情好,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,500評(píng)論 6 392
  • 文/花漫 我一把揭開(kāi)白布市咽。 她就那樣靜靜地躺著痊银,像睡著了一般。 火紅的嫁衣襯著肌膚如雪魂务。 梳的紋絲不亂的頭發(fā)上曼验,一...
    開(kāi)封第一講書(shū)人閱讀 51,370評(píng)論 1 302
  • 那天,我揣著相機(jī)與錄音粘姜,去河邊找鬼鬓照。 笑死,一個(gè)胖子當(dāng)著我的面吹牛孤紧,可吹牛的內(nèi)容都是我干的豺裆。 我是一名探鬼主播,決...
    沈念sama閱讀 40,193評(píng)論 3 418
  • 文/蒼蘭香墨 我猛地睜開(kāi)眼号显,長(zhǎng)吁一口氣:“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼臭猜!你這毒婦竟也來(lái)了?” 一聲冷哼從身側(cè)響起押蚤,我...
    開(kāi)封第一講書(shū)人閱讀 39,074評(píng)論 0 276
  • 序言:老撾萬(wàn)榮一對(duì)情侶失蹤蔑歌,失蹤者是張志新(化名)和其女友劉穎,沒(méi)想到半個(gè)月后揽碘,有當(dāng)?shù)厝嗽跇?shù)林里發(fā)現(xiàn)了一具尸體次屠,經(jīng)...
    沈念sama閱讀 45,505評(píng)論 1 314
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡园匹,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,722評(píng)論 3 335
  • 正文 我和宋清朗相戀三年,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了劫灶。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片裸违。...
    茶點(diǎn)故事閱讀 39,841評(píng)論 1 348
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡,死狀恐怖本昏,靈堂內(nèi)的尸體忽然破棺而出供汛,到底是詐尸還是另有隱情,我是刑警寧澤涌穆,帶...
    沈念sama閱讀 35,569評(píng)論 5 345
  • 正文 年R本政府宣布怔昨,位于F島的核電站,受9級(jí)特大地震影響宿稀,放射性物質(zhì)發(fā)生泄漏朱监。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,168評(píng)論 3 328
  • 文/蒙蒙 一原叮、第九天 我趴在偏房一處隱蔽的房頂上張望赫编。 院中可真熱鬧,春花似錦奋隶、人聲如沸擂送。這莊子的主人今日做“春日...
    開(kāi)封第一講書(shū)人閱讀 31,783評(píng)論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽(yáng)嘹吨。三九已至,卻和暖如春境氢,著一層夾襖步出監(jiān)牢的瞬間蟀拷,已是汗流浹背。 一陣腳步聲響...
    開(kāi)封第一講書(shū)人閱讀 32,918評(píng)論 1 269
  • 我被黑心中介騙來(lái)泰國(guó)打工萍聊, 沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留问芬,地道東北人。 一個(gè)月前我還...
    沈念sama閱讀 47,962評(píng)論 2 370
  • 正文 我出身青樓寿桨,卻偏偏與公主長(zhǎng)得像此衅,于是被迫代替她去往敵國(guó)和親。 傳聞我的和親對(duì)象是個(gè)殘疾皇子亭螟,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 44,781評(píng)論 2 354

推薦閱讀更多精彩內(nèi)容

  • 今天忙到這個(gè)時(shí)候才有時(shí)間靜靜的寫(xiě)一下東西挡鞍。 其實(shí)今天本來(lái)都想要不想寫(xiě)了的,覺(jué)得都這么晚了预烙,干脆就明天再寫(xiě)明天交了吧...
    一個(gè)堅(jiān)持的小寫(xiě)渣閱讀 546評(píng)論 0 0
  • 梅花展顏蕊自寒墨微, 三更留夢(mèng)夜珊闌。 莫嘆人生能幾何扁掸, 今生結(jié)得來(lái)生緣翘县!
    紫竹清夢(mèng)閱讀 416評(píng)論 0 3
  • 兒子今早去托輔練字了衰琐,這個(gè)禮拜老師布置的作業(yè),兒子沒(méi)有記在小本子上炼蹦。不知道有什么作業(yè),自己也不著急狸剃。我不...
    翔兒媽媽閱讀 95評(píng)論 0 0
  • 民國(guó)名媛陸小曼掐隐,一出生就是含著金鑰匙,頂級(jí)高配版地出生方式讓她在人生的前半生的確贏得很多鮮花和掌聲钞馁。再加上自身才氣頗高虑省,
    love漢娜閱讀 120評(píng)論 0 0