Apple Metal 2 5. Hello, Compute

Original: https://developer.apple.com/documentation/metal/fundamental_lessons/hello_compute

Hello, Parallel Computing


Overview

In the Basic Texturing sample, you learned how to render a 2D image by applying a texture to a single quad.
In this sample, you’ll learn how to execute compute-processing workloads in Metal for image processing. In particular, you’ll learn how to work with the compute processing pipeline and write kernel functions.

GPGPU Fundamentals

Graphics processing units (GPUs) were originally designed to process large amounts of graphics data, such as vertices or fragments, in a very fast and efficient manner. This design is evident in the GPU hardware architecture itself, which has many processing cores that execute workloads in parallel.
Throughout the history of GPU design, the parallel-processing architecture has remained fairly consistent, but the processing cores have become increasingly programmable. This change enabled GPUs to move away from a fixed-function pipeline toward a programmable pipeline, a change that also enabled general-purpose GPU (GPGPU) programming.
In the GPGPU model, the GPU can be used for any kind of processing task and isn’t limited to graphics data. For example, GPUs can be used for cryptography, machine learning, physics, or finance. In Metal, GPGPU workloads are known as compute-processing workloads, or compute.
Graphics and compute workloads are not mutually exclusive; Metal provides a unified framework and language that enables seamless integration of graphics and compute workloads. In fact, this sample demonstrates this integration by:
• Using a compute pipeline that converts a color image to a grayscale image
• Using a graphics pipeline that renders the grayscale image to a quad surface

Creating a Compute Processing Pipeline

The compute processing pipeline is made up of only one stage, a programmable kernel function, that executes a compute pass. The kernel function reads from and writes to resources directly, without passing resource data through various pipeline stages.
A MTLComputePipelineState object represents a compute processing pipeline. Unlike a render pipeline state, which is built from a pipeline descriptor, a MTLComputePipelineState object can be created directly from a single kernel function, without a pipeline descriptor.

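// Load the kernel function from the app's default library
id<MTLLibrary> defaultLibrary = [_device newDefaultLibrary];
id<MTLFunction> kernelFunction = [defaultLibrary newFunctionWithName:@"grayscaleKernel"];

// Create a compute pipeline state from the kernel function
NSError *error = nil;
_computePipelineState = [_device newComputePipelineStateWithFunction:kernelFunction
                                                               error:&error];
if (!_computePipelineState)
{
    NSLog(@"Failed to create compute pipeline state: %@", error);
}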

Writing a Kernel Function

This sample loads image data into a texture and then uses a kernel function to convert the texture’s pixels from color to grayscale. The kernel function processes the pixels independently and concurrently.

Note:
An equivalent algorithm can be written for and executed by the CPU. However, a GPU solution is typically faster, because the GPU can process many of the texture's pixels in parallel instead of one after another.
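For comparison, the CPU version mentioned in the note might look like the minimal C sketch below. It assumes a tightly packed RGBA8 buffer (4 bytes per pixel, no row padding); the function name `grayscale_cpu` and this buffer layout are illustrative assumptions, not part of the sample. The nested loop visits every pixel one after another, which is exactly the sequential work the GPU kernel spreads across threads.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard Rec. 709 luma weights for R, G and B. */
#define LUMA_R 0.2126f
#define LUMA_G 0.7152f
#define LUMA_B 0.0722f

/* Hypothetical CPU equivalent of grayscaleKernel.
   src and dst each hold width * height RGBA8 pixels, tightly packed, row-major. */
void grayscale_cpu(const uint8_t *src, uint8_t *dst, size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y) {          /* one row after another ... */
        for (size_t x = 0; x < width; ++x) {       /* ... one pixel after another */
            const uint8_t *in = src + (y * width + x) * 4;
            uint8_t *out = dst + (y * width + x) * 4;
            float gray = LUMA_R * in[0] + LUMA_G * in[1] + LUMA_B * in[2];
            uint8_t g = (uint8_t)(gray + 0.5f);    /* round to the nearest 8-bit value */
            out[0] = g;
            out[1] = g;
            out[2] = g;
            out[3] = 255;                          /* opaque, like the kernel's alpha of 1.0 */
        }
    }
}
```

Every iteration is independent of the others, so nothing forces this order; that independence is what the GPU version exploits.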

The kernel function in this sample is called grayscaleKernel and its signature is shown below:

kernel void
grayscaleKernel(texture2d<half, access::read>  inTexture  [[texture(AAPLTextureIndexInput)]],
                texture2d<half, access::write> outTexture [[texture(AAPLTextureIndexOutput)]],
                uint2                          gid         [[thread_position_in_grid]])

The function takes the following resource parameters:
• inTexture: A read-only, 2D texture that contains the input color pixels.
• outTexture: A write-only, 2D texture that stores the output grayscale pixels.
Textures that specify a read access qualifier can be read from using the read() function. Textures that specify a write access qualifier can be written to using the write() function.
A kernel function executes once per thread, which is analogous to how a vertex function executes once per vertex. Threads are organized into a 3D grid; an encoded compute pass specifies how many threads to process by declaring the size of the grid. Because this sample processes a 2D texture, the threads are arranged in a 2D grid where each thread corresponds to a unique texel.
The kernel function’s gid parameter uses the [[thread_position_in_grid]] attribute qualifier, which locates a thread within the compute grid. Each execution of the kernel function receives a unique gid value, so each thread reads and writes its own texel.
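To make the "once per thread, unique gid" idea concrete, here is a sequential C model of a 2D compute grid. It is only a mental model under simplifying assumptions: a real GPU runs these invocations concurrently and in no guaranteed order, and the names `grid_pos`, `kernel_fn` and `dispatch_grid_2d` are hypothetical, not Metal API. The example kernel just counts how often each texel of a small image is visited.

```c
/* Hypothetical stand-in for MSL's uint2 thread position. */
typedef struct { unsigned x, y; } grid_pos;

/* A "kernel" receives its grid position plus a pointer to shared resources. */
typedef void (*kernel_fn)(grid_pos gid, void *resources);

/* Sequential model of a 2D grid: the kernel runs once for every position.
   A GPU would run these invocations in parallel, in no particular order. */
void dispatch_grid_2d(unsigned grid_width, unsigned grid_height,
                      kernel_fn kernel, void *resources)
{
    for (unsigned y = 0; y < grid_height; ++y) {
        for (unsigned x = 0; x < grid_width; ++x) {
            grid_pos gid = { x, y };
            kernel(gid, resources);
        }
    }
}

/* Example resources: a visit counter per texel of a small image. */
enum { IMAGE_W = 4, IMAGE_H = 3 };
typedef struct { int visits[IMAGE_H][IMAGE_W]; } visit_map;

/* Example kernel: each invocation touches only the texel at its own gid. */
void count_visit(grid_pos gid, void *resources)
{
    visit_map *map = resources;
    map->visits[gid.y][gid.x] += 1;
}
```

Because no two invocations share a gid, each one can write its own texel without coordinating with the others, which is what lets the GPU run them concurrently.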
A grayscale pixel has the same value for each of its RGB components. This value can be calculated by simply averaging the RGB components of a color pixel, or by applying certain weights to each component. This sample uses the Rec. 709 luma coefficients (0.2126, 0.7152 and 0.0722 for R, G and B), held in the kRec709Luma constant, for the color-to-grayscale conversion.


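// Threads beyond the texture's edges (the grid is rounded up to whole threadgroups) do nothing
if ((gid.x >= outTexture.get_width()) || (gid.y >= outTexture.get_height()))
    return;
half4 inColor  = inTexture.read(gid);
half  gray     = dot(inColor.rgb, kRec709Luma);
outTexture.write(half4(gray, gray, gray, 1.0), gid);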
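The difference between averaging and the Rec. 709 weights is easy to see on saturated colors. The C sketch below (function names are illustrative, components assumed normalized to [0, 1]) computes both. With a plain average, pure red, green and blue all map to the same gray; the luma weights make green much brighter than blue, roughly matching how bright these colors appear to the eye. The `dot(inColor.rgb, kRec709Luma)` call in the kernel computes the same weighted sum as `gray_rec709`.

```c
/* Gray value as the plain average of the three components. */
float gray_average(float r, float g, float b)
{
    return (r + g + b) / 3.0f;
}

/* Gray value as a weighted sum using the Rec. 709 luma coefficients.
   The weights sum to 1, so white (1, 1, 1) still maps to 1. */
float gray_rec709(float r, float g, float b)
{
    return 0.2126f * r + 0.7152f * g + 0.0722f * b;
}
```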

Executing the Compute Pipeline

A MTLComputeCommandEncoder object contains the commands for executing a compute pass, including references to the kernel function and its resources. Unlike a render command encoder, which is created from a render pass descriptor, a MTLComputeCommandEncoder can be created without a pass descriptor.

id <MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];

[computeEncoder setComputePipelineState:_computePipelineState];

[computeEncoder setTexture:_inputTexture
                   atIndex:AAPLTextureIndexInput];

[computeEncoder setTexture:_outputTexture
                   atIndex:AAPLTextureIndexOutput];

A compute pass must specify the number of times to execute a kernel function. This number corresponds to the grid size, which is defined in terms of threads and threadgroups. A threadgroup is a 3D group of threads that are executed concurrently by a kernel function. In this sample, each thread corresponds to a unique texel, so the grid size must be at least the size of the 2D image. For simplicity, this sample uses a 16 x 16 threadgroup size (256 threads), which is small enough to be used by any GPU. In practice, however, selecting an efficient threadgroup size depends on both the size of the data and the capabilities of a specific device.


// Set the compute kernel's threadgroup size to 16 x 16
_threadgroupSize = MTLSizeMake(16, 16, 1);

// Calculate the number of rows and columns of threadgroups given the width and height of the input image.
//   Round up so the threadgroups cover the entire image (or more) and every pixel is processed.
_threadgroupCount.width  = (_inputTexture.width  + _threadgroupSize.width -  1) / _threadgroupSize.width;
_threadgroupCount.height = (_inputTexture.height + _threadgroupSize.height - 1) / _threadgroupSize.height;

// The image data is 2D, so the grid is one threadgroup deep
_threadgroupCount.depth = 1;
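The `(n + size - 1) / size` expression is integer ceiling division: it rounds the threadgroup count up so that a partial threadgroup still covers the last few columns or rows. The C sketch below uses hypothetical helper names to show the arithmetic and how many threads the rounded-up grid places outside the image. Those extra threads are why a kernel dispatched this way is commonly written to return early when gid is out of bounds.

```c
/* Integer ceiling division: how many groups of `size` items are needed to cover n items. */
unsigned ceil_div(unsigned n, unsigned size)
{
    return (n + size - 1) / size;
}

/* Number of grid threads that fall outside a width x height image when the grid
   is made of whole tg_w x tg_h threadgroups (hypothetical helper for illustration). */
unsigned idle_threads(unsigned width, unsigned height, unsigned tg_w, unsigned tg_h)
{
    unsigned grid_w = ceil_div(width, tg_w) * tg_w;
    unsigned grid_h = ceil_div(height, tg_h) * tg_h;
    return grid_w * grid_h - width * height;
}
```

For example, a 1000 x 500 image with 16 x 16 threadgroups needs 63 x 32 threadgroups, which form a 1008 x 512 grid; `idle_threads(1000, 500, 16, 16)` returns 16096, the number of threads with nothing to do. An image whose sides are exact multiples of 16 produces no idle threads.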

The sample finalizes the compute pass by issuing a dispatch call and ending the encoding of compute commands.

[computeEncoder dispatchThreadgroups:_threadgroupCount
               threadsPerThreadgroup:_threadgroupSize];

[computeEncoder endEncoding];

The sample then continues to encode the rendering commands first introduced in the Basic Texturing sample. The commands for the compute pass and the render pass use the same grayscale texture, are encoded into the same command buffer, and are submitted to the GPU at the same time. However, the grayscale conversion in the compute pass is always executed before the quad rendering in the render pass: the compute pass is encoded first, and by default Metal tracks the shared texture so the render pass does not read it until the compute pass has finished writing it.
