## Introduction
[Chimera: Collaborative Preemption for Multitasking on a Shared GPU](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=ieArtTcAAAAJ&citation_for_view=ieArtTcAAAAJ:2osOgNQ5qMEC)

![Image description][1]

* Keywords:
* Graphics Processing Unit;
* Preemptive Multitasking;
* Context Switch;
* Idempotence

----------
## Problem Statement

> Preemptive multitasking on CPUs has been primarily supported through context switching. However, the same preemption strategy incurs substantial overhead due to the large context in GPUs.

Because a GPU's context is large, context switching alone is a poor fit as a preemption technique for GPUs.

> overhead comes in two dimensions: a preempting kernel suffers from a long preemption latency, and the system throughput is wasted during the switch

The overhead shows up in two ways: long preemption latency and wasted system throughput.

## Proposed Solution

> we propose Chimera, a collaborative preemption approach that can precisely control the overhead for multitasking on GPUs

The paper proposes Chimera, a collaborative preemption approach for multitasking GPUs that can precisely control the overheads described above.

> Chimera can achieve a specified preemption latency while minimizing throughput overhead

Chimera can meet a specified preemption latency while minimizing throughput overhead.

> Chimera achieves the goal by intelligently selecting which SMs to preempt and how each thread block will be preempted.

Chimera intelligently selects which SMs to preempt and how to preempt each thread block.

> Chimera first introduces streaming multiprocessor (SM) flushing, which can instantly preempt an SM by detecting and exploiting idempotent execution

Technique 1: flushing.

> Chimera utilizes flushing collaboratively with two previously proposed preemption techniques for GPUs, namely context switching and draining to minimize throughput overhead while achieving a required preemption latency.

Technique 2: context switching. Technique 3: draining.

#### Technique Explanations

1. Context switching

   > Context switching [17, 29] stores the context of currently running thread blocks, and preempts an SM with a new kernel.

   Context switching saves the context of the currently running thread blocks and then preempts the SM with a new kernel.
1. Draining

   > Draining [12, 29] stops issuing new thread blocks to the SM and waits until the SM finishes its currently running thread blocks.

   Draining stops issuing new thread blocks to the SM, waits for the currently running thread blocks to finish, and only then preempts the SM (the polite approach).
1. Flushing

   > Flushing drops the execution of running thread blocks and preempts the SM almost instantly.

   Flushing discards the running thread blocks and preempts the SM immediately (the brute-force approach).

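The trade-off between the three techniques can be made concrete with a toy cost model. The sketch below is not taken from the paper: the `ThreadBlock` fields, the cycle counts, and the assumption that context switching wastes one save plus one restore are all illustrative simplifications.

```python
from dataclasses import dataclass


@dataclass
class ThreadBlock:
    executed_cycles: int      # work already done on the SM
    remaining_cycles: int     # work left until the block finishes
    context_save_cycles: int  # assumed time to save this block's context


def preempt_cost(block, technique):
    """Return (latency, wasted_cycles) for one thread block under a toy model."""
    if technique == "context_switch":
        # Latency is the save; waste is the save plus the later restore.
        return block.context_save_cycles, 2 * block.context_save_cycles
    if technique == "drain":
        # Wait for the block to finish; no completed work is thrown away.
        return block.remaining_cycles, 0
    if technique == "flush":
        # Drop the block at once; all work done so far must be redone.
        return 0, block.executed_cycles
    raise ValueError(f"unknown technique: {technique}")


block = ThreadBlock(executed_cycles=300, remaining_cycles=700, context_save_cycles=400)
costs = {t: preempt_cost(block, t) for t in ("context_switch", "drain", "flush")}
for name, (latency, wasted) in costs.items():
    print(f"{name}: latency={latency}, wasted={wasted}")
```

In this model a block that has just started is cheap to flush, a block that is nearly finished is cheap to drain, and context switching has a cost that does not depend on progress.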
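## Contributions of the Paper

1. Analyzes the conditions under which a GPU can be flushed, relaxing the semantic definition of idempotence.
1. Analyzes the quantitative relationship between the preemption techniques (context switching, draining, and flushing) and the progress of thread execution.
1. Implements Chimera, which uses the cost of each preemption technique to intelligently select which SMs to preempt and how to preempt each thread block.
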
## Evaluation Results

> Evaluations show that Chimera violates the deadline for only 0.2% of preemption requests when a 15μs preemption latency constraint is used. For multi-programmed workloads, Chimera can improve the average normalized turnaround time by 5.5x, and system throughput by 12.2%

Chimera improves both average turnaround time and system throughput.

#### 3. Architecture
##### 3.1 GPU Scheduler with Preemptive Multitasking

An SM partitioning policy in the kernel scheduler tells how many SMs each kernel will run on.

Chimera consists of two parts: estimating the cost of preemption for each technique, and selecting SMs to preempt with the corresponding preemption techniques. Chimera can directly compare the estimated cost of each preemption technique.

##### 3.2 Cost Estimation
Chimera estimates the cost of each preemption technique precisely for each SM.

First, Chimera measures the total number of executed instructions for each thread block to determine the progress of each thread block. Second, Chimera also measures the progress of each thread block in cycles. Dividing one measurement by the other gives instructions-per-cycle (IPC) or cycles-per-instruction (CPI).

The preemption latency of context switching is estimated using the same method.

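As an illustration of how the two measurements could be combined, the sketch below turns an instruction count and a cycle count into a CPI and then into an estimate of the cycles a thread block still needs. The function name and the `expected_total_insts` parameter are assumptions made for this example; the notes above do not say how the expected instruction count of a block is obtained.

```python
def estimate_remaining_cycles(executed_insts, elapsed_cycles, expected_total_insts):
    """Estimate the cycles a thread block still needs, from its measured progress.

    expected_total_insts is an assumed input, for example an average taken
    over already finished thread blocks of the same kernel.
    """
    if executed_insts == 0:
        return None  # no progress yet, so CPI is undefined
    cpi = elapsed_cycles / executed_insts
    remaining_insts = max(expected_total_insts - executed_insts, 0)
    return remaining_insts * cpi


# A block that ran 2000 instructions in 5000 cycles (CPI 2.5)
# and is expected to run 8000 instructions in total.
print(estimate_remaining_cycles(2000, 5000, 8000))
```

The same remaining-cycles figure can serve as a draining latency estimate, while the cycles already spent approximate the work lost if the block is flushed.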
##### 3.3 Preemption Selection
This section describes how Chimera selects a subset of SMs, and the techniques with which to preempt them.

The time complexity of Algorithm 1 is O(NT log T + N log N). Thus, the impact of the selection algorithm on the preemption latency is negligible.

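The following is a minimal greedy sketch of a selection step in this spirit. It is not the paper's Algorithm 1: the function name, the input layout, and the rule "pick the lowest-overhead technique that meets the latency limit, then take the cheapest SMs" are assumptions for illustration.

```python
def select_preemptions(sms, num_to_preempt, latency_limit):
    """Pick the cheapest SMs to preempt under a latency limit.

    sms maps an SM id to a list of thread blocks; each thread block is a
    list of (technique, latency, overhead) options.
    Returns a list of (total_overhead, sm_id, chosen_techniques).
    """
    plans = []
    for sm_id, blocks in sms.items():
        total, chosen, feasible = 0, [], True
        for options in blocks:
            # Keep only techniques that meet the latency constraint.
            ok = [o for o in options if o[1] <= latency_limit]
            if not ok:
                feasible = False
                break
            technique, _, overhead = min(ok, key=lambda o: o[2])
            chosen.append(technique)
            total += overhead
        if feasible:
            plans.append((total, sm_id, chosen))
    plans.sort()  # cheapest SMs first
    return plans[:num_to_preempt]


sms = {
    0: [
        [("switch", 10, 20), ("drain", 30, 0), ("flush", 0, 5)],
        [("switch", 10, 20), ("drain", 4, 0), ("flush", 0, 50)],
    ],
    1: [
        [("switch", 10, 20), ("drain", 100, 0), ("flush", 0, 90)],
    ],
}
print(select_preemptions(sms, num_to_preempt=1, latency_limit=15))
```

Note that different thread blocks on the same SM can end up with different techniques, which is the collaborative aspect the notes describe.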
##### 3.4 SM Flushing
We relax the idempotence condition by looking at thread blocks individually with the notion of time.

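One way to read "with the notion of time" is that a thread block stays safe to flush up to the moment it executes its first non-idempotent operation, even if the kernel as a whole is not idempotent. The sketch below illustrates that reading only; the operation names in `NON_IDEMPOTENT_OPS` are hypothetical and the trace format is invented for the example.

```python
# Hypothetical operation names, used only for this illustration.
NON_IDEMPOTENT_OPS = {"atomic_add", "overwrite_global"}


def flushable_after(trace):
    """For each step of one thread block's trace, report whether the block
    could still be dropped and re-run from the start without changing results."""
    flushable = True
    history = []
    for op in trace:
        if op in NON_IDEMPOTENT_OPS:
            # From here on, re-execution would repeat a visible side effect.
            flushable = False
        history.append(flushable)
    return history


print(flushable_after(["load", "add", "overwrite_global", "store"]))
```

Tracking this per thread block, rather than per kernel, means a block that has not yet reached its side-effecting instruction remains a candidate for flushing.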
#### 4. Results

[1]: https://coding.net/api/project/178029/files/403854/imagePreview