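Abstract:
This paper proposes FAST, an automatic stencil optimization framework built on an OSS (Optimal-Solution Space) model and machine-learning prediction. A feature extractor collects features along several dimensions (architecture, algorithm, input) to construct the OSS, and a model is trained off-line to provide on-line predictions. Compared with state-of-the-art auto-tuning systems such as SDSL and PATUS, FAST reaches comparable optimized performance while tuning far faster.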
Motivation:
The paper builds on one key observation: two stencil computations share the same (near-)optimal solutions if they have high similarity in computing features.
OSS:
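OSS can be understood as a strategy of selecting the top-K best solutions rather than obtaining a single optimal solution.
The goal is to derive the OSS from a feature vector.
The concrete mapping, however, is not f -> OSS but a mapping from x (feature difference) to OR (Overlapping Ratio).
See the original paper for the definition of y; no further formulas are listed here.
One issue worth noting is the choice of OSS size. If the OSS is too small, accuracy is likely to suffer; if it is too large, the overhead goes up. From the relationship between OSS size, OR, and the Performance Lower Bound, the authors draw the following conclusions: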
a small OSS covers most of the solutions with the highest performance.
larger OSSs have higher OR: they share more optimal (near-optimal) solutions with each other.
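The OSS and OR notions above can be sketched in a few lines. This is only an illustration under stated assumptions: the solution ids and performance numbers are made up, `top_k_oss` and `overlapping_ratio` are hypothetical helper names, and OR is assumed here to be the number of shared solutions divided by K (the paper's exact definition may differ).

```python
def top_k_oss(perf, k):
    """Return the ids of the k best-performing solutions (a toy OSS).

    perf maps a solution id to its measured performance (higher is better).
    """
    ranked = sorted(perf, key=perf.get, reverse=True)
    return set(ranked[:k])


def overlapping_ratio(oss_a, oss_b):
    """Assumed OR: shared solutions divided by the (common) OSS size."""
    if not oss_a or len(oss_a) != len(oss_b):
        raise ValueError("both OSSs must be non-empty and of equal size")
    return len(oss_a & oss_b) / len(oss_a)


# Made-up performance numbers for two stencils over the same five solutions.
perf_a = {"s0": 9.1, "s1": 8.7, "s2": 8.5, "s3": 4.0, "s4": 3.2}
perf_b = {"s0": 7.9, "s1": 8.8, "s2": 5.0, "s3": 8.1, "s4": 2.5}

# OR for every OSS size K from 1 to 5.
or_by_k = {
    k: overlapping_ratio(top_k_oss(perf_a, k), top_k_oss(perf_b, k))
    for k in range(1, 6)
}
```

In this toy data the two stencils disagree on the single best solution, so OR is 0 at K = 1, while at K = 3 they already share two of three solutions. That matches the trade-off described above: a larger OSS overlaps more, at the cost of more candidates to evaluate.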
Code generation
eDSL codes--->high level language(native code)--->auto-tuned code (blocking, OpenMP, unrolling, SIMDization, compiler flags, etc.)
Evaluation
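The tuning knobs listed in the last stage span a solution space that grows multiplicatively. A minimal sketch of enumerating such a space follows; the concrete parameter names and values are assumptions chosen for illustration, since the notes only name the kinds of optimization.

```python
from itertools import product

# Hypothetical tuning space: only the knob categories come from the notes,
# the concrete values are invented for this example.
space = {
    "block": [(8, 8, 64), (16, 16, 128)],   # cache-blocking tile sizes
    "omp_threads": [1, 4, 8],               # OpenMP thread counts
    "unroll": [1, 2, 4],                    # loop unrolling factors
    "simd": [False, True],                  # SIMDization on/off
    "cflags": ["-O2", "-O3"],               # compiler flags
}


def enumerate_solutions(space):
    """Return every combination of knob values as a dict (one candidate each)."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


solutions = enumerate_solutions(space)
```

Even this tiny space yields 2 * 3 * 3 * 2 * 2 = 72 candidates, each of which would have to be compiled and timed by a search-based tuner.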
Dataset
FDTD      3D 5-point stencil with order-1     computational electrodynamics
HEAT      3D 7-point stencil with order-1     chemical diffusion
WAVE      3D 25-point stencil with order-4    fluid dynamics
POISSON   3D 19-point stencil with order-1    mechanical engineering
HIMENO    3D 19-point stencil with order-1    UNKNOWN
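For readers unfamiliar with the terminology in this table, here is a minimal pure-Python sketch of one sweep of a 3D 7-point, order-1 stencil of the kind HEAT refers to. The function name `heat_step`, the coefficient `alpha`, the cubic grid, and the fixed boundary are assumptions for illustration; this is the kind of straightforward loop nest a baseline would use, not the benchmark code from the paper.

```python
def heat_step(u, alpha=0.1):
    """One sweep of a 3D 7-point stencil on a cubic grid (nested lists).

    Each interior cell is updated from itself and its six face neighbours;
    boundary cells are copied unchanged.
    """
    n = len(u)
    new = [[row[:] for row in plane] for plane in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                neighbours = (
                    u[i - 1][j][k] + u[i + 1][j][k]
                    + u[i][j - 1][k] + u[i][j + 1][k]
                    + u[i][j][k - 1] + u[i][j][k + 1]
                )
                new[i][j][k] = u[i][j][k] + alpha * (neighbours - 6.0 * u[i][j][k])
    return new


# A 5x5x5 grid with a single hot cell in the centre.
grid = [[[0.0] * 5 for _ in range(5)] for _ in range(5)]
grid[2][2][2] = 1.0
out = heat_step(grid)
```

"7-point" counts the centre cell plus six neighbours, and "order-1" means the neighbours are one cell away; a 25-point order-4 stencil such as WAVE reaches four cells along each axis.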
Comparison
Baseline: straightforward implementation
SDSL: the stencil domain-specific language (SDSL)
Patus: the Patus stencil optimization framework
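Terminology
Auto-tuning strategies:
>search-based: the search space is large, so researchers rely on techniques such as pruning and heuristic search
>prediction-based: low overhead, but hard to build (sensitive to input; near-optimal and optimal solutions are hard to tell apart)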
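To make the prediction-based idea concrete, the sketch below reuses the OSS of the most similar previously tuned stencil. Note that this nearest-neighbour lookup is a deliberately simplified stand-in, not the paper's model (which, per the notes, learns a mapping from feature difference to OR). The feature layout (number of points, order, grid size) and all data are hypothetical.

```python
import math


def feature_difference(f_a, f_b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_a, f_b)))


def predict_oss(query, database):
    """Return the OSS stored for the entry whose features are closest to query.

    database is a list of (feature_vector, oss) pairs gathered off-line.
    """
    nearest = min(database, key=lambda entry: feature_difference(query, entry[0]))
    return nearest[1]


# Hypothetical off-line results: (points, order, grid size) -> tuned OSS.
database = [
    ((7, 1, 256), {"s0", "s1"}),
    ((25, 4, 256), {"s3", "s4"}),
]

# A new 9-point, order-1 stencil is closest to the 7-point entry.
predicted = predict_oss((9, 1, 256), database)
```

The on-line cost is a handful of distance computations, in contrast to a search-based tuner that must compile and run many candidates.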
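DSL (Domain-Specific Language)
For problems in a particular domain, a dedicated DSL is built to describe them.
A source-to-source transformation converts the DSL into a high-level language (C, CUDA, etc.); the high-level code is then optimized and native code is generated.
Ideally, domain experts can use the DSL to design algorithms with little effort and without knowing much about programming languages. A DSL does, however, require a corresponding compiler to perform the code transformation, built for example with ROSE or LLVM/Clang.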
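The source-to-source step can be illustrated with a toy generator that turns a declarative stencil description into a C update statement. This is a sketch only: the description format, the function name `emit_c_update`, and the array names `in`/`out` are invented here and do not reflect the actual syntax of SDSL, Patus, or the paper's eDSL.

```python
def emit_c_update(terms):
    """Emit one C assignment for a 3D stencil.

    terms is a list of (coefficient, (di, dj, dk)) pairs, where the offset
    is relative to the cell being updated.
    """
    def index(offset):
        # Build "[i+1][j][k]"-style subscripts; zero offsets print bare.
        return "".join(
            f"[{var}{d:+d}]" if d else f"[{var}]"
            for var, d in zip("ijk", offset)
        )

    rhs = " + ".join(f"{coef} * in{index(off)}" for coef, off in terms)
    return f"out[i][j][k] = {rhs};"


# A 1D second difference along the i axis, written in the toy description format.
code = emit_c_update([
    (-2.0, (0, 0, 0)),
    (1.0, (1, 0, 0)),
    (1.0, (-1, 0, 0)),
])
```

A real DSL compiler would additionally wrap this statement in loop nests and apply the optimizations from the code-generation stage; the point here is only that the domain expert writes the description, not the C code.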
polyhedral compiler optimization
See the SDSL paper.
Related papers
Multi-platform auto-tuning
S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-Marl: a DSL for easy and efficient graph analysis. 2012
M. Christen, O. Schenk, and H. Burkhart. Patus: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. 2011
T. Lutz, C. Fensch, and M. Cole. Partans: An autotuning framework for stencil computation on multi-gpu systems. 2013
M. Khan, P. Basu, G. Rudy, M. Hall, C. Chen, and J. Chame. A script-based autotuning compiler system to generate high-performance cuda code.