K8s & HPC
Requirement
Hello, Kubernetes newbie here. I'm currently running Docker on an in-house HPC cluster (on each node) and submitting jobs to the nodes using a queuing/scheduling system (Slurm). I'd like to replicate similar capabilities in the cloud. E.g., the workflow would be (I might be wrong): give a start signal from the local HPC, upload the Docker container and scripts, create a Kubernetes cluster (probably around 10 nodes with 16 cores each), submit jobs, let them run, bring the data back locally, and shut down the cloud cluster.
Does that sound right? Where should I start? Thanks.
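One possible starting point for the "submit jobs" step is a Kubernetes Job in Indexed completion mode, which is roughly analogous to a Slurm array job. The sketch below only builds the manifest as JSON; it does not talk to a cluster. The job name, image name, `run.sh` script, task count and CPU request are hypothetical placeholders, not details from the question.

```python
import json


def make_indexed_job(name, image, command, tasks, parallelism, cpus_per_task):
    """Build a batch/v1 Job manifest that runs `tasks` indexed pods,
    at most `parallelism` at a time (similar in spirit to a Slurm array job)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "completionMode": "Indexed",   # each pod gets JOB_COMPLETION_INDEX
            "completions": tasks,          # total number of tasks to finish
            "parallelism": parallelism,    # pods allowed to run concurrently
            "backoffLimit": 0,             # do not retry failed pods
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "worker",
                            "image": image,
                            "command": command,
                            "resources": {
                                "requests": {"cpu": str(cpus_per_task)}
                            },
                        }
                    ],
                }
            },
        },
    }


# Hypothetical values: 100 tasks, 10 at a time (one per node), and a CPU
# request below 16 cores to leave headroom for system pods on each node.
manifest = make_indexed_job(
    name="hpc-batch",
    image="registry.example.com/sim:latest",
    command=["/bin/sh", "-c", "./run.sh $JOB_COMPLETION_INDEX"],
    tasks=100,
    parallelism=10,
    cpus_per_task=14,
)
manifest_json = json.dumps(manifest, indent=2)
```

Saving `manifest_json` to a file such as `job.json` and running `kubectl apply -f job.json` would submit it to an existing cluster. Creating the cluster, copying results back and tearing the cluster down are separate steps that depend on the cloud provider.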
http://www.d1net.com/storage/enterprise/442273.html
SGE: Sun Grid Engine
HPC on AWS
Use StarCluster to build the cluster; the queue system it uses is SGE.
http://star.mit.edu/cluster/
https://github.com/spagnuolocarmine/amazonhpc
https://aws.amazon.com/solutions/case-studies/san-francisco-state-university/
HPC on Azure
https://github.com/Azure/azure-bigcompute
Applications on HPC are usually parallel-computing applications developed with MPI.
Parallel computing
http://www.konvigne.com/index.php/2017/03/21/200/
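Differences between parallel computing and distributed computing:
Distributed computing is the discipline of partitioning data that needs a large amount of computation into small chunks, having multiple networked computers process the chunks separately, and, once the partial results have been uploaded, merging them into a single conclusion.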
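The partition, process and merge steps described above can be sketched in a few lines. This is a minimal illustration, not a real distributed system: a local thread pool stands in for the networked computers, and the sum-of-squares workload and all function names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_chunks):
    """Split data into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Work done independently by one worker: a partial sum of squares."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data, n_workers=4):
    """Partition the data, process each chunk separately, merge the results."""
    chunks = partition(data, n_workers)
    # Assumption: threads stand in for separate networked machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)  # merge step


data = list(range(1000))
total = distributed_sum_of_squares(data)
```

Each chunk is processed without reference to any other chunk, which is what lets this style of computation tolerate slow, loosely coupled links between workers.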
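First, compare the similarities and differences between distributed computing and parallel computing. What they share is that both break a complex task into multiple subtasks and run those subtasks on multiple computers at the same time. The difference is that distributed computing is a relatively loose structure with low real-time requirements; it can be deployed across local networks over the Internet, and many public-interest projects (such as black hole exploration, drug research, and protein structure analysis) mostly take this approach. Parallel computing, by contrast, requires the nodes to communicate fairly frequently over a high-speed network; the nodes are strongly coupled, and it is deployed mainly within a local area network.
In distributed computing algorithms, we pay more attention to the communication between computers than to the steps of the algorithm, because the communication cost weighs far more heavily on overall performance than any single node does.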
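To make the weight of communication cost concrete, here is a toy cost model. It is an illustrative assumption and not something taken from the linked article: compute time is taken to shrink as `work / n_nodes`, while communication time is assumed to grow linearly with the number of nodes. The function names and the numbers are hypothetical.

```python
def run_time(work, n_nodes, comm_cost_per_node):
    """Toy model: compute time shrinks with more nodes, while the assumed
    communication overhead grows linearly with the node count."""
    return work / n_nodes + comm_cost_per_node * n_nodes


def best_node_count(work, comm_cost_per_node, max_nodes):
    """Return the node count in 1..max_nodes with the lowest modeled run time."""
    return min(
        range(1, max_nodes + 1),
        key=lambda n: run_time(work, n, comm_cost_per_node),
    )


# Hypothetical numbers: 1000 units of work, 10 units of overhead per node.
t_1 = run_time(1000, 1, 10)          # single node
t_10 = run_time(1000, 10, 10)        # ten nodes
t_100 = run_time(1000, 100, 10)      # one hundred nodes
best = best_node_count(1000, 10, 100)
```

Under these assumptions, adding nodes helps only until the communication term catches up with the compute term. Past that point the modeled run time gets worse again, so reducing communication matters more than speeding up any single node.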