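I recently needed to write some nested for loops, and because the amount of computation was fairly large, they took quite a long time to run. So I searched Google for ways to speed up Python for loops, and found a very handy module: Numba.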
Numba makes Python code fast
Official website: http://numba.pydata.org/
If you don't have it installed yet, you can install it with pip install numba --user. Alternatively, if you already have Anaconda3 installed, the Python 3 that conda installs already includes this module.
Tip: managing modules and software with Anaconda avoids environment conflicts and saves time and effort. Here is a short guide to installing it on Linux:
# download from tsinghua mirror site
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-5.3.1-Linux-x86_64.sh
# check the help message
bash Anaconda3-5.3.1-Linux-x86_64.sh -h
# then install (or add -p PREFIX to install into a custom directory that does not exist yet)
bash Anaconda3-5.3.1-Linux-x86_64.sh
# add conda to the shell environment (adjust the path to your own Anaconda install directory)
echo ". /home/saber/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc
Using Numba is simple. It is typically used to speed up a specific function: if you want to accelerate a function x, just add the decorator @jit on the line directly above def when you define x (a single line of code).
Below is a small example I wrote. It computes the sum of all integers from a1 up to (but not including) a2, and uses the time module to measure each function's running time:
from numba import jit
import time

# define function A without numba
def func_A(a1, a2):
    A_result = 0
    for i in range(a1, a2):
        A_result += i
    return A_result

# define func A1 with numba
# just add the @jit
@jit
def func_A1(a1, a2):
    A1_result = 0
    for i in range(a1, a2):
        A1_result += i
    return A1_result

# record the elapsed time of a single call
def time_func(func_A_i, *args):
    start = time.time()
    func_A_i(*args)
    end = time.time()
    print("Elapsed time of func %s is %.4e" % (func_A_i.__name__, end - start))

# call each function twice
time_func(func_A, 1, 10000000)
time_func(func_A, 1, 10000000)
print()
time_func(func_A1, 1, 10000000)
time_func(func_A1, 1, 10000000)
Notice that the bodies of the two functions are identical; the only real difference is the @jit line added before func_A1.
The output is as follows:
Elapsed time of func func_A is 5.4757e-01
Elapsed time of func func_A is 5.3267e-01
Elapsed time of func func_A1 is 5.3686e-02
Elapsed time of func func_A1 is 4.7684e-06
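Attentive readers may have noticed that I ran each function twice. The two timings for func_A are almost identical, while the second run of func_A1 is about four orders of magnitude faster than the first. This is because only the second timing measures the execution of the Numba-compiled function; the first also includes the time spent compiling it.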
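Put simply: the first time a function is called, Numba translates it into faster machine code. This is the compilation step, and it takes some time. Numba then keeps the compiled code, and the next time the function is called with arguments of the same types, it reuses that compiled code directly to compute the result. The official explanation is as follows: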
First, recall that Numba has to compile your function for the argument types given before it executes the machine code version of your function, this takes time. However, once the compilation has taken place Numba caches the machine code version of your function for the particular types of arguments presented. If it is called again with the same types, it can reuse the cached version instead of having to compile again.
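Overall, Numba delivers a substantial speedup: in this example, the compiled func_A1 ran in about 4.8e-06 s, compared with about 0.53 s for the plain func_A. That makes it well worth trying for anyone who needs to speed up loop-heavy Python scripts.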
Feel free to follow the WeChat official account "生物信息學" (Bioinformatics).