性能分析和調(diào)優(yōu)工具簡(jiǎn)介<a id="sec-1" name="sec-1"></a>
總會(huì)遇到一個(gè)時(shí)候你會(huì)想提高程序執(zhí)行效率,想看看哪部分耗時(shí)長(zhǎng)成為瓶頸疼燥,想知道程序運(yùn)行時(shí)內(nèi)存和CPU使用情況温艇。這時(shí)候你會(huì)需要一些方法對(duì)程序進(jìn)行性能分析和調(diào)優(yōu)鹅很。
Context Manager<a id="sec-1-1" name="sec-1-1"></a>
可以上下文管理器自己實(shí)現(xiàn)一個(gè)計(jì)時(shí)器, 參見(jiàn)之前的介紹 timeit
文章里做的那樣,通過(guò)定義類的 __enter__
和 __exit__
方法來(lái)實(shí)現(xiàn)對(duì)管理的函數(shù)計(jì)時(shí), 類似如:
# timer.py
import time
class Timer(object):
def __init__(self, verbose=False):
self.verbose = verbose
def __enter__(self):
self.start = time.time()
return self
def __exit__(self, *args):
self.end = time.time()
self.secs = self.end - self.start
self.msecs = self.secs * 1000 # 毫秒
if self.verbose:
print 'elapsed time: %f ms' % self.msecs
使用方式如下:
from timer import Timer
with Timer() as t:
foo()
print "=> foo() spends %s s" % t.secs
Decorator<a id="sec-1-2" name="sec-1-2"></a>
然而我認(rèn)為裝飾器的方式更加優(yōu)雅
import time
from functools import wraps
def timer(function):
@wraps(function)
def function_timer(*args, **kwargs):
t0 = time.time()
result = function(*args, **kwargs)
t1 = time.time()
print ("Total time running %s: %s seconds" %
(function.func_name, str(t1-t0))
)
return result
return function_timer
使用就很簡(jiǎn)單了:
@timer
def my_sum(n):
return sum([i for i in range(n)])
if __name__ == "__main__":
my_sum(10000000)
運(yùn)行結(jié)果:
? python profile.py
Total time running my_sum: 0.817697048187 seconds
系統(tǒng)自帶的time命令<a id="sec-1-3" name="sec-1-3"></a>
使用示例如下:
? time python profile.py
Total time running my_sum: 0.854454040527 seconds
python profile.py 0.79s user 0.18s system 98% cpu 0.977 total
上面的結(jié)果說(shuō)明: 執(zhí)行腳本消耗0.79sCPU時(shí)間厚者, 0.18秒執(zhí)行內(nèi)核函數(shù)消耗的時(shí)間躁劣,總共0.977s時(shí)間。
其中库菲, total時(shí)間 - (user時(shí)間 + system時(shí)間) = 消耗在輸入輸出和系統(tǒng)執(zhí)行其它任務(wù)消耗的時(shí)間
python timeit 模塊<a id="sec-1-4" name="sec-1-4"></a>
可以用來(lái)做benchmark, 可以方便的重復(fù)一個(gè)程序執(zhí)行的次數(shù)习绢,來(lái)查看程序可以運(yùn)行多塊。具體參考之前寫(xiě)的文章蝙昙。
cProfile<a id="sec-1-5" name="sec-1-5"></a>
直接看帶注釋的使用示例吧闪萄。
#coding=utf8
def sum_num(max_num):
total = 0
for i in range(max_num):
total += i
return total
def test():
total = 0
for i in range(40000):
total += i
t1 = sum_num(100000)
t2 = sum_num(200000)
t3 = sum_num(300000)
t4 = sum_num(400000)
t5 = sum_num(500000)
test2()
return total
def test2():
total = 0
for i in range(40000):
total += i
t6 = sum_num(600000)
t7 = sum_num(700000)
return total
if __name__ == "__main__":
import cProfile
# # 直接把分析結(jié)果打印到控制臺(tái)
# cProfile.run("test()")
# # 把分析結(jié)果保存到文件中
# cProfile.run("test()", filename="result.out")
# 增加排序方式
cProfile.run("test()", filename="result.out", sort="cumulative")
cProfile將分析的結(jié)果保存到result.out文件中,但是以二進(jìn)制形式存儲(chǔ)的奇颠,想直接查看的話用提供的 pstats
來(lái)查看败去。
import pstats
# 創(chuàng)建Stats對(duì)象
p = pstats.Stats("result.out")
# strip_dirs(): 去掉無(wú)關(guān)的路徑信息
# sort_stats(): 排序,支持的方式和上述的一致
# print_stats(): 打印分析結(jié)果烈拒,可以指定打印前幾行
# 和直接運(yùn)行cProfile.run("test()")的結(jié)果是一樣的
p.strip_dirs().sort_stats(-1).print_stats()
# 按照函數(shù)名排序圆裕,只打印前3行函數(shù)的信息, 參數(shù)還可為小數(shù),表示前百分之幾的函數(shù)信息
p.strip_dirs().sort_stats("name").print_stats(3)
# 按照運(yùn)行時(shí)間和函數(shù)名進(jìn)行排序
p.strip_dirs().sort_stats("cumulative", "name").print_stats(0.5)
# 如果想知道有哪些函數(shù)調(diào)用了sum_num
p.print_callers(0.5, "sum_num")
# 查看test()函數(shù)中調(diào)用了哪些函數(shù)
p.print_callees("test")
截取一個(gè)查看test()調(diào)用了哪些函數(shù)的輸出示例:
? python python profile.py
Random listing order was used
List reduced from 6 to 2 due to restriction <'test'>
Function called...
ncalls tottime cumtime
profile.py:24(test2) -> 2 0.061 0.077 profile.py:3(sum_num)
1 0.000 0.000 {range}
profile.py:10(test) -> 5 0.073 0.094 profile.py:3(sum_num)
1 0.002 0.079 profile.py:24(test2)
1 0.001 0.001 {range}
profile.Profile<a id="sec-1-5-1" name="sec-1-5-1"></a>
cProfile還提供了可以自定義的類,可以更精細(xì)的分析, 具體看文檔荆几。
格式如: class profile.Profile(timer=None, timeunit=0.0, subcalls=True, builtins=True)
下面這個(gè)例子來(lái)自官方文檔:
import cProfile, pstats, StringIO
pr = cProfile.Profile()
pr.enable()
# ... do something ...
pr.disable()
s = StringIO.StringIO()
sortby = 'cumulative'
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print s.getvalue()
lineprofiler<a id="sec-1-6" name="sec-1-6"></a>
lineprofiler是一個(gè)對(duì)函數(shù)進(jìn)行逐行性能分析的工具吓妆,可以參見(jiàn)github項(xiàng)目說(shuō)明,地址: https://github.com/rkern/line_profiler
示例<a id="sec-1-6-1" name="sec-1-6-1"></a>
#coding=utf8
def sum_num(max_num):
total = 0
for i in range(max_num):
total += i
return total
@profile # 添加@profile 來(lái)標(biāo)注分析哪個(gè)函數(shù)
def test():
total = 0
for i in range(40000):
total += i
t1 = sum_num(10000000)
t2 = sum_num(200000)
t3 = sum_num(300000)
t4 = sum_num(400000)
t5 = sum_num(500000)
test2()
return total
def test2():
total = 0
for i in range(40000):
total += i
t6 = sum_num(600000)
t7 = sum_num(700000)
return total
test()
通過(guò) kernprof
命令來(lái)注入分析吨铸,運(yùn)行結(jié)果如下:
? kernprof -l -v profile.py
Wrote profile results to profile.py.lprof
Timer unit: 1e-06 s
Total time: 3.80125 s
File: profile.py
Function: test at line 10
Line # Hits Time Per Hit % Time Line Contents
==============================================================
10 @profile
11 def test():
12 1 5 5.0 0.0 total = 0
13 40001 19511 0.5 0.5 for i in range(40000):
14 40000 19066 0.5 0.5 total += i
15
16 1 2974373 2974373.0 78.2 t1 = sum_num(10000000)
17 1 58702 58702.0 1.5 t2 = sum_num(200000)
18 1 81170 81170.0 2.1 t3 = sum_num(300000)
19 1 114901 114901.0 3.0 t4 = sum_num(400000)
20 1 155261 155261.0 4.1 t5 = sum_num(500000)
21 1 378257 378257.0 10.0 test2()
22
23 1 2 2.0 0.0 return total
hits(執(zhí)行次數(shù)) 和 time(耗時(shí)) 值高的地方是有比較大優(yōu)化空間的地方行拢。
memoryprofiler<a id="sec-1-7" name="sec-1-7"></a>
類似于"lineprofiler"對(duì)基于行分析程序內(nèi)存使用情況的模塊。github 地址:https://github.com/fabianp/memory_profiler 诞吱。ps:安裝 psutil
, 會(huì)分析的更快舟奠。
同樣是上面"lineprofiler"中的代碼竭缝,運(yùn)行 python -m memory_profiler profile.py
命令生成結(jié)果如下:
? python -m memory_profiler profile.py
Filename: profile.py
Line # Mem usage Increment Line Contents
================================================
10 24.473 MiB 0.000 MiB @profile
11 def test():
12 24.473 MiB 0.000 MiB total = 0
13 25.719 MiB 1.246 MiB for i in range(40000):
14 25.719 MiB 0.000 MiB total += i
15
16 335.594 MiB 309.875 MiB t1 = sum_num(10000000)
17 337.121 MiB 1.527 MiB t2 = sum_num(200000)
18 339.410 MiB 2.289 MiB t3 = sum_num(300000)
19 342.465 MiB 3.055 MiB t4 = sum_num(400000)
20 346.281 MiB 3.816 MiB t5 = sum_num(500000)
21 356.203 MiB 9.922 MiB test2()
22
23 356.203 MiB 0.000 MiB return total
TODO objgraph<a id="sec-1-8" name="sec-1-8"></a>
參考資料:<a id="sec-2" name="sec-2"></a>
- https://docs.python.org/2/library/profile.html#module-cProfile
- http://xianglong.me/article/analysis-python-application-performance-using-cProfile/
- http://www.cnblogs.com/btchenguang/archive/2012/02/03/2337112.html
- https://www.huyng.com/posts/python-performance-analysis
- http://www.marinamele.com/7-tips-to-time-python-scripts-and-control-memory-and-cpu-usage