4 Iterators and Generators
Understanding for xxx in xxx
__iter__ exists almost entirely to support for xx in xx.
- How for xxx in xxx works with iterators
- Call iter() to obtain an object: items = iter(object)
- Call items.next() on that object repeatedly; the loop ends when StopIteration is caught
######## Understanding for xxx in xxx
########### 1
lines = ['a', 'b', 'c']  # sample iterable, added for illustration
it = iter(lines)  # same as it = lines.__iter__()
while True:
    try:
        next(it)
    except StopIteration:
        print 'finish iteration'
        break
################ 2
class ABC(object):
    def __iter__(self):
        return ABC2()

class ABC2(object):
    num = 100
    def next(self):
        self.num -= 1
        if self.num < 0:
            raise StopIteration()
        return self.num

for i in ABC():
    print i
- How for xxx in xxx works with generators
The loop runs the code inside the generator and takes out the yielded values one at a time.
#### Understanding for xxx in xxx
def abc(x):
    print 123
    yield x

for i in abc('hello world'):
    print i
# abc('hello world') returns a generator object with built-in __iter__, next, send and throw methods
## Class-based implementation
class ABC(object):
    def __iter__(self):
        return self.abc('hello world')
    def abc(self, x):
        print 123
        yield x

for i in ABC():
    print i
# equivalent to the following
items = iter(ABC())
while True:
    try:
        print items.next()
    except StopIteration:
        break
Iterators
- In Python, an object that implements the __iter__ method is iterable: the built-in iter() can be called on it and returns an object.
- An object that implements next() is an iterator. Take the object returned by iter() and call next() on it repeatedly.
class Fib(object):
    def __init__(self):
        self.a, self.b = 0, 1
    def __iter__(self):
        return self
    def next(self):
        self.a, self.b = self.b, self.a + self.b
        if self.a > 100:
            raise StopIteration()
        return self.a

fetch = iter(Fib())  # get the object used for iteration
while True:
    # keep calling next(); the loop ends when StopIteration is caught
    try:
        i = fetch.next()
        print i
    except StopIteration:
        break
# equivalent to for xxx in XXX
for i in Fib():
    print i
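The difference between an iterable and an iterator shows up when the same object is looped over twice. The sketch below is written in Python 3 syntax (so the method is spelled __next__ rather than next); FibSeries and the limit parameter are hypothetical names introduced only for this illustration.

```python
class Fib:
    """Iterator: __iter__ returns self, so it can be consumed only once."""
    def __init__(self, limit=100):
        self.a, self.b = 0, 1
        self.limit = limit

    def __iter__(self):
        return self

    def __next__(self):  # spelled next() in Python 2
        self.a, self.b = self.b, self.a + self.b
        if self.a > self.limit:
            raise StopIteration
        return self.a


class FibSeries:
    """Iterable: __iter__ hands out a fresh iterator on every call."""
    def __init__(self, limit=100):
        self.limit = limit

    def __iter__(self):
        return Fib(self.limit)


one_shot = Fib()
first_pass = list(one_shot)    # consumes the iterator
second_pass = list(one_shot)   # already exhausted, nothing is left

series = FibSeries()
pass_a = list(series)          # each list()/for call gets a new Fib
pass_b = list(series)
```

Returning self from __iter__ is the simplest choice, but it means the object cannot be looped over a second time; returning a new iterator avoids that.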
Generators
A generator is an object; it keeps the context of its stack frame between calls.
def abc(x):
    print 123
    yield x

abc("hello world")  # 123 is not printed yet; this call only creates a generator object
abc("hello world").next()  # only now does the code inside the generator run
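To make the "preserved stack frame" idea concrete, here is a small sketch in Python 3 syntax. The log list and the second yield are additions made for illustration; they record when each part of the generator body actually runs.

```python
log = []

def abc(x):
    log.append('body started')         # runs only on the first next()
    yield x
    log.append('resumed after yield')  # runs on the second next()
    yield x.upper()

gen = abc('hello world')
log_after_call = list(log)     # calling abc() has not run any of the body

first = next(gen)              # runs the body up to the first yield
log_after_first = list(log)

second = next(gen)             # resumes right after the first yield
log_after_second = list(log)
```

The local variable x survives between the two next() calls, which is what keeping the frame context means in practice.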
4.4 A generator for depth-first traversal of tree nodes
- Understanding the for loop
- Re-yielding the values yielded by another object
class Node(object):
    def __init__(self, value):
        self._value = value
        self._children = []
    def __repr__(self):
        return "Node{!r}".format(self._value)
    def __iter__(self):
        return iter(self._children)
    def add_child(self, node):
        self._children.append(node)
    def depth_first(self):
        """
        yield is similar to return, except that the next call resumes from where execution left off.
        Values yielded by another object are yielded again here, so each value is handed over in two steps.
        """
        yield self
        for c in self:
            # the loop below is equivalent to yield from c.depth_first()
            for items in c.depth_first():
                yield items

if __name__ == '__main__':
    root = Node(0)
    child1 = Node(1)
    child2 = Node(2)
    child11 = Node(11)
    child21 = Node(21)
    root.add_child(child1)
    root.add_child(child2)
    child1.add_child(child11)
    child2.add_child(child21)
    child11.add_child(Node(100))
    child21.add_child(Node(200))
    child21.add_child(Node(201))
    # Understanding the for loop:
    # ch acts like a formal parameter that receives each value handed back
    # by root.depth_first() (much like a return value).
    # root.depth_first() must implement the iteration protocol; it can be a generator.
    for ch in root.depth_first():
        print ch
The depth_first() method first yields the node itself, then iterates over each child's depth_first() and yields every element it produces.
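For comparison, the same traversal can be written with yield from, which is available in Python 3.3 and later. This is a minimal sketch rather than the author's code: the tree is smaller, and __repr__ adds parentheses for readability.

```python
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return 'Node({!r})'.format(self._value)

    def __iter__(self):
        return iter(self._children)

    def add_child(self, node):
        self._children.append(node)

    def depth_first(self):
        yield self                            # the node itself first
        for child in self:
            yield from child.depth_first()    # then everything below each child


root = Node(0)
child1, child2 = Node(1), Node(2)
root.add_child(child1)
root.add_child(child2)
child1.add_child(Node(11))
child2.add_child(Node(21))

order = [repr(n) for n in root.depth_first()]
```

yield from delegates to the sub-generator directly, so the inner for loop that re-yields each item is no longer needed.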
- The traditional implementation; its drawback is that it is verbose
class Node2(object):
    def __init__(self, value):
        self._value = value
        self._children = []
    def __repr__(self):
        return "Node{!r}".format(self._value)
    def __iter__(self):
        return iter(self._children)
    def add_child(self, node):
        self._children.append(node)
    def depth_first(self):
        return DepthFirstIterator(self)

class DepthFirstIterator(object):
    def __init__(self, start_node):
        self._node = start_node
        self._children_iter = None
        self._child_iter = None
    def __iter__(self):
        return self
    def __next__(self):
        # first call: return the start node and create an iterator over its children
        if self._children_iter is None:
            self._children_iter = iter(self._node)
            return self._node
        # a child is being traversed: return its next item
        elif self._child_iter:
            try:
                nextchild = next(self._child_iter)
                return nextchild
            except StopIteration:
                self._child_iter = None
                return next(self)
        # move on to the next child and start traversing it
        else:
            self._child_iter = next(self._children_iter).depth_first()
            return next(self)
    next = __next__  # Python 2 looks up next() instead of __next__()
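4.5 Reverse iteration
- Use the built-in function reversed()
- A custom class must implement the __reversed__() method (objects that support the sequence protocol, __len__() and __getitem__(), also work)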
class Countdown(object):
    def __init__(self, start):
        self.start = start
    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1
    def __reversed__(self):
        n = 1
        while n <= self.start:
            yield n
            n += 1

if __name__ == '__main__':
    for rr in reversed(Countdown(30)):
        print rr
    for rr in Countdown(30):
        print rr
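The sketch below (Python 3 syntax) shows what reversed() does with an object that defines __reversed__, and what happens with a plain generator, which does not. The helper gen() is a hypothetical name used only here.

```python
class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1

    def __reversed__(self):
        n = 1
        while n <= self.start:
            yield n
            n += 1


forward = list(Countdown(3))
backward = list(reversed(Countdown(3)))   # uses __reversed__, no list is built


def gen():
    yield 1
    yield 2


try:
    reversed(gen())            # a generator has neither __reversed__ nor a length
    generator_error = None
except TypeError as exc:
    generator_error = type(exc).__name__

# Workaround when __reversed__ is not available: materialize into a list first.
# This costs memory proportional to the number of items.
via_list = list(reversed(list(gen())))
```

Defining __reversed__ keeps reverse iteration lazy, whereas the list workaround has to hold every item in memory.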
4.6 Generator functions with external state
from collections import deque

class LineHistory:
    def __init__(self, lines, hislen=3):
        self.lines = lines
        self.history = deque(maxlen=hislen)  # keeps only the last hislen lines
    def __iter__(self):
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))
            yield line
    def clear(self):
        self.history.clear()
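The class above is shown without a usage example, so here is a hedged sketch of how it might be driven (Python 3 syntax). The sample list and the search string are hypothetical data; the class body is repeated so that the block runs on its own.

```python
from collections import deque


class LineHistory:
    def __init__(self, lines, hislen=3):
        self.lines = lines
        self.history = deque(maxlen=hislen)

    def __iter__(self):
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))
            yield line

    def clear(self):
        self.history.clear()


sample = ['alpha', 'beta', 'python rocks', 'delta']   # hypothetical input
lines = LineHistory(sample, hislen=2)

context = None
for line in lines:
    if 'python' in line:
        # The extra state lives on the instance, so it is reachable mid-iteration.
        context = list(lines.history)
        break
```

Because the generator is a method of a class, the loop can inspect lines.history at any point, which a bare generator function would not allow.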
4.7 Slicing an iterator
Take a slice of the items produced by an iterator.
import itertools

def count(n):
    while True:
        yield n
        n += 1

c = count(0)
# c[10:20] >>>TypeError: 'generator' object has no attribute '__getitem__'
for items in itertools.islice(c, 10, 21):  # yields items 10 through 20
    print items
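islice() returns an iterator that produces the requested items. It works by consuming and discarding every item up to the slice's start index, then returning items one by one until the slice's end index is reached. The drawback is that the data consumed from the underlying iterator cannot be reused.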
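The consuming behaviour is easy to observe. This sketch (Python 3 syntax) slices a generator and then asks the same generator for one more item.

```python
import itertools


def count(n):
    while True:
        yield n
        n += 1


c = count(0)
first_slice = list(itertools.islice(c, 10, 15))   # discards 0..9, returns 10..14
next_item = next(c)                               # the generator carries on from 15

# To slice the same data again, start from a fresh iterator
# (or store the items in a list beforehand).
again = list(itertools.islice(count(0), 10, 15))
```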
4.8 Skipping the unwanted part of an iterable
Question: can the beginning of an iterable be skipped without affecting what follows?
dropwhile() creates an iterator that
discards items from the iterable for as long as predicate(item) is True;
once predicate returns False, it yields that item and all subsequent items.
from itertools import dropwhile
with open('manage.py') as f:
    # skip the leading comment lines only
    for line in dropwhile(lambda line: line.startswith("#"), f):
        print line
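As for whether the rest of the iterable is affected: the predicate is no longer consulted after its first False result. The sketch below uses an in-memory list of hypothetical lines in place of manage.py (Python 3 syntax).

```python
from itertools import dropwhile

# Hypothetical file contents; note the comment line in the middle.
lines = [
    '# header comment',
    '# another header line',
    'import os',
    '# inline note',
    'print(os.name)',
]

kept = list(dropwhile(lambda line: line.startswith('#'), lines))
```

Only the leading comment lines are dropped; the later '# inline note' line is kept.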
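4.9 Permutations and combinations
For example the permutation count A(3,2), the combination count C(3,2), and so on.
from itertools import permutations, combinations, combinations_with_replacement
items = ['a', 'b', 'c']
for c in permutations(items):  # permutations A(3,3): 6 results
    print(c)
for c in permutations(items, 2):  # permutations A(3,2): 6 results
    print(c)
for c in combinations(items, 3):  # combinations C(3,3): 1 result
    print(c)
for c in combinations_with_replacement(items, 3):  # the same item may be reused: 10 results
    print(c)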
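The result counts can be checked directly. In this sketch (Python 3 syntax), itertools.product is not part of the original text; it is included only to show where a count of 3*3*3 would come from.

```python
from itertools import (permutations, combinations,
                       combinations_with_replacement, product)

items = ['a', 'b', 'c']

n_perm_all = len(list(permutations(items)))       # 3! orderings of all three items
n_perm_2 = len(list(permutations(items, 2)))      # ordered pairs, no repeats
n_comb_2 = len(list(combinations(items, 2)))      # unordered pairs
n_comb_3 = len(list(combinations(items, 3)))      # only one way to pick all three
n_cwr_3 = len(list(combinations_with_replacement(items, 3)))  # unordered, repeats allowed
n_prod_3 = len(list(product(items, repeat=3)))    # ordered, repeats allowed
```

combinations_with_replacement ignores order, so it produces fewer results than product with the same length.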
4.10 Iterating over a sequence with indices
my_list = ['a', 'b', 'c']
for idx, val in enumerate(my_list, 1):  # the second argument sets the starting index
    print(idx, val)
This is especially useful when iterating over a file and you want to use the line number in error messages:
def parse_data(filename):
    with open(filename, 'rt') as f:
        for lineno, line in enumerate(f, 1):
            fields = line.split()
            try:
                count = int(fields[1])
                ...
            except ValueError as e:
                print('Line {}: Parse error: {}'.format(lineno, e))

data = [ (1, 2), (3, 4), (5, 6), (7, 8) ]
for n, (x, y) in enumerate(data):
    print(n, x, y)  # the (x, y) tuple is unpacked in the loop header
4.11 Iterating over multiple sequences with zip()
zip() creates an iterator and returns it as the result (in Python 3; in Python 2 it returns a list).
- Basic usage: zipping
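a = [1, 2, 3]
b = ['w', 'x', 'y', 'z']
for i in zip(a, b):  # stops at the end of the shorter sequence
    print(i)
>>>(1, 'w')
>>>(2, 'x')
>>>(3, 'y')
from itertools import zip_longest  # named izip_longest in Python 2
for i in zip_longest(a, b, fillvalue=None):  # pads the shorter sequence with fillvalue
    print(i)
>>>(1, 'w')
>>>(2, 'x')
>>>(3, 'y')
>>>(None, 'z')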
- Packing into a dictionary, or converting to a list
headers = ['name', 'shares', 'price']
values = ['ACME', 100, 490.1]
s = dict(zip(headers,values))
list(zip(headers, values))
- zip() accepts more than two sequences as arguments: zip(a, b, c)
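A short sketch of zipping three sequences, plus the reverse operation with the * operator (Python 3 syntax). The third list c and the unzip step are additions for illustration.

```python
a = [1, 2, 3]
b = ['w', 'x', 'y', 'z']
c = [True, False]

triples = list(zip(a, b, c))     # stops at the shortest input, here c

pairs = list(zip(a, b))
nums, letters = zip(*pairs)      # "unzip": split the pairs back into two tuples
```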
4.12 Iterating over items in different containers with chain()
from itertools import chain
a = [1, 2, 3, 4]
b = ['x', 'y', 'z']
for x in chain(a, b):
    print(x)
a and b can be of different types: chain(set, list), or even chain(dict, list).
# Inefficient
for x in a + b:
    print(x)
# Better
for x in chain(a, b):
    print(x)
In the first approach, a + b creates an entirely new sequence and requires a and b to be of the same type.
chain() skips that step, so it saves a lot of memory when the input sequences are very large. It also
works fine when the iterables are of different types.
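The type restriction of + and the flexibility of chain() can be seen side by side in this sketch (Python 3 syntax; the sample values are hypothetical).

```python
from itertools import chain

numbers = [1, 2, 3]
letters = ('x', 'y')
prices = {'ACME': 490.1}      # iterating over a dict yields its keys

merged = list(chain(numbers, letters, prices))

try:
    numbers + letters         # a list and a tuple cannot be concatenated
    concat_error = None
except TypeError:
    concat_error = 'TypeError'
```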
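4.13 Creating data pipelines
os.walk traverses a directory tree starting from a given location.
import os
# x is the current directory, y the subdirectories it contains, z the files in it
for x, y, z in os.walk(r"D:\Workspace\sell"):
    for zpieces in z:
        print os.path.join(x, zpieces)
fnmatch.filter(filelist, filepat)
takes a list of names (filelist) and returns the ones that match filepat.
To test a single name (a str), use fnmatch.fnmatch(name, filepat), which returns a boolean.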
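The two fnmatch functions can be compared on a small in-memory list. The file names below are hypothetical (Python 3 syntax).

```python
import fnmatch

names = ['views.py', 'models.py', 'README.md', 'data.csv']   # hypothetical names

py_files = fnmatch.filter(names, '*.py')       # list in, list of matches out
is_py = fnmatch.fnmatch('views.py', '*.py')    # single name in, bool out
is_md_py = fnmatch.fnmatch('README.md', '*.py')
```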
# encoding:utf-8
import os
import fnmatch
import gzip
import bz2
import re

def gen_find(filepat, top):
    """
    Find the files under directory top whose names match the pattern filepat.
    """
    for path, dirlist, filelist in os.walk(top):
        # keep only the names that match the pattern
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)  # generator of full file paths

def gen_opener(filenames):
    """
    Open each file, yield it, then close it.
    """
    for filename in filenames:  # filenames is a generator of paths; filename is one path
        if filename.endswith('.gz'):
            f = gzip.open(filename, 'rt')
        elif filename.endswith('.bz2'):
            # TODO: may be a problem. bz2.open() exists only in Python 3.3+;
            # Python 2 provides bz2.BZ2File(filename) instead.
            f = bz2.open(filename, "rt")
        else:
            f = open(filename, "r")
        yield f  # generator of file objects
        f.close()

def gen_concatenate(iterators):
    for it in iterators:  # it is a file object; iterators is a generator of file objects
        for items in it:  # items is one line of the file it
            yield items  # generator of lines, consumed outside with for xx in xx

def gen_grep(pattern, lines):
    """
    Yield the lines that match pattern.
    """
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line

lognames = gen_find("*.py", r"D:\Workspace\sell")
files = gen_opener(lognames)
lines = gen_concatenate(files)
pylines = gen_grep(r'^class ', lines)  # lines that begin a class definition
for line in pylines:
    print line
# TODO: not fully understood yet
#bytecolumn = (line.rsplit(None,1)[1] for line in pylines)
#bytes = (int(x) for x in bytecolumn if x != '-')
#print('Total', sum(bytes))
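The three commented-out lines at the end of the listing extend the pipeline with generator expressions: take the last whitespace-separated column of each line, convert it to an integer unless it is '-', and sum the results. The sketch below runs that idea on a few made-up access-log lines (hypothetical data, Python 3 syntax); the name sizes replaces bytes to avoid shadowing the built-in.

```python
# Hypothetical log lines; the last column is a byte count, or '-' when unknown.
log_lines = [
    '127.0.0.1 - - "GET /index.html" 200 1024',
    '127.0.0.1 - - "GET /missing" 404 -',
    '127.0.0.1 - - "GET /logo.png" 200 2048',
]

lines = iter(log_lines)                                    # stands in for the pipeline output
bytecolumn = (line.rsplit(None, 1)[1] for line in lines)   # last column of each line
sizes = (int(x) for x in bytecolumn if x != '-')           # skip missing values
total = sum(sizes)                                         # only now is any line processed
```

Nothing is read until sum() starts pulling values, so each line flows through every stage one at a time.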
If nested generators are hard to follow, look at the example below.
def gen1():
    for i in [[1,2,3,4,5],[6,7,8,9,0]]:
        yield i

def gen2(i):
    for j in i:
        for k in j:
            yield k

g1 = gen1()
g2 = gen2(g1)
for x in g2:
    print x
A rough, not very rigorous way to think about it: each for xx in xx unwraps one level of iteration, so reaching the innermost values takes one for xx in xx per level of nesting.
4.14 Flattening a nested sequence with a recursive generator
The original code uses yield from, which Python 2 does not support; for i in xx: yield i can be used instead.
# encoding:utf-8
from collections import Iterable  # Python 3: from collections.abc import Iterable

def flatten(items, ignore_types=(str, bytes)):
    for x in items:
        # isinstance(x, Iterable) checks whether x can be iterated; if so, recurse
        # not isinstance(x, ignore_types) excludes strings and bytes, which are iterable too
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            for i in flatten(x, ignore_types):
                yield i
        else:
            yield x

items1 = [1, 2, [3, 4, [5, 6], 7], 8]
items2 = ['Dave', 'Paula', ['Thomas', 'Lewis']]
l1 = [x for x in flatten(items1)]
l2 = [x for x in flatten(items2)]
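For reference, a Python 3 version using yield from might look like the sketch below. It imports Iterable from collections.abc, since the alias in collections was removed in Python 3.10.

```python
from collections.abc import Iterable


def flatten(items, ignore_types=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            # delegate to the recursive call instead of re-yielding in a loop
            yield from flatten(x, ignore_types)
        else:
            yield x


flat_numbers = list(flatten([1, 2, [3, 4, [5, 6], 7], 8]))
flat_names = list(flatten(['Dave', 'Paula', ['Thomas', 'Lewis']]))
```

Without the ignore_types check, each string would be split into single characters, and a one-character string would recurse forever.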
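4.15 Merging sorted iterables into one sorted sequence
heapq.merge()
Because heapq.merge() works iteratively, it does not read all the input sequences up front. This means it can be used on
very long sequences with little overhead. The inputs must already be sorted.
import heapq
a = [1, 4, 7, 10]
b = [2, 5, 6, 11]
l = [x for x in heapq.merge(a, b)]  # heapq.merge(a, b) returns a generator
>>>[1, 2, 4, 5, 6, 7, 10, 11]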
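The lazy behaviour can be demonstrated by merging two infinite sorted streams, which would never finish if heapq.merge() read its inputs eagerly. The sketch also shows that unsorted input is not sorted for you (Python 3 syntax; itertools.count and islice are used only to build the demonstration).

```python
import heapq
from itertools import count, islice

evens = count(0, 2)    # 0, 2, 4, ... (infinite, already sorted)
odds = count(1, 2)     # 1, 3, 5, ... (infinite, already sorted)

# Only as many items as requested are pulled from each input.
merged_head = list(islice(heapq.merge(evens, odds), 6))

# merge() assumes sorted inputs; it does not sort them.
unsorted_result = list(heapq.merge([3, 1], [2]))
```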
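4.16 Replacing a while loop with an iterator
In essence, a for loop over an iterator replaces the while loop.
Approach: iter(function, sentinel) returns an iterator; see the end of this section for details.
A typical I/O routine (pseudocode):
CHUNKSIZE = 8192
def reader(s):
    while True:
        data = s.recv(CHUNKSIZE)
        if data == b'':
            break
        process_data(data)

# s must provide recv(), e.g. a connected socket; a file object returned by open() has read() instead
# the same loop written with iter()
def reader2(s):
    for chunk in iter(lambda: s.recv(CHUNKSIZE), b""):
        process_data(chunk)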
Working example:
import sys
f = open("views.py", "r")
for chunk in iter(lambda: f.read(10), ""):  # read 10 characters at a time until read() returns ""
    n = sys.stdout.write(chunk)
The iter() built-in function:
With one argument, iter(obj): obj must support the iteration protocol, otherwise a TypeError is raised.
With two arguments, iter(func, sentinel): it takes a callable and a sentinel (end)
value. Each call to next() calls func; when func's return value equals the sentinel, StopIteration is raised.
x = 0
def func():
    global x
    x += 1
    print x
    return x

i = iter(func, 100)  # create the iterator once, outside the loop
while True:
    try:
        i.next()  # StopIteration is raised when func returns 100
    except StopIteration:
        print 'iteration stopped'
        break
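The two-argument form can be tried without a real file by using io.StringIO as a stand-in (an assumption made so the sketch is self-contained; Python 3 syntax).

```python
import io

f = io.StringIO('abcdefghijklmnopqrstuvwxyz')   # stands in for an open text file

# f.read(10) is called repeatedly; iteration stops when it returns the sentinel ''.
chunks = list(iter(lambda: f.read(10), ''))
```

The sentinel has to match what the callable returns at the end of the data: '' for text files, b'' for sockets and binary files.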
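Chapter summary
- Iterators
The iterator protocol exists almost entirely for for xx in xx.
What is an iterator? An object that follows the two protocols __iter__ and next().
That is, __iter__ returns the object to iterate over, and that object has a next() method.
next() is called repeatedly until StopIteration is raised, which ends the iteration.
- Generators
The topic is too broad to expand on in this chapter.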