Python Cookbook筆記

Python Cookbook

標簽(空格分隔): Python


Data Structures and Algorithms

Iteration is fundamental to data processing. And when scanning datasets that don’t fit in memory, we need a way to fetch the items lazily, that is, one at a time and on demand. This is what the Iterator pattern is about.

The yield keyword allows the construction of generators, which work as iterators.

Every generator is an iterator: generators fully implement the iterator interface. But an iterator—as defined in the GoF book - retrieves items from a collection.

Every collection in Python is iterable, and iterators are used internally to support:

  • for loops
  • Collection types construction and extension
  • Looping over text files line by line
  • List, dict, and set comprehensions
  • Tuple unpacking
  • Unpacking actual parameters with * in function calls

Sentence Take #1: A Sequence of Words

Example 14-1. sentence.py: A Sentence as a sequence of words.

import re
import reprlib

RE_WORD = re.compile('\w+')


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text) 

    def __getitem__(self, index):
        return self.words[index]  

    def __len__(self, index): 
        return len(self.words)

    def __repr__(self):
        # reprlib.repr is a utility function to generate abbreviated string representations of data structures that can be very large.
        return 'Sentence(%s)' % reprlib.repr(self.text)

Example 14-2. Testing iteration on a Sentence instance

>>> s = Sentence('"The time has come," the Walrus said,')
>>> s
Sentence('"The time ha... Walrus said,') 
>>> for word in s: 
...     print(word)
The
time
has
come
the
Walrus
said
>>> list(s)
['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']
>>> s[0]
'The'
>>> s[5]
'Walrus'
>>> s[-1]
'said'

Why Sequences Are Iterable: The iter Function

Whenever the interpreter needs to iterate over an object x, it automatically calls iter(x).

The iter built-in function:

  1. Checks whether the object implements iter, and calls that to obtain an iterator.
  2. If __iter__ is not implemented, but __getitem__ is implemented, Python creates an iterator that attempts to fetch items in order, starting from index 0 (zero).
  3. If that fails, Python raises TypeError, usually saying “C object is not iterable,” where C is the class of the target object.

That is why any Python sequence is iterable: they all implement __getitem__. In fact, the standard sequences also implement __iter__, and yours should too, because the special handling of __getitem__ exists for backward compatibility reasons and may be gone in the future.

An object is considered iterable if it implements the __iter__ method. No subclassing or registration is required.

>>> class Foo:
...     def __iter__(self):
...         pass
...
>>> from collections import abc
>>> issubclass(Foo, abc.Iterable)
True
>>> f = Foo()
>>> isinstance(f, abc.Iterable)
True

As of Python 3.4, the most accurate way to check whether an object x is iterable is to call iter(x) and handle a TypeError exception if it isn’t. This is more accurate than using isinstance(x, abc.Iterable), because iter(x) also considers the legacy __getitem__ method, while the Iterable ABC does not.

Iterables Versus Iterators

It’s important to be clear about the relationship between iterables and iterators: Python obtains iterators from iterables.

Here is a simple for loop iterating over a str. The str 'ABC' is the iterable here.

>>> s = 'ABC'
>>> for char in s:
... print(char)
...
A
B
C

If there was no for statement and we had to emulate the for machinery by hand with a while loop.

>>> s = 'ABC'
>>> it = iter(s)
>>> while True:
...     try:
...         print(next(it)) 
...     except StopIteration:
...         del it
...         break
...
A
B
C

StopIteration signals that the iterator is exhausted. This exception is handled internally in for loops and other iteration contexts like list comprehensions, tuple unpacking, etc.

The standard interface for an iterator has two methods:

  • __next__: Returns the next available item, raising StopIteration when there are no more items.
  • __iter__: Returns self; this allows iterators to be used where an iterable is expected, for example, in a for loop.
p018.png-109.4kB
p018.png-109.4kB

Example 14-3. abc.Iterator class; extracted from Lib/_collections_abc.py

class Iterator(Iterable):
    __slots__ = ()
    
    @abstractmethod
    def __next__(self):
    'Return the next item from the iterator. When exhausted, raise StopIteration'
        raise StopIteration
        
    def __iter__(self):
        return self
        
    @classmethod
    def __subclasshook__(cls, C):
        if cls is Iterator:
            if (any("__next__" in B.__dict__ for B in C.__mro__) and
                any("__iter__" in B.__dict__ for B in C.__mro__)):
                return True
            return NotImplemented

The best way to check if an object x is an iterator is to call isinstance(x, abc.Iterator). Thanks to Iterator.__subclasshook__, this test works even if the class of x is not a real or virtual subclass of Iterator.

You can clearly see how the iterator is built by iter(…) and consumed by next(…) using the Python console.

>>> s3 = Sentence('Pig and Pepper') 
>>> it = iter(s3) 
>>> it
<iterator object at 0x...>
>>> next(it) 
'Pig'
>>> next(it)
'and'
>>> next(it)
'Pepper'
>>> next(it) 
Traceback (most recent call last):
...
StopIteration
>>> list(it) 
[]
>>> list(iter(s3)) 
['Pig', 'and', 'Pepper']

Because the only methods required of an iterator are __next__ and __iter__, there is no way to check whether there are remaining items, other than to call next() and catch StopInteration. Also, it’s not possible to “reset” an iterator. If you need to start over, you need to call iter(…) on the iterable that built the iterator in the first place.

  • iterator: Any object that implements the __next__ no-argument method that returns the next item in a series or raises StopIteration when there are no more items. Python iterators also implement the __iter__ method so they are iterable as well.

Sentence Take #2: A Classic Iterator

The next Sentence class is built according to the classic Iterator design pattern following the blueprint in the GoF book.

Example 14-4. sentence_iter.py: Sentence implemented using the Iterator pattern.

import re
import reprlib

RE_WORD = re.compile('\w+')


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):  
        # __iter__ fulfills the iterable protocol by instantiating and returning an iterator.
        return SentenceIterator(self.words) 


class SentenceIterator:

    def __init__(self, words):
        self.words = words 
        self.index = 0 

    def __next__(self):
        try:
            word = self.words[self.index]  
        except IndexError:
            raise StopIteration()  
        self.index += 1  
        return word  

    def __iter__(self): 
        return self

Note that implementing __iter__ in SentenceIterator is not actually needed for this example to work, but the it’s the right thing to do: iterators are supposed to implement both __next__ and __iter__, and doing so makes our iterator pass the issubclass(SentenceInterator, abc.Iterator) test.

Making Sentence an Iterator: Bad Idea

A common cause of errors in building iterables and iterators is to confuse the two. To be clear: iterables have an __iter__ method that instantiates a new iterator every time. Iterators implement a __next__ method that returns individual items, and an __iter__ method that returns self.

Therefore, iterators are also iterable, but iterables are not iterators.

It may be tempting to implement __next__ in addition to __iter__ in the Sentence class, making each Sentence instance at the same time an iterable and iterator over itself. But this is a terrible idea.

A proper implementation of the pattern requires each call to iter(my_itera ble) to create a new, independent, iterator. That is why we need the SentenceIterator class in this example.

An iterable should never act as an iterator over itself. In other words, iterables must implement __iter__, but not __next__. On the other hand, for convenience, iterators should be iterable. An iterator’s __iter__ should just return self.

Sentence Take #3: A Generator Function

A Pythonic implementation of the same functionality uses a generator function to replace the SequenceIterator class.

Example 14-5. sentence_gen.py: Sentence implemented using a generator function.

import re
import reprlib

RE_WORD = re.compile('\w+')


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for word in self.words: 
            yield word 
        return #This return is not needed

How a Generator Function Works

Any Python function that has the yield keyword in its body is a generator function.

Here is the simplest function useful to demonstrate the behavior of a generator:

>>> def gen_123(): 
...     yield 1
...     yield 2
...     yield 3
...
>>> gen_123
<function gen_123 at 0x...>
>>> gen_123()
<generator object gen_123 at 0x...> 
>>> for i in gen_123(): 
...     print(i)
1
2
3
>>> g = gen_123()
>>> next(g) 
1
>>> next(g)
2
>>> next(g)
3
>>> next(g) 
Traceback (most recent call last):
...
StopIteration

When we invoke next(…) on the generator object, execution advances to the next yield in the function body, and the next(…) call evaluates to the value yielded when the function body is suspended.

Example 14-6. A generator function that prints messages when it runs.

>>> def gen_AB():
...     print('start')
...     yield 'A' 
...     print('continue')
...     yield 'B' 
...     print('end.') 
...
>>> for c in gen_AB(): 
...     print('-->', c) 
...
start
--> A
continue
--> B
end.

In Example 14-5 __iter__ is a generator function which, when called, builds a generator object that implements the iterator interface, so the SentenceIterator class is no longer needed.

This second version of Sentence is much shorter than the first, but it’s not as lazy as it could be. Nowadays, laziness is considered a good trait, at least in programming languages and APIs. A lazy implementation postpones producing values to the last possible moment. This saves memory and may avoid useless processing as well.

Sentence Take #4: A Lazy Implementation

The Iterator interface is designed to be lazy: next(my_iterator) produces one item at a time.

Our Sentence implementations so far have not been lazy because the __init__ eagerly builds a list of all words in the text, binding it to the self.words attribute. This will entail processing the entire text, and the list may use as much memory as the text itself.

The re.finditer function is a lazy version of re.findall which, instead of a list, returns a generator producing re.MatchObject instances on demand.

Example 14-7. sentence_gen2.py: Sentence implemented using a generator function calling the re.finditer generator function.

import re
import reprlib

RE_WORD = re.compile('\w+')


class Sentence:

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for match in RE_WORD.finditer(self.text): 
            yield match.group()

Generator functions are an awesome shortcut, but the code can be made even shorter with a generator expression.

Sentence Take #5: A Generator Expression

A generator expression can be understood as a lazy version of a list comprehension. In other words, if a list comprehension is a factory of lists, a generator expression is a factory of generators.

Example 14-8. The gen_AB generator function is used by a list comprehension, then by a generator expression.

>>> def gen_AB(): 
...     print('start')
...     yield 'A'
...     print('continue')
...     yield 'B'
...     print('end.')
...
>>> res1 = [x*3 for x in gen_AB()]
start
continue
end.
>>> for i in res1:
...     print('-->', i)
...
--> AAA
--> BBB
>>> res2 = (x*3 for x in gen_AB())
>>> res2
<generator object <genexpr> at 0x10063c240>
>>> for i in res2:
...     print('-->', i)
...
start
--> AAA
continue
--> BBB
end.

Example 14-9. sentence_genexp.py: Sentence implemented using a generator expression.

import re
import reprlib

RE_WORD = re.compile('\w+')


class Sentence:

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        return (match.group() for match in RE_WORD.finditer(self.text))

Generator Expressions: When to Use Them

In Example 14-9, we saw that a generator expression is a syntactic shortcut to create a generator without defining and calling a function. On the other hand, generator functions are much more flexible: you can code complex logic with multiple statements, and can even use them as coroutines(協(xié)同程序).

Another Example: Arithmetic Progression Generator

Example 14-11. The ArithmeticProgression class.

class ArithmeticProgression:

    def __init__(self, begin, step, end=None):  
        self.begin = begin
        self.step = step
        self.end = end  # None -> "infinite" series

    def __iter__(self):
        result = type(self.begin + self.step)(self.begin)  
        forever = self.end is None 
        index = 0
        while forever or result < self.end: 
            yield result 
            index += 1
            result = self.begin + self.step * index 

Example 14-10. Demonstration of an ArithmeticProgression class.

>>> ap = ArithmeticProgression(0, 1, 3)
>>> list(ap)
[0, 1, 2]
>>> ap = ArithmeticProgression(1, .5, 3)
>>> list(ap)
[1.0, 1.5, 2.0, 2.5]
>>> ap = ArithmeticProgression(0, 1/3, 1)
>>> list(ap)
[0.0, 0.3333333333333333, 0.6666666666666666]
>>> from fractions import Fraction
>>> ap = ArithmeticProgression(0, Fraction(1, 3), 1)
>>> list(ap)
[Fraction(0, 1), Fraction(1, 3), Fraction(2, 3)]
>>> from decimal import Decimal
>>> ap = ArithmeticProgression(0, Decimal('.1'), .3)
>>> list(ap)
[Decimal('0.0'), Decimal('0.1'), Decimal('0.2')]

Example 14-12 shows a generator function called aritprog_gen that does the same job as ArithmeticProgression but with less code.

Example 14-12. The aritprog_gen generator function

def aritprog_gen(begin, step, end=None):
    result = type(begin + step)(begin)
    forever = end is None
    index = 0
    while forever or result < end:
        yield result
        index += 1
        result = begin + step * index

There are plenty of ready-to-use generators in the standard library, and the next section will show an even cooler implementation using the itertools module.

Arithmetic Progression with itertools

The itertools module in Python 3.4 has 19 generator functions that can be combined in a variety of interesting ways.

For example, the itertools.count function returns a generator that produces numbers.

>>> import itertools
>>> gen = itertools.count(1, .5)
>>> next(gen)
1
>>> next(gen)
1.5
>>> next(gen)
2.0
>>> next(gen)
2.5

However, itertools.count never stops, so if you call list(count()), Python will try to build a list larger than available memory and your machine will be very grumpy long before the call fails.

On the other hand, there is the itertools.takewhile function: it produces a generator that consumes another generator and stops when a given predicate evaluates to False.

>>> gen = itertools.takewhile(lambda n: n < 3, itertools.count(1, .5))
>>> list(gen)
[1, 1.5, 2.0, 2.5]

Example 14-13. aritprog_v3.py: this works like the previous aritprog_gen functions.

import itertools


def aritprog_gen(begin, step, end=None):
    first = type(begin + step)(begin)
    ap_gen = itertools.count(first, step)
    if end is not None:
        ap_gen = itertools.takewhile(lambda n: n < end, ap_gen)
    return ap_gen

When implementing generators, know what is available in the standard library, otherwise there’s a good chance you’ll reinvent the wheel.

Generator Functions in the Standard Library

The os.walk generator function is impressive.

I summarize two dozen of them, from the built-in, itertools, and functools modules.

The first group are filtering generator functions: they yield a subset of items produced by the input iterable, without changing the items themselves.

p019.png-82.3kB
p019.png-82.3kB

p020.png-70.2kB
p020.png-70.2kB

Example 14-14. Filtering generator functions examples.

>>> def vowel(c):
...     return c.lower() in 'aeiou'
...
>>> list(filter(vowel, 'Aardvark'))
['A', 'a', 'a']
>>> import itertools
>>> list(itertools.filterfalse(vowel, 'Aardvark'))
['r', 'd', 'v', 'r', 'k']
>>> list(itertools.dropwhile(vowel, 'Aardvark'))
['r', 'd', 'v', 'a', 'r', 'k']
>>> list(itertools.takewhile(vowel, 'Aardvark'))
['A', 'a']
>>> list(itertools.compress('Aardvark', (1,0,1,1,0,1)))
['A', 'r', 'd', 'a']
>>> list(itertools.islice('Aardvark', 4))
['A', 'a', 'r', 'd']
>>> list(itertools.islice('Aardvark', 4, 7))
['v', 'a', 'r']
>>> list(itertools.islice('Aardvark', 1, 7, 2))
['a', 'd', 'a']

The next group are the mapping generators: they yield items computed from each individual item in the input iterable—or iterables, in the case of map and starmap.

p021.png-147kB
p021.png-147kB

Example 14-15. itertools.accumulate generator function examples.

>>> sample = [5, 4, 2, 8, 7, 6, 3, 0, 9, 1]
>>> import itertools
>>> list(itertools.accumulate(sample)) 
[5, 9, 11, 19, 26, 32, 35, 35, 44, 45]
>>> list(itertools.accumulate(sample, min)) 
[5, 4, 2, 2, 2, 2, 2, 0, 0, 0]
>>> list(itertools.accumulate(sample, max)) 
[5, 5, 5, 8, 8, 8, 8, 8, 9, 9]
>>> import operator
>>> list(itertools.accumulate(sample, operator.mul)) 
[5, 20, 40, 320, 2240, 13440, 40320, 0, 0, 0]
>>> list(itertools.accumulate(range(1, 11), operator.mul))
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]

Example 14-16. Mapping generator function examples.

>>> list(enumerate('albatroz', 1))
[(1, 'a'), (2, 'l'), (3, 'b'), (4, 'a'), (5, 't'), (6, 'r'), (7, 'o'), (8, 'z')]
>>> import operator
>>> list(map(operator.mul, range(11), range(11)))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> list(map(operator.mul, range(11), [2, 4, 8]))
[0, 4, 16]
>>> list(map(lambda a, b: (a, b), range(11), [2, 4, 8]))
[(0, 2), (1, 4), (2, 8)]
>>> import itertools
>>> list(itertools.starmap(operator.mul, enumerate('albatroz', 1))) 
['a', 'll', 'bbb', 'aaaa', 'ttttt', 'rrrrrr', 'ooooooo', 'zzzzzzzz']
>>> sample = [5, 4, 2, 8, 7, 6, 3, 0, 9, 1]
>>> list(itertools.starmap(lambda a, b: b/a,
...     enumerate(itertools.accumulate(sample), 1))) 
[5.0, 4.5, 3.6666666666666665, 4.75, 5.2, 5.333333333333333, 5.0, 4.375, 4.888888888888889, 4.5]

Next, we have the group of merging generators—all of these yield items from multiple input iterables. chain and chain.from_iterable consume the input iterables sequentially (one after the other), while product, zip, and zip_longest consume the input iterables in parallel.

p022.png-186kB
p022.png-186kB

Example 14-17. Merging generator function examples.

>>> list(itertools.chain('ABC', range(2))) 
['A', 'B', 'C', 0, 1]
>>> list(itertools.chain(enumerate('ABC'))) 
[(0, 'A'), (1, 'B'), (2, 'C')]
>>> list(itertools.chain.from_iterable(enumerate('ABC'))) 
[0, 'A', 1, 'B', 2, 'C']
>>> list(zip('ABC', range(5))) 
[('A', 0), ('B', 1), ('C', 2)]
>>> list(zip('ABC', range(5), [10, 20, 30, 40])) 
[('A', 0, 10), ('B', 1, 20), ('C', 2, 30)]
>>> list(itertools.zip_longest('ABC', range(5))) 
[('A', 0), ('B', 1), ('C', 2), (None, 3), (None, 4)]
>>> list(itertools.zip_longest('ABC', range(5), fillvalue='?')) 
[('A', 0), ('B', 1), ('C', 2), ('?', 3), ('?', 4)]

Example 14-18. itertools.product generator function examples.

>>> list(itertools.product('ABC', range(2))) 
[('A', 0), ('A', 1), ('B', 0), ('B', 1), ('C', 0), ('C', 1)]
>>> suits = 'spades hearts diamonds clubs'.split()
>>> list(itertools.product('AK', suits)) 
[('A', 'spades'), ('A', 'hearts'), ('A', 'diamonds'), ('A', 'clubs'), ('K', 'spades'), ('K', 'hearts'), ('K', 'diamonds'), ('K', 'clubs')]
>>> list(itertools.product('ABC')) 
[('A',), ('B',), ('C',)]
>>> list(itertools.product('ABC', repeat=2))
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'B'), ('C', 'C')]
>>> list(itertools.product(range(2), repeat=3))
[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
>>> rows = itertools.product('AB', range(2), repeat=2)
>>> for row in rows: print(row)
...
('A', 0, 'A', 0)
('A', 0, 'A', 1)
('A', 0, 'B', 0)
('A', 0, 'B', 1)
('A', 1, 'A', 0)
('A', 1, 'A', 1)
('A', 1, 'B', 0)
('A', 1, 'B', 1)
('B', 0, 'A', 0)
('B', 0, 'A', 1)
('B', 0, 'B', 0)
('B', 0, 'B', 1)
('B', 1, 'A', 0)
('B', 1, 'A', 1)
('B', 1, 'B', 0)
('B', 1, 'B', 1)

Some generator functions expand the input by yielding more than one value per input item.

p023.png-159.4kB
p023.png-159.4kB

p024.png-28.9kB
p024.png-28.9kB

Example 14-19. count, cycle, and repeat.

>>> ct = itertools.count() 
>>> next(ct) 
0
>>> next(ct), next(ct), next(ct) 
(1, 2, 3)
>>> list(itertools.islice(itertools.count(1, .3), 3)) 
[1, 1.3, 1.6]
>>> cy = itertools.cycle('ABC') 
>>> next(cy)
'A'
>>> list(itertools.islice(cy, 7)) 
['B', 'C', 'A', 'B', 'C', 'A', 'B']
>>> rp = itertools.repeat(7) 
>>> next(rp), next(rp)
(7, 7)
>>> list(itertools.repeat(8, 4)) 
[8, 8, 8, 8]
>>> list(map(operator.mul, range(11), itertools.repeat(5))) 
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

Example 14-20. Combinatoric generator functions yield multiple values per input item.

>>> list(itertools.combinations('ABC', 2))
[('A', 'B'), ('A', 'C'), ('B', 'C')]
>>> list(itertools.combinations_with_replacement('ABC', 2))
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]
>>> list(itertools.permutations('ABC', 2))
[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
>>> list(itertools.product('ABC', repeat=2))
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'B'), ('C', 'C')]

The last group of generator functions we’ll cover in this section are designed to yield all items in the input iterables, but rearranged in some way.

p025.png-55.4kB
p025.png-55.4kB

p026.png-56.9kB
p026.png-56.9kB

Example 14-21. itertools.groupby.

>>> list(itertools.groupby('LLLLAAGGG'))
[('L', <itertools._grouper object at 0x102227cc0>),
('A', <itertools._grouper object at 0x102227b38>),
('G', <itertools._grouper object at 0x102227b70>)]
>>> for char, group in itertools.groupby('LLLLAAAGG'):
...     print(char, '->', list(group))
...
L -> ['L', 'L', 'L', 'L']
A -> ['A', 'A',]
G -> ['G', 'G', 'G']
>>> animals = ['duck', 'eagle', 'rat', 'giraffe', 'bear',
...             'bat', 'dolphin', 'shark', 'lion']
>>> animals.sort(key=len)
>>> animals
['rat', 'bat', 'duck', 'bear', 'lion', 'eagle', 'shark', 'giraffe', 'dolphin']
>>> for length, group in itertools.groupby(animals, len):
...     print(length, '->', list(group))
...
3 -> ['rat', 'bat']
4 -> ['duck', 'bear', 'lion']
5 -> ['eagle', 'shark']
7 -> ['giraffe', 'dolphin']
>>> for length, group in itertools.groupby(reversed(animals), len):
...     print(length, '->', list(group))
...
7 -> ['dolphin', 'giraffe']
5 -> ['shark', 'eagle']
4 -> ['lion', 'bear', 'duck']
3 -> ['bat', 'rat']

Example 14-22. itertools.tee yields multiple generators, each yielding every item of the input generator.

>>> list(itertools.tee('ABC'))
[<itertools._tee object at 0x10222abc8>, <itertools._tee object at 0x10222ac08>]
>>> g1, g2 = itertools.tee('ABC')
>>> next(g1)
'A'
>>> next(g2)
'A'
>>> next(g2)
'B'
>>> list(g1)
['B', 'C']
>>> list(g2)
['C']
>>> list(zip(*itertools.tee('ABC')))
[('A', 'A'), ('B', 'B'), ('C', 'C')]

Iterable Reducing Functions

The functions in Table 14-6 all take an iterable and return a single result. Actually, every one of the builtins listed here can be implemented with functools.reduce, but they exist as built-ins because they address some common use cases more easily.

p027.png-68.3kB
p027.png-68.3kB

p028.png-128.9kB
p028.png-128.9kB

Example 14-23. Results of all and any for some sequences.

>>> all([1, 2, 3])
True
>>> all([1, 0, 3])
False
>>> all([])
True
>>> any([1, 2, 3])
True
>>> any([1, 0, 3])
True
>>> any([0, 0.0])
False
>>> any([])
False
>>> g = (n for n in [0, 0.0, 7, 8])
>>> any(g)
True
>>> next(g)
8

A Closer Look at the iter Function

It can be called with two arguments to create an iterator from a regular function or any callable object. In this usage, the first argument must be a callable to be invoked repeatedly (with no arguments) to yield values, and the second argument is a sentinel: a marker value which, when returned by the callable, causes the iterator to raise StopIteration instead of yielding the sentinel.

The following example shows how to use iter to roll a six-sided die until a 1 is rolled:

>>> def d6():
...     return randint(1, 6)
...
>>> d6_iter = iter(d6, 1)
>>> d6_iter
<callable_iterator object at 0x00000000029BE6A0>
>>> for roll in d6_iter:
...     print(roll)
...
4
3
6
3

A useful example is found in the iter built-in function documentation. This snippet reads lines from a file until a blank line is found or the end of file is reached:

with open('mydata.txt') as fp:
    for line in iter(fp.readline, ''):
        process_line(line)


Context Managers and else Blocks

Context Managers and with Blocks

Context manager objects exist to control a with statement, just like iterators exist to control a for statement.

The with statement was designed to simplify the try/finally pattern, which guarantees that some operation is performed after a block of code, even if the block is aborted because of an exception, a return or sys.exit() call. The code in the finally clause usually releases a critical resource or restores some previous state that was temporarily changed.

The most common example is making sure a file object is closed.

Example 15-1. Demonstration of a file object as a context manager.

>>> with open('mirror.py') as fp:
...     src = fp.read(60)
...
>>> len(src)
60
>>> fp
<_io.TextIOWrapper name='mirror.py' mode='r' encoding='UTF-8'>
>>> fp.closed, fp.encoding
(True, 'UTF-8')
>>> fp.read(60)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file.


Concurrency with Futures

This chapter focuses on the concurrent.futures library introduced in Python 3.2.

Example: Web Downloads in Three Styles

To handle network I/O efficiently, you need concurrency(并發(fā)性), as it involves high latency — so instead of wasting CPU cycles waiting, it’s better to do something else until a response comes back from the network.

I wrote three simple programs to download images of 20 country flags from the Web.

The first one, flags.py, runs sequentially: it only requests the next image when the previous one is downloaded and saved to disk. The other two scripts make concurrent downloads: they request all images practically at the same time, and save the files as they arrive. The flags_threadpool.py script uses the concurrent.futures package, while flags_asyncio.py uses asyncio.

A Sequential Download Script

Example 17-2. flags.py: sequential download script; some functions will be reused by the other scripts.

import os
import time
import sys

import requests  

POP20_CC = ('CN IN US ID BR PK NG BD RU JP '
            'MX PH VN ET EG DE IR TR CD FR').split() 

BASE_URL = 'http://flupy.org/data/flags' 

DEST_DIR = 'downloads/'  


def save_flag(img, filename):
    path = os.path.join(DEST_DIR, filename)
    with open(path, 'wb') as fp:
        fp.write(img)


def get_flag(cc): 
    url = '{}/{cc}/{cc}.gif'.format(BASE_URL, cc=cc.lower())
    resp = requests.get(url)
    return resp.content


def show(text): 
    print(text, end=' ')
    sys.stdout.flush()


def download_many(cc_list):
    for cc in sorted(cc_list):
        image = get_flag(cc)
        show(cc)
        save_flag(image, cc.lower() + '.gif')

    return len(cc_list)


def main(download_many): 
    t0 = time.time()
    count = download_many(POP20_CC)
    elapsed = time.time() - t0
    msg = '\n{} flags downloaded in {:.2f}s'
    print(msg.format(count, elapsed))


if __name__ == '__main__':
    main(download_many)

The requests library by Kenneth Reitz is available on PyPI and is more powerful and easier to use than the urllib.request module from the Python 3 standard library.

Downloading with concurrent.futures

The main features of the concurrent.futures package are the ThreadPoolExecutor and ProcessPoolExecutor classes, which implement an interface that allows you to submit callables for execution in different threads or processes, respectively.

Example 17-3. flags_threadpool.py: threaded download script using futures.ThreadPoolExecutor.

rom concurrent import futures

from flags import save_flag, get_flag, show, main

MAX_WORKERS = 20


def download_one(cc):
    image = get_flag(cc)
    show(cc)
    save_flag(image, cc.lower() + '.gif')
    return cc


def download_many(cc_list):
    # Set the number of worker threads: use the smaller number between the maximum we want to allow (MAX_WORKERS) and the actual items to be processed, so no unnecessary threads are created.
    workers = min(MAX_WORKERS, len(cc_list))
    # Instantiate the ThreadPoolExecutor with that number of worker threads; the executor.__exit__ method will call executor.shutdown(wait=True), which will block until all threads are done.
    with futures.ThreadPoolExecutor(workers) as executor:
        # The map method is similar to the map built-in, except that the download_one function will be called concurrently from multiple threads; it returns a generator that can be iterated over to retrieve the value returned by each function.
        res = executor.map(download_one, sorted(cc_list)) 

    return len(list(res))


if __name__ == '__main__':
    main(download_many)
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末驾荣,一起剝皮案震驚了整個濱河市普泡,隨后出現(xiàn)的幾起案子撼班,更是在濱河造成了極大的恐慌砰嘁,老刑警劉巖勘究,帶你破解...
    沈念sama閱讀 216,744評論 6 502
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場離奇詭異景描,居然都是意外死亡秀撇,警方通過查閱死者的電腦和手機捌袜,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 92,505評論 3 392
  • 文/潘曉璐 我一進店門虏等,熙熙樓的掌柜王于貴愁眉苦臉地迎上來霍衫,“玉大人侯养,你說我怎么就攤上這事逛揩”缁” “怎么了逞泄?”我有些...
    開封第一講書人閱讀 163,105評論 0 353
  • 文/不壞的土叔 我叫張陵喷众,是天一觀的道長各谚。 經(jīng)常有香客問我,道長到千,這世上最難降的妖魔是什么昌渤? 我笑而不...
    開封第一講書人閱讀 58,242評論 1 292
  • 正文 為了忘掉前任,我火速辦了婚禮憔四,結(jié)果婚禮上膀息,老公的妹妹穿的比我還像新娘望抽。我一直安慰自己履婉,他們只是感情好煤篙,可當(dāng)我...
    茶點故事閱讀 67,269評論 6 389
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著毁腿,像睡著了一般辑奈。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發(fā)上已烤,一...
    開封第一講書人閱讀 51,215評論 1 299
  • 那天鸠窗,我揣著相機與錄音,去河邊找鬼胯究。 笑死稍计,一個胖子當(dāng)著我的面吹牛,可吹牛的內(nèi)容都是我干的裕循。 我是一名探鬼主播臣嚣,決...
    沈念sama閱讀 40,096評論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼,長吁一口氣:“原來是場噩夢啊……” “哼剥哑!你這毒婦竟也來了硅则?” 一聲冷哼從身側(cè)響起,我...
    開封第一講書人閱讀 38,939評論 0 274
  • 序言:老撾萬榮一對情侶失蹤株婴,失蹤者是張志新(化名)和其女友劉穎怎虫,沒想到半個月后,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體困介,經(jīng)...
    沈念sama閱讀 45,354評論 1 311
  • 正文 獨居荒郊野嶺守林人離奇死亡大审,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 37,573評論 2 333
  • 正文 我和宋清朗相戀三年,在試婚紗的時候發(fā)現(xiàn)自己被綠了座哩。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片徒扶。...
    茶點故事閱讀 39,745評論 1 348
  • 序言:一個原本活蹦亂跳的男人離奇死亡,死狀恐怖八回,靈堂內(nèi)的尸體忽然破棺而出酷愧,到底是詐尸還是另有隱情,我是刑警寧澤缠诅,帶...
    沈念sama閱讀 35,448評論 5 344
  • 正文 年R本政府宣布溶浴,位于F島的核電站,受9級特大地震影響管引,放射性物質(zhì)發(fā)生泄漏士败。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點故事閱讀 41,048評論 3 327
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望谅将。 院中可真熱鬧漾狼,春花似錦、人聲如沸饥臂。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,683評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽隅熙。三九已至稽煤,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間囚戚,已是汗流浹背酵熙。 一陣腳步聲響...
    開封第一講書人閱讀 32,838評論 1 269
  • 我被黑心中介騙來泰國打工, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留驰坊,地道東北人匾二。 一個月前我還...
    沈念sama閱讀 47,776評論 2 369
  • 正文 我出身青樓,卻偏偏與公主長得像拳芙,于是被迫代替她去往敵國和親察藐。 傳聞我的和親對象是個殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點故事閱讀 44,652評論 2 354

推薦閱讀更多精彩內(nèi)容

  • 函數(shù)調(diào)用 Built-in Functions abs(x) Return the absolute value ...
    叫我七夜閱讀 1,170評論 0 0
  • pyspark.sql模塊 模塊上下文 Spark SQL和DataFrames的重要類: pyspark.sql...
    mpro閱讀 9,451評論 0 13
  • (稻盛哲學(xué)學(xué)習(xí)會)打卡第106天 姓名:李晨珂 部門:分水碶 組別:待定 【知~學(xué)習(xí)】 誦讀《京瓷哲學(xué)》第一章 度...
    李晨珂分水碶閱讀 134評論 0 0
  • 內(nèi)存優(yōu)化基本可以分為以下幾個方面:下面就分別介紹态鳖。 減少對象的內(nèi)存占用 對內(nèi)存對象進行復(fù)用 避免對象的內(nèi)存泄漏 內(nèi)...
    wangling90閱讀 319評論 0 0
  • 友友可欣是個聰明又善于交談的女孩子浆竭。 如果沒有碰巧先生,可欣大概會一直帶著這樣一個令人羨慕的光環(huán)惨寿,繼續(xù)在女神的道路...
    花脆閱讀 410評論 0 1