500 Lines or Less: A Web Crawler With asyncio Coroutines, Part 2 (continued from Part 1)
Coordinating Coroutines
We began by describing how we want our crawler to work. Now it is time to implement it with asyncio coroutines.
Our crawler will fetch the first page, parse its links, and add them to a queue. After this it fans out across the website, fetching pages concurrently. But to limit load on the client and server, we want some maximum number of workers to run, and no more. Whenever a worker finishes fetching a page, it should immediately pull the next link from the queue. We will pass through periods when there is not enough work to go around, so some workers must pause. But when a worker hits a page rich with new links, then the queue suddenly grows and any paused workers should wake and get cracking. Finally, our program must quit once its work is done.
Imagine if the workers were threads. How would we express the crawler's algorithm? We could use a synchronized queue[1] from the Python standard library. Each time an item is put in the queue, the queue increments its count of "tasks". Worker threads call `task_done` after completing work on an item. The main thread blocks on `Queue.join` until each item put in the queue is matched by a `task_done` call, then it exits.
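To make that pattern concrete, here is a minimal threaded sketch of the queue/`task_done`/`join` arrangement; the `fetch_and_parse()` helper is hypothetical and stands in for the blocking download-and-parse work:

import queue
import threading

q = queue.Queue()
seen = {'http://xkcd.com'}
seen_lock = threading.Lock()

def worker():
    while True:
        url = q.get()
        for link in fetch_and_parse(url):  # hypothetical blocking helper
            with seen_lock:
                if link in seen:
                    continue
                seen.add(link)
            q.put(link)
        q.task_done()

for _ in range(10):
    threading.Thread(target=worker, daemon=True).start()

q.put('http://xkcd.com')
q.join()  # Blocks until every put() is matched by a task_done() call.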
Coroutines use the exact same pattern with an asyncio queue! First we import it[2]:
try:
    from asyncio import JoinableQueue as Queue
except ImportError:
    # In Python 3.5, asyncio.JoinableQueue is
    # merged into Queue.
    from asyncio import Queue
We collect the workers' shared state in a crawler class, and write the main logic in its `crawl` method. We start `crawl` on a coroutine and run asyncio's event loop until `crawl` finishes:
loop = asyncio.get_event_loop()
crawler = crawling.Crawler('http://xkcd.com',
                           max_redirect=10)
loop.run_until_complete(crawler.crawl())
The crawler begins with a root URL and `max_redirect`, the number of redirects it is willing to follow to fetch any one URL. It puts the pair `(URL, max_redirect)` in the queue. (For the reason why, stay tuned.)
class Crawler:
    def __init__(self, root_url, max_redirect):
        self.max_tasks = 10
        self.max_redirect = max_redirect
        self.q = Queue()
        self.seen_urls = set()

        # aiohttp's ClientSession does connection pooling and
        # HTTP keep-alives for us.
        self.session = aiohttp.ClientSession(loop=loop)

        # Put (URL, max_redirect) in the queue.
        self.q.put((root_url, self.max_redirect))
The number of unfinished tasks in the queue is now one. Back in our main script, we launch the event loop and the `crawl` method:
loop.run_until_complete(crawler.crawl())
The `crawl` coroutine kicks off the workers. It is like a main thread: it blocks on `join` until all tasks are finished, while the workers run in the background.
@asyncio.coroutine
def crawl(self):
    """Run the crawler until all work is done."""
    workers = [asyncio.Task(self.work())
               for _ in range(self.max_tasks)]

    # When all work is done, exit.
    yield from self.q.join()
    for w in workers:
        w.cancel()
If the workers were threads we might not wish to start them all at once. To avoid creating expensive threads until it is certain they are necessary, a thread pool typically grows on demand. But coroutines are cheap, so we simply start the maximum number allowed.
It is interesting to note how we shut down the crawler. When the `join` future resolves, the worker tasks are alive but suspended: they wait for more URLs but none come. So, the main coroutine cancels them before exiting. Otherwise, as the Python interpreter shuts down and calls all objects' destructors, living tasks cry out:
ERROR:asyncio:Task was destroyed but it is pending!
And how does `cancel` work? Generators have a feature we have not yet shown you. You can throw an exception into a generator from outside:
>>> gen = gen_fn()
>>> gen.send(None) # Start the generator as usual.
1
>>> gen.throw(Exception('error'))
Traceback (most recent call last):
File "<input>", line 3, in <module>
File "<input>", line 2, in gen_fn
Exception: error
The generator is resumed by `throw`, but it is now raising an exception. If no code in the generator's call stack catches it, the exception bubbles back up to the top. So to cancel a task's coroutine:
# Method of Task class.
def cancel(self):
    self.coro.throw(CancelledError)
Wherever the generator is paused, at some `yield from` statement, it resumes and throws an exception. We handle cancellation in the task's `step` method:
# Method of Task class.
def step(self, future):
    try:
        next_future = self.coro.send(future.result)
    except CancelledError:
        self.cancelled = True
        return
    except StopIteration:
        return

    next_future.add_done_callback(self.step)
Now the task knows it is cancelled, so when it is destroyed it does not rage against the dying of the light.
Once `crawl` has canceled the workers, it exits. The event loop sees that the coroutine is complete (we shall see how later), and it too exits:
loop.run_until_complete(crawler.crawl())
The `crawl` method comprises all that our main coroutine must do. It is the worker coroutines that get URLs from the queue, fetch them, and parse them for new links. Each worker runs the `work` coroutine independently:
@asyncio.coroutine
def work(self):
    while True:
        url, max_redirect = yield from self.q.get()

        # Download page and add new links to self.q.
        yield from self.fetch(url, max_redirect)
        self.q.task_done()
Python sees that this code contains `yield from` statements, and compiles it into a generator function. So in `crawl`, when the main coroutine calls `self.work` ten times, it does not actually execute this method: it only creates ten generator objects with references to this code. It wraps each in a Task. The Task receives each future the generator yields, and drives the generator by calling `send` with each future's result when the future resolves. Because the generators have their own stack frames, they run independently, with separate local variables and instruction pointers.
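A quick sketch of that point (the repr address below is elided): calling the generator function only builds an object, and none of the method's body runs until a Task drives it.

>>> coro = crawler.work()  # No page is fetched here.
>>> coro
<generator object work at 0x...>
>>> # Only when a Task calls coro.send(None) does the body begin executing.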
The worker coordinates with its fellows via the queue. It waits for new URLs with:
url, max_redirect = yield from self.q.get()
The queue's `get` method is itself a coroutine: it pauses until someone puts an item in the queue, then resumes and returns the item.
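As a rough illustration, here is a simplified sketch of how a coroutine-style `get` can pause and resume. It is written in the spirit of this chapter's simplified code, not asyncio's real implementation, and it omits the unfinished-task bookkeeping shown later: an empty queue parks the caller on a future that `put_nowait` resolves.

import asyncio
import collections

class SketchQueue:
    def __init__(self):
        self._items = collections.deque()
        self._getters = collections.deque()  # Futures of paused consumers.

    def put_nowait(self, item):
        if self._getters:
            # A consumer is already waiting: wake it with the item.
            self._getters.popleft().set_result(item)
        else:
            self._items.append(item)

    @asyncio.coroutine
    def get(self):
        if self._items:
            return self._items.popleft()
        fut = asyncio.Future()
        self._getters.append(fut)
        return (yield from fut)  # Pause here until put_nowait() resolves fut.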
Incidentally, this is where the worker will be paused at the end of the crawl, when the main coroutine cancels it. From the coroutine's perspective, its last trip around the loop ends when `yield from` raises a `CancelledError`.
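From inside the worker, that looks roughly like this sketch. The chapter's actual code does not catch the exception here; the Task's `step` method absorbs it instead.

@asyncio.coroutine
def work(self):
    try:
        while True:
            url, max_redirect = yield from self.q.get()
            yield from self.fetch(url, max_redirect)
            self.q.task_done()
    except asyncio.CancelledError:
        # The cancellation surfaces at whichever `yield from` was paused;
        # per-worker cleanup could go here if any were needed.
        pass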
When a worker fetches a page it parses the links and puts new ones in the queue, then calls `task_done` to decrement the counter. Eventually, a worker fetches a page whose URLs have all been fetched already, and there is also no work left in the queue. Thus this worker's call to `task_done` decrements the counter to zero. Then `crawl`, which is waiting for the queue's `join` method, is unpaused and finishes.
We promised to explain why the items in the queue are pairs, like:
# URL to fetch, and the number of redirects left.
('http://xkcd.com/353', 10)
New URLs have ten redirects remaining. Fetching this particular URL results in a redirect to a new location with a trailing slash. We decrement the number of redirects remaining, and put the next location in the queue:
# URL with a trailing slash. Nine redirects left.
('http://xkcd.com/353/', 9)
The `aiohttp` package we use would follow redirects by default and give us the final response. We tell it not to, however, and handle redirects in the crawler, so it can coalesce redirect paths that lead to the same destination: if we have already seen this URL, it is in `self.seen_urls` and we have already started on this path from a different entry point:
Figure: Redirects (crawler-images/redirects.png)
The crawler fetches "foo" and sees it redirects to "baz", so it adds "baz" to the queue and to `seen_urls`. If the next page it fetches is "bar", which also redirects to "baz", the fetcher does not enqueue "baz" again. If the response is a page, rather than a redirect, `fetch` parses it for links and puts new ones in the queue.
@asyncio.coroutine
def fetch(self, url, max_redirect):
    # Handle redirects ourselves.
    response = yield from self.session.get(
        url, allow_redirects=False)

    try:
        if is_redirect(response):
            if max_redirect > 0:
                next_url = response.headers['location']
                if next_url in self.seen_urls:
                    # We have been down this path before.
                    return

                # Remember we have seen this URL.
                self.seen_urls.add(next_url)

                # Follow the redirect. One less redirect remains.
                self.q.put_nowait((next_url, max_redirect - 1))
        else:
            links = yield from self.parse_links(response)
            # Python set-logic:
            for link in links.difference(self.seen_urls):
                self.q.put_nowait((link, self.max_redirect))
            self.seen_urls.update(links)
    finally:
        # Return connection to pool.
        yield from response.release()
If this were multithreaded code, it would be lousy with race conditions. For example, the worker checks if a link is in `seen_urls`, and if not the worker puts it in the queue and adds it to `seen_urls`. If it were interrupted between the two operations, then another worker might parse the same link from a different page, also observe that it is not in `seen_urls`, and also add it to the queue. Now that same link is in the queue twice, leading (at best) to duplicated work and wrong statistics.
However, a coroutine is only vulnerable to interruption at `yield from` statements. This is a key difference that makes coroutine code far less prone to races than multithreaded code: multithreaded code must enter a critical section explicitly, by grabbing a lock, otherwise it is interruptible. A Python coroutine is uninterruptible by default, and only cedes control when it explicitly yields.
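For contrast, here is a hypothetical sketch (not part of this chapter's crawler) of how the same check-then-enqueue step would need an explicit lock if the workers were threads:

import queue
import threading

class ThreadedCrawlerSketch:
    def __init__(self, max_redirect=10):
        self.q = queue.Queue()
        self.seen_urls = set()
        self.seen_lock = threading.Lock()
        self.max_redirect = max_redirect

    def enqueue_links(self, links):
        # The membership check and the update must be atomic, so we hold a
        # lock; the coroutine version needs none, because it never yields
        # between the check and the put.
        with self.seen_lock:
            new_links = links - self.seen_urls
            self.seen_urls.update(new_links)
        for link in new_links:
            self.q.put((link, self.max_redirect))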
We no longer need a fetcher class like we had in the callback-based program. That class was a workaround for a deficiency of callbacks: they need some place to store state while waiting for I/O, since their local variables are not preserved across calls. But the `fetch` coroutine can store its state in local variables like a regular function does, so there is no more need for a class.
When `fetch` finishes processing the server response it returns to the caller, `work`. The `work` method calls `task_done` on the queue and then gets the next URL from the queue to be fetched.
When `fetch` puts new links in the queue it increments the count of unfinished tasks and keeps the main coroutine, which is waiting for `q.join`, paused. If, however, there are no unseen links and this was the last URL in the queue, then when `work` calls `task_done` the count of unfinished tasks falls to zero. That event unpauses `join` and the main coroutine completes.
The queue code that coordinates the workers and the main coroutine is like this[3]:
class Queue:
    def __init__(self):
        self._join_future = Future()
        self._unfinished_tasks = 0
        # ... other initialization ...

    def put_nowait(self, item):
        self._unfinished_tasks += 1
        # ... store the item ...

    def task_done(self):
        self._unfinished_tasks -= 1
        if self._unfinished_tasks == 0:
            self._join_future.set_result(None)

    @asyncio.coroutine
    def join(self):
        if self._unfinished_tasks > 0:
            yield from self._join_future
The main coroutine, `crawl`, yields from `join`. So when the last worker decrements the count of unfinished tasks to zero, it signals `crawl` to resume, and finish.
The ride is almost over. Our program began with the call to `crawl`:
loop.run_until_complete(self.crawler.crawl())
How does the program end? Since `crawl` is a generator function, calling it returns a generator. To drive the generator, asyncio wraps it in a task:
class EventLoop:
    def run_until_complete(self, coro):
        """Run until the coroutine is done."""
        task = Task(coro)
        task.add_done_callback(stop_callback)
        try:
            self.run_forever()
        except StopError:
            pass

class StopError(BaseException):
    """Raised to stop the event loop."""

def stop_callback(future):
    raise StopError
When the task completes, it raises `StopError`, which the loop uses as a signal that it has arrived at normal completion.
But what's this? The task has methods called `add_done_callback` and `result`? You might think that a task resembles a future. Your instinct is correct. We must admit a detail about the Task class we hid from you: a task is a future.
class Task(Future):
    """A coroutine wrapped in a Future."""
Normally a future is resolved by someone else calling `set_result` on it. But a task resolves itself when its coroutine stops. Remember from our earlier exploration of Python generators that when a generator returns, it throws the special `StopIteration` exception:
# Method of class Task.
def step(self, future):
    try:
        next_future = self.coro.send(future.result)
    except CancelledError:
        self.cancelled = True
        return
    except StopIteration as exc:
        # Task resolves itself with coro's return
        # value.
        self.set_result(exc.value)
        return

    next_future.add_done_callback(self.step)
So when the event loop calls `task.add_done_callback(stop_callback)`, it prepares to be stopped by the task. Here is `run_until_complete` again:
# Method of event loop.
def run_until_complete(self, coro):
    task = Task(coro)
    task.add_done_callback(stop_callback)
    try:
        self.run_forever()
    except StopError:
        pass
When the task catches `StopIteration` and resolves itself, the callback raises `StopError` from within the loop. The loop stops and the call stack is unwound to `run_until_complete`. Our program is finished.
Conclusion
Increasingly often, modern programs are I/O-bound instead of CPU-bound. For such programs, Python threads are the worst of both worlds: the global interpreter lock prevents them from actually executing computations in parallel, and preemptive switching makes them prone to races. Async is often the right pattern. But as callback-based async code grows, it tends to become a dishevelled mess. Coroutines are a tidy alternative. They factor naturally into subroutines, with sane exception handling and stack traces.
If we squint so that the `yield from` statements blur, a coroutine looks like a thread doing traditional blocking I/O. We can even coordinate coroutines with classic patterns from multi-threaded programming. There is no need for reinvention. Thus, compared to callbacks, coroutines are an inviting idiom to the coder experienced with multithreading.
But when we open our eyes and focus on the `yield from` statements, we see they mark points when the coroutine cedes control and allows others to run. Unlike threads, coroutines display where our code can be interrupted and where it cannot. In his illuminating essay "Unyielding"[4], Glyph Lefkowitz writes, "Threads make local reasoning difficult, and local reasoning is perhaps the most important thing in software development." Explicitly yielding, however, makes it possible to "understand the behavior (and thereby, the correctness) of a routine by examining the routine itself rather than examining the entire system."
This chapter was written during a renaissance in the history of Python and async. Generator-based coroutines, whose devising you have just learned, were released in the "asyncio" module with Python 3.4 in March 2014. In September 2015, Python 3.5 was released with coroutines built in to the language itself. These native coroutines are declared with the new syntax "async def", and instead of "yield from", they use the new "await" keyword to delegate to a coroutine or wait for a Future.
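For example, the `work` coroutine from this chapter might be written with the native syntax roughly like this (a sketch assuming Python 3.5+, not code from the chapter):

async def work(self):
    while True:
        url, max_redirect = await self.q.get()
        # Download page and add new links to self.q.
        await self.fetch(url, max_redirect)
        self.q.task_done()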
Despite these advances, the core ideas remain. Python's new native coroutines will be syntactically distinct from generators but work very similarly; indeed, they will share an implementation within the Python interpreter. Task, Future, and the event loop will continue to play their roles in asyncio.
Now that you know how asyncio coroutines work, you can largely forget the details. The machinery is tucked behind a dapper interface. But your grasp of the fundamentals empowers you to code correctly and efficiently in modern async environments.
[3] The actual `asyncio.Queue` implementation uses an `asyncio.Event` in place of the Future shown here. The difference is an Event can be reset, whereas a Future cannot transition from resolved back to pending.