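I covered cookies and sessions earlier in "Python Crawler Basics: Simulated Login". If I want to do a simulated login with Scrapy, cookies and sessions are just as unavoidable. This post addresses the problem shown in the figure below: how to keep the right cookies for each independent request.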
(Image from the Internet)
Fortunately, the official documentation provides a solution.
Multiple cookie sessions per spider
There is support for keeping multiple cookie sessions per spider by using the cookiejar Request meta key. By default it uses a single cookie jar (session), but you can pass an identifier to use different ones.
The example from the official documentation:
for i, url in enumerate(urls):
    yield scrapy.Request(url, meta={'cookiejar': i},
        callback=self.parse_page)
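To see why giving each request a different identifier isolates its session, here is a toy model of the idea in plain Python. It is only a sketch of the behavior the documentation describes, not Scrapy's actual implementation; the JarStore class and its method names are made up for illustration.

```python
from collections import defaultdict


class JarStore:
    """Hypothetical toy model: one cookie dict per 'cookiejar' identifier."""

    def __init__(self):
        # Each distinct key gets its own, initially empty, cookie dict.
        self.jars = defaultdict(dict)

    def store(self, meta, set_cookies):
        # A request without a 'cookiejar' key falls back to the shared default jar (key None).
        key = meta.get("cookiejar")
        self.jars[key].update(set_cookies)

    def cookies_for(self, meta):
        # Return a copy of the cookies that belong to this request's jar.
        return dict(self.jars[meta.get("cookiejar")])


store = JarStore()
store.store({"cookiejar": 0}, {"sessionid": "alice"})
store.store({"cookiejar": 1}, {"sessionid": "bob"})

# Each identifier sees only its own session cookie.
print(store.cookies_for({"cookiejar": 0}))
print(store.cookies_for({"cookiejar": 1}))
```

With this picture in mind, passing i from enumerate(urls) as the identifier gives every start URL its own session.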
Note that the cookiejar meta key is not carried over automatically; it has to be passed explicitly in each subsequent request. In the documentation's words:
Keep in mind that the cookiejar meta key is not “sticky”.
You need to keep passing it along on subsequent requests. For example:
def parse_page(self, response):
    # do some processing
    return scrapy.Request("http://www.example.com/otherpage",
        meta={'cookiejar': response.meta['cookiejar']},
        callback=self.parse_other_page)
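Because the key has to be forwarded by hand in every callback, it can help to put that step in a small helper. The following is a minimal sketch; follow_meta is a hypothetical name, not part of Scrapy, and it works on plain dicts so it can be tried without a running spider.

```python
def follow_meta(response_meta, **extra):
    """Build the meta dict for a follow-up request, keeping the same cookie jar.

    response_meta: the meta dict of the response being processed.
    extra: any additional meta entries for the next request.
    """
    meta = dict(extra)
    # Forward the jar identifier only if the incoming response carried one;
    # otherwise the next request keeps using the default shared jar.
    if "cookiejar" in response_meta:
        meta["cookiejar"] = response_meta["cookiejar"]
    return meta


# Intended use inside a spider callback (assumes a Scrapy spider context):
#     yield scrapy.Request(next_url,
#                          meta=follow_meta(response.meta),
#                          callback=self.parse_other_page)
print(follow_meta({"cookiejar": 3, "depth": 1}, page=2))
```

The helper copies only the cookiejar key rather than the whole response.meta, so unrelated entries from the previous request are not carried into the next one by accident.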