The requirement: a Scrapy spider should loop forever, pulling tasks from Redis and an HTTP interface, and the spider must never close.
The first version put a loop inside start_requests:
def start_requests(self):
    ......
    while True:
        yield scrapy.Request(url, dont_filter=True)
    ......
But with this approach, tasks keep getting fetched from the source over and over instead of moving on to the next step of execution.
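The busy-polling behaviour can be reproduced without Scrapy at all. In this minimal sketch, an in-memory deque is a hypothetical stand-in for the Redis task source, and yielded URLs stand in for scrapy.Request objects; note that once the source is empty, the while-True generator spins, hitting the task source on every iteration without ever pausing:

```python
from collections import deque

# Hypothetical in-memory stand-in for the Redis task queue.
tasks = deque(["http://example.com/a", "http://example.com/b"])

fetch_count = 0

def start_requests():
    """Mimics the while-True start_requests: polls the task source forever."""
    global fetch_count
    while True:
        fetch_count += 1           # every iteration hits the task source
        if tasks:
            yield tasks.popleft()  # stands in for yielding scrapy.Request(url)
        # when the queue is empty, the loop keeps spinning here without
        # yielding anything: a busy poll against the task source

gen = start_requests()
urls = [next(gen), next(gen)]
print(urls)  # → ['http://example.com/a', 'http://example.com/b']
```

Calling next(gen) a third time would never return, because the generator busy-loops on the empty queue instead of signalling "no work yet" back to the engine.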
A later version used signals instead:
import time

from scrapy import signals
from scrapy.exceptions import DontCloseSpider

@classmethod
def from_crawler(cls, crawler, *args, **kwargs):
    spider = super(AutoengSpider, cls).from_crawler(crawler, *args, **kwargs)
    # Fire spider_idle whenever the engine runs out of requests.
    crawler.signals.connect(spider.spider_idle, signal=signals.spider_idle)
    return spider

def start_requests(self):
    # Seed the engine with a single request; the rest come from spider_idle.
    yield self.next_req()

def spider_idle(self, spider):
    request = self.next_req()
    if request:
        # Push the new request back into the engine.
        # (Newer Scrapy versions removed engine.schedule; use
        # crawler.engine.crawl there instead.)
        self.crawler.engine.schedule(request, self)
    else:
        # No task available yet: wait briefly before the next idle check.
        # Note: time.sleep blocks the Twisted reactor, so keep it short.
        time.sleep(2)
    # Raise in both branches so Scrapy never closes the spider.
    raise DontCloseSpider()
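The control flow of this signal-based version can be sketched with the standard library alone. Everything here is a hypothetical stand-in: next_req (whose body the original does not show) pops from a deque instead of Redis, a toy loop plays the role of Scrapy's engine, and, unlike the real handler, the loop gives up after a few consecutive empty idles so the script terminates:

```python
from collections import deque

class DontCloseSpider(Exception):
    """Stand-in for scrapy.exceptions.DontCloseSpider."""

# Hypothetical stand-in for the Redis/API task source.
task_source = deque(["task-1", "task-2", "task-3"])

def next_req():
    """Pop the next task, or None when the source is momentarily empty."""
    return task_source.popleft() if task_source else None

def spider_idle(schedule):
    # Mirrors the signal handler: schedule a new request if one is
    # available, then refuse to let the spider close.
    request = next_req()
    if request:
        schedule(request)
    raise DontCloseSpider()

processed = []
scheduled = deque()
idle_rounds = 0

# Toy "engine": drain scheduled requests, fire spider_idle when empty.
# The real spider raises DontCloseSpider forever; this sketch stops after
# three consecutive idles with no new work, purely so the demo ends.
while idle_rounds < 3:
    if scheduled:
        processed.append(scheduled.popleft())
        continue
    try:
        spider_idle(scheduled.append)
    except DontCloseSpider:
        idle_rounds = 0 if scheduled else idle_rounds + 1

print(processed)  # → ['task-1', 'task-2', 'task-3']
```

The key property the sketch shows: work is only fetched when the engine is actually idle, one task per idle event, instead of the busy polling of the while-True version.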