13. Scrapy Framework – Hands-On – High-Speed Download of zcool Featured Images (2)
settings.py configuration code
import os

BOT_NAME = 'imagedownload'

SPIDER_MODULES = ['imagedownload.spiders']
NEWSPIDER_MODULE = 'imagedownload.spiders'

# Default headers sent with every request
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
}

ITEM_PIPELINES = {
    # 'imagedownload.pipelines.ImagedownloadPipeline': 300,
    'scrapy.pipelines.images.ImagesPipeline': 1
}

# Save downloaded images to an "images" directory in the project root
IMAGES_STORE = os.path.join(os.path.dirname(os.path.dirname(__file__)), 'images')
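To make the IMAGES_STORE expression concrete, here is a small sketch that applies the same two dirname calls to a made-up settings.py path. The /home/user/... location is an assumption for illustration only, and posixpath is used instead of os.path so the result looks the same on every operating system.

```python
import posixpath

# Hypothetical location of settings.py; the real path depends on where the project lives.
settings_file = "/home/user/imagedownload/imagedownload/settings.py"


def images_store_for(settings_path):
    """Mirror the IMAGES_STORE expression from settings.py for a given file path."""
    package_dir = posixpath.dirname(settings_path)  # .../imagedownload/imagedownload
    project_dir = posixpath.dirname(package_dir)    # .../imagedownload
    return posixpath.join(project_dir, "images")


print(images_store_for(settings_file))  # /home/user/imagedownload/images
```

The first dirname yields the package directory that contains settings.py, and the second climbs to the project root, so the images directory sits beside the package rather than inside it.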
items.py code
import scrapy


class ImagedownloadItem(scrapy.Item):
    title = scrapy.Field()
    # image_urls: holds the links of the images that belong to this item
    image_urls = scrapy.Field()
    # images: filled in after the download finishes, with information about each saved image
    images = scrapy.Field()
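As a rough illustration of how the two fields relate: ImagesPipeline reads the URLs from image_urls and, once the downloads finish, writes one result entry per image into images. The sketch below imitates that with plain dicts and no network access. The URLs and the fake_image_results helper are made up, and the full/<SHA-1 of the URL>.jpg naming reflects Scrapy's default behaviour as I understand it, so treat it as an assumption rather than the pipeline's actual code.

```python
import hashlib


def fake_image_results(image_urls):
    """Build entries shaped like the ones stored in item['images'].

    Nothing is downloaded here; the real pipeline also records a checksum of each file.
    """
    results = []
    for url in image_urls:
        name = hashlib.sha1(url.encode("utf-8")).hexdigest()
        results.append({"url": url, "path": f"full/{name}.jpg"})
    return results


# Hypothetical item contents, using a plain dict in place of ImagedownloadItem.
item = {
    "title": "sample work",
    "image_urls": ["https://example.com/a.jpg", "https://example.com/b.png"],
}
item["images"] = fake_image_results(item["image_urls"])

for entry in item["images"]:
    print(entry["url"], "->", entry["path"])
```

The paths are relative to IMAGES_STORE, which is why the spider only has to fill in image_urls and leave images empty.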
start.py code
from scrapy import cmdline

# Equivalent to running "scrapy crawl zcool" on the command line
cmdline.execute("scrapy crawl zcool".split(" "))
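cmdline.execute takes an argv-style list, which is why the command string is split on spaces: the words scrapy, crawl, and the spider name have to arrive as separate elements. A minimal sketch of that splitting step, using only the standard library:

```python
import shlex

command = "scrapy crawl zcool"

# Plain split on a single space, as start.py does.
argv = command.split(" ")
print(argv)  # ['scrapy', 'crawl', 'zcool']

# shlex.split gives the same result here and also copes with quoted arguments.
print(shlex.split(command) == argv)  # True

# Without the space between the first two words there is no separate "crawl" element.
broken = "scrapycrawl zcool".split(" ")
print(broken)  # ['scrapycrawl', 'zcool']
```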
Continuing the previous example, zcool.py sample code
import scrapy
from scrapy.spiders.crawl import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from ..items import ImagedownloadItem


class ZcoolSpider(CrawlSpider):
    name = 'zcool'
    allowed_domains = ['zcool.com.cn']
    start_urls = ['http://zcool.com.cn/']

    rules = (
        # Pagination URLs: keep following them, no callback
        Rule(LinkExtractor(allow=r".+0!0!0!0!0!!!!2!0!\d+"), follow=True),
        # Detail page URLs: parse them, do not follow further
        Rule(LinkExtractor(allow=r".+/work/.+html"), follow=False, callback="parse_detail")
    )

    def parse_detail(self, response):
        # Collect the src of every image in the work's display area
        image_urls = response.xpath("//div[@class='reveal-work-wrap text-center']//img/@src").getall()
        # The title can be split across several text nodes, so join them and strip whitespace
        title_list = response.xpath("//div[@class='details-contitle-box']/h2/text()").getall()
        title = "".join(title_list).strip()
        item = ImagedownloadItem(title=title, image_urls=image_urls)
        yield item
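To check what the two allow patterns in rules actually select, the sketch below runs them against a few sample URLs with the standard re module. LinkExtractor applies its allow patterns as regular-expression searches on the full URL, so re.search is a fair stand-in. The sample URLs and the classify helper are invented for illustration and are not taken from the site.

```python
import re

# The same patterns as in the spider's rules, written as raw strings.
PAGINATION = re.compile(r".+0!0!0!0!0!!!!2!0!\d+")
DETAIL = re.compile(r".+/work/.+html")


def classify(url):
    """Report which rule, if any, would pick up the given URL."""
    if PAGINATION.search(url):
        return "pagination"  # followed, no callback
    if DETAIL.search(url):
        return "detail"      # handed to parse_detail
    return "ignored"


# Hypothetical URLs shaped like the ones the patterns target.
samples = [
    "https://www.zcool.com.cn/discover/0!0!0!0!0!!!!2!0!3",
    "https://www.zcool.com.cn/work/ZNDM0NTY3ODg=.html",
    "https://www.zcool.com.cn/about",
]

for url in samples:
    print(classify(url), url)
```

The exclamation marks have no special meaning in a regular expression, so only the dots and \d+ act as wildcards here.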
Previous article: Chapter 6, Scrapy Framework (12), 2020-03-14. Link:
http://www.reibang.com/p/fc0b7b7fc5c8
Next article: Chapter 6, Scrapy Framework (14), 2020-03-16. Link:
http://www.reibang.com/p/2febb184009d
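The material above comes from the internet and is shared for learning and discussion only. If it infringes on your rights, please send me a private message and I will remove it. Thank you.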