I am teaching myself Python and still learning. If you spot any problems, please leave a comment so we can all improve together.
IDE: PyCharm
Framework: Scrapy
Steps
1 Install Scrapy: pip3 install scrapy
2 Create the project: scrapy startproject mySpider
3 The generated directory structure looks like this:
    mySpider/
        scrapy.cfg
        mySpider/
            __init__.py
            items.py
            pipelines.py
            settings.py
            spiders/
                __init__.py

- mySpider/: the project directory
- scrapy.cfg: the project's configuration file
- mySpider/items.py: the project's item definitions
- mySpider/pipelines.py: the project's pipelines
- mySpider/settings.py: the project's settings
- mySpider/spiders/: the directory where the spider code lives
4 Add the code
In items.py, define a QiuBaiItem class:
    import scrapy
    from scrapy import Item


    class QiuBaiItem(Item):
        userName = scrapy.Field()
        content = scrapy.Field()

Then create a Python file named qiubai_spider.py in the spiders directory.
In that file, define a QiubaiSpider class:
    from scrapy import Spider

    from mySpider.items import QiuBaiItem


    class QiubaiSpider(Spider):
        name = 'qiubai'
        start_urls = [
            'http://www.qiushibaike.com'
        ]

        def parse(self, response):
            # Each post on the page is one div inside the #content-left container.
            for item in response.xpath('//div[@id="content-left"]/div[@class="article block untagged mb15"]'):
                qiubai = QiuBaiItem()
                userName = item.xpath('./div[@class="author clearfix"]/a[2]/h2/text()').extract()
                if userName:
                    qiubai['userName'] = userName[0]
                # The post body can be split across several text nodes, so join them.
                content = item.xpath('./a[@class="contentHerf"]/div[@class="content"]/span/text()').extract()
                if content:
                    qiubai['content'] = ''.join(content)
                yield qiubai
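If the spider yields nothing, the usual cause is that the XPath expressions no longer match the page layout. A quick, optional way to check them (not part of the original steps) is Scrapy's interactive shell, run from the project directory:

    scrapy shell 'http://www.qiushibaike.com'
    >>> response.xpath('//div[@id="content-left"]/div[@class="article block untagged mb15"]')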
5 In the same directory as items.py, create manage.py with the following code:

    from scrapy.cmdline import execute

    execute()
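Called with no arguments, execute() reads its command from sys.argv, which is why the Script parameters set in the next step are what actually trigger the crawl. As a side note (my own addition, not part of the original write-up), you could instead hard-code the command so the script runs without any extra parameters:

    from scrapy.cmdline import execute

    # Equivalent to running "scrapy crawl qiubai" from the command line;
    # passing an argv list is an alternative to PyCharm's Script parameters.
    execute(['scrapy', 'crawl', 'qiubai'])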
6 In PyCharm, open Run -> Edit Configurations:
Name: qiubai (any name will do);
Script: select manage.py;
Script parameters: crawl qiubai
7 Open settings.py and add a USER_AGENT (you can look up a browser USER_AGENT string online):
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/602.3.12 (KHTML, like Gecko) Version/10.0.2 Safari/602.3.12'
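For context, the relevant fragment of settings.py would then look like the sketch below. The ROBOTSTXT_OBEY line is my own assumption, not part of the original steps: newer Scrapy project templates generate it set to True, and it is worth checking if requests are being filtered out.

    # settings.py (fragment)
    USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/602.3.12 (KHTML, like Gecko) Version/10.0.2 Safari/602.3.12'

    # Assumption: generated by newer Scrapy project templates; review it if the crawl is blocked.
    ROBOTSTXT_OBEY = True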
Once these 7 steps are done, you can run the spider directly.
If you want to write the output to a file, set Script parameters to:
crawl qiubai -o qiubai_items.json
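One extra tip (my addition, not from the original steps): with Chinese content, the default JSON feed export escapes non-ASCII characters. If your Scrapy version supports it (1.2 or later), the following settings.py line keeps the exported text readable:

    # settings.py -- assumption: requires Scrapy 1.2+, which added this setting.
    FEED_EXPORT_ENCODING = 'utf-8'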