Sending a POST request: sometimes we need to send a POST request when fetching data. For this, Scrapy provides FormRequest, a subclass of Request. If the spider should send a POST request right at the start of the crawl, override the start_requests(self) method in the spider class and no longer rely on the URLs in start_urls.
1. Create the project
D:\學(xué)習(xí)筆記\Python學(xué)習(xí)\Python_Crawler>scrapy startproject renrenLogin
New Scrapy project 'renrenLogin', using template directory 'c:\python38\lib\site-packages\scrapy\templates\project', created in:
D:\學(xué)習(xí)筆記\Python學(xué)習(xí)\Python_Crawler\renrenLogin
You can start your first spider with:
cd renrenLogin
scrapy genspider example example.com
2. Create the spider
D:\學(xué)習(xí)筆記\Python學(xué)習(xí)\Python_Crawler>cd renrenLogin
D:\學(xué)習(xí)筆記\Python學(xué)習(xí)\Python_Crawler\renrenLogin>scrapy genspider renren "renren.com"
Created spider 'renren' using template 'basic' in module:
renrenLogin.spiders.renren
3. Code implementation
A) settings.py configuration:
ROBOTSTXT_OBEY = False
DOWNLOAD_DELAY = 1
DEFAULT_REQUEST_HEADERS = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.9 Safari/537.36',
}
B) start.py (a helper script to launch the spider from the IDE):
from scrapy import cmdline
cmdline.execute("scrapy crawl renren".split())
C) renren.py:
# -*- coding: utf-8 -*-
import scrapy


class RenrenSpider(scrapy.Spider):
    name = 'renren'
    allowed_domains = ['renren.com']
    start_urls = ['http://renren.com/']

    def start_requests(self):
        url = "http://www.renren.com/PLogin.do"
        data = {"email": "kevin19851228@gmail.com", "password": "1qaz@WSX"}
        request = scrapy.FormRequest(url, formdata=data, callback=self.parse_page)
        yield request

    def parse_page(self, response):
        # with open('renren.html', 'w', encoding='utf-8') as fp:
        #     fp.write(response.text)
        request = scrapy.Request(url="http://www.renren.com/880151247/profile", callback=self.parse_profile)
        yield request

    def parse_profile(self, response):
        with open('dpProfile.html', 'w', encoding='utf-8') as fp:
            fp.write(response.text)
4. Notes:
1) To send a POST request, the recommended approach is scrapy.FormRequest, which makes it easy to supply the form data;
2) To send a POST request at the very start of the crawl, override the start_requests method and issue the POST request from there.