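Code on GitHub: zhihu
First, analyze the Zhihu login page. Simulated logins usually go through the mobile page, which is simpler to work with.
Analyzing the login flow
To trigger the captcha I deliberately entered the wrong password a few times. Zhihu's login does not always ask for a captcha, so the code has to check whether one is required.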
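The captcha check can be driven by the JSON that the login endpoint returns. Below is a minimal sketch, assuming the response shape that the login code in this post relies on (`r` equal to 0 on success, `errcode` 1991829 when a captcha is needed). The function name `classify_login_result`, the returned labels, and the sample bodies are made up for illustration and were not captured from the real site.

```python
import json

# errcode that the login code in this post treats as "captcha required"
CAPTCHA_REQUIRED = 1991829


def classify_login_result(body):
    """Return 'ok', 'captcha', or 'error' for a login response body (JSON text)."""
    result = json.loads(body)
    if result.get('r') == 0:
        return 'ok'
    if result.get('errcode') == CAPTCHA_REQUIRED:
        return 'captcha'
    return 'error'


# Hand-written sample bodies (assumptions, not real server output).
print(classify_login_result('{"r": 0}'))
print(classify_login_result('{"r": 1, "errcode": 1991829, "msg": "captcha required"}'))
print(classify_login_result('{"r": 1, "errcode": 100005, "msg": "wrong password"}'))
```

Keeping this decision in a small pure function makes it possible to test the branching without sending any request.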
Import the dependencies:
from urllib import request,parse
from bs4 import BeautifulSoup
import http.cookiejar
import json
import random
import time
import configparser
Building the global request headers
Because we are simulating a phone, we use a mobile User-Agent here.
def build_opener():
    # Keep cookies across requests so the session survives the login
    cookie = http.cookiejar.CookieJar()
    cookie_processor = request.HTTPCookieProcessor(cookie)
    opener = request.build_opener(cookie_processor)
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1"),
                         ("Referer", "https://www.zhihu.com/"),
                         ("Origin", "https://www.zhihu.com/"),
                         ("Host", "www.zhihu.com")]
    # Make every later request.urlopen() call use this opener
    request.install_opener(opener)
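A quick way to confirm the headers and cookie jar are wired up, without touching the network, is to build the opener and inspect it. The sketch below is a variant of the function above, not the author's version: the name `make_opener` is hypothetical, it returns the opener and jar instead of installing them globally, and it sets only two of the headers to stay short.

```python
import http.cookiejar
from urllib import request

MOBILE_UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) "
             "AppleWebKit/601.1.46 (KHTML, like Gecko) "
             "Version/9.0 Mobile/13B143 Safari/601.1")


def make_opener():
    """Build an opener that stores cookies and sends a mobile User-Agent."""
    jar = http.cookiejar.CookieJar()
    opener = request.build_opener(request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", MOBILE_UA),
                         ("Referer", "https://www.zhihu.com/")]
    return opener, jar


opener, jar = make_opener()
headers = dict(opener.addheaders)
print(headers["User-Agent"])
print(len(jar))  # number of cookies currently stored in the jar
```

Returning the jar makes it easy to check after a login which cookies the server actually set.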
The login process
def login(code=0):
    login_data = configparser.ConfigParser()
    login_data.read("user.ini")  # the username and password live in the user.ini config file
    username = login_data.get("LoginInfo", "email")
    password = login_data.get("LoginInfo", "password")
    url = 'https://www.zhihu.com/signin'
    login_url = 'https://www.zhihu.com/login/email'
    captcha_url = 'https://www.zhihu.com/captcha.gif'
    req = request.Request(url)
    res = request.urlopen(req)
    html = res.read().decode('utf-8')
    soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
    inputs = soup.find_all('input')
    _xsrf = inputs[0]['value']  # takes the value of the first <input> on the page as the _xsrf token
    # Build the login parameters
    params = {
        'email': username,
        'password': password,
        '_xsrf': _xsrf
    }
    # If code is 1, a captcha is required: download it, save it locally, then type it in by hand
    if code == 1:
        cap_parms = parse.urlencode({"r": time.time(), "type": "login"})
        # GET parameters go in the query string rather than the request body
        captcha_req = request.Request(captcha_url + '?' + cap_parms, method="GET")
        captcha_res = request.urlopen(captcha_req)
        with open('captcha.jpg', 'wb') as fo:
            fo.write(captcha_res.read())
        captcha = input("Enter the captcha:\n")
        params['captcha'] = captcha
    params = parse.urlencode(params).encode('utf-8')
    req = request.Request(login_url, params, method="POST")
    res = request.urlopen(req)
    result = res.read().decode('utf-8')
    login_result = json.loads(result)
    if login_result['r'] == 0:
        print('Login successful')
    else:
        if login_result['errcode'] == 1991829:
            # This error code means a captcha is required, so retry with one
            login(1)
        else:
            print(login_result['msg'])
            login()
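login() expects a user.ini file with a LoginInfo section containing email and password keys. Below is a minimal sketch of a matching file, using placeholder credentials and a temporary directory (both are assumptions for the demo), and reading it back the same way the function does.

```python
import configparser
import os
import tempfile

# Placeholder credentials; replace with real ones in your own user.ini.
SAMPLE = """[LoginInfo]
email = you@example.com
password = your-password
"""

# Write the sample to a throwaway location so the demo leaves no file behind in the project.
path = os.path.join(tempfile.mkdtemp(), "user.ini")
with open(path, "w", encoding="utf-8") as f:
    f.write(SAMPLE)

login_data = configparser.ConfigParser()
login_data.read(path, encoding="utf-8")
email = login_data.get("LoginInfo", "email")
password = login_data.get("LoginInfo", "password")
print(email)
```

If the password contains a `%` character, ConfigParser's default interpolation will raise an error on `get`. Creating the parser with `configparser.ConfigParser(interpolation=None)` avoids that.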
Finally
if __name__ == '__main__':
    build_opener()
    login()
Demo
zhihu.gif
Code: https://github.com/vincenth520/Spider/tree/master/zhihu