1. Inspecting the login request
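First, press Ctrl + Shift + N in the browser to open an incognito window, so that cookies from an account we are already logged into don't get in the way.
Then go to the Zhihu login page: https://www.zhihu.com/#signin
Press F12 and open the Network tab. Try logging in with any phone number, and the login request will show up in the list.
The Headers pane shows the request URL and the form data being sent.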
(Screenshot: the login request in the Network tab)
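That is, the request URL for logging in with a phone number is https://www.zhihu.com/login/phone_num, and the form fields are:
phone_num: the phone number
password: the password
So what is _xsrf? Open the Elements tab and press Ctrl + F to search for xsrf, and it turns up in the page source. It is a dynamically generated anti-forgery token (XSRF stands for cross-site request forgery), and since it sits in the page itself, we can fetch it just as easily.
With that, we can write the login code.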
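The two findings above can be sketched offline: pull the _xsrf value out of the page HTML with the same regular expression that get_xsrf() uses in the login code, then assemble the three fields into the URL-encoded form body, which is essentially what requests builds from a data= dict and what DevTools shows under Form Data. The sample HTML, token, and credentials are made-up placeholders, and returning None when the token is missing is a choice of this sketch rather than something the original code does.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Same pattern as get_xsrf() in the login code
XSRF_PATTERN = r'name="_xsrf" value="(.*?)"'


def extract_xsrf(html: str) -> Optional[str]:
    """Return the first _xsrf value found in the HTML, or None if it is absent."""
    match = re.search(XSRF_PATTERN, html)
    return match.group(1) if match else None


def encode_login_form(phone_num: str, password: str, xsrf: str) -> str:
    """Build the application/x-www-form-urlencoded body of the phone login form."""
    return urlencode({
        'phone_num': phone_num,
        'password': password,
        '_xsrf': xsrf,
    })


if __name__ == "__main__":
    # Hypothetical page fragment; the real page's markup may differ
    sample_html = '<form><input type="hidden" name="_xsrf" value="abc123"/></form>'
    token = extract_xsrf(sample_html)
    print(token)  # abc123
    # Special characters are percent-encoded and spaces become '+'
    print(encode_login_form('13800000000', 'p@ss word', token))
```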
2. Login code
import requests
import re

# Zhihu has anti-scraping measures, so send a browser User-Agent
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
}

def get_xsrf():
    """Fetch the login page and return its _xsrf token, which changes on every visit."""
    index_url = 'https://www.zhihu.com/#signin'
    index_page = requests.get(index_url, headers=headers)
    html = index_page.text
    pattern = r'name="_xsrf" value="(.*?)"'
    # re.findall returns a list of matches; the token is the first one
    _xsrf = re.findall(pattern, html)
    return _xsrf[0]
def login(account, secret):
    xsrf = get_xsrf()
    # Decide from the account string whether it is a phone number or an email
    if re.match(r"^1\d{10}$", account):  # phone number: 11 digits starting with 1
        print('login by phone_num\n')
        login_url = 'https://www.zhihu.com/login/phone_num'  # phone login endpoint
        formdata = {
            'phone_num': account,
            'password': secret,
            '_xsrf': xsrf
        }
    elif "@" in account:
        print('login by Email\n')
        login_url = 'https://www.zhihu.com/login/email'  # email login endpoint
        formdata = {
            'email': account,
            'password': secret,
            '_xsrf': xsrf
        }
    else:
        print('Invalid account, please try again')
        return 0
    login_page = requests.post(url=login_url, data=formdata, headers=headers)
    print(login_page.status_code)  # HTTP status code of the response
    print(login_page.content)
    print(login_page.json()['msg'])  # server message saying whether the login succeeded

if __name__ == "__main__":
    username = input('Enter your username\n> ')
    password = input("Enter your password\n> ")
    login(username, password)
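The branching in login() can be pulled out into a pure function that only decides which endpoint and form fields to use, so that part can be checked without touching the network. This is a sketch: build_login_request is a hypothetical helper name, raising ValueError instead of returning 0 is a design choice of the sketch, and the two endpoints are the ones from the code above.

```python
import re
from typing import Dict, Tuple

PHONE_URL = 'https://www.zhihu.com/login/phone_num'
EMAIL_URL = 'https://www.zhihu.com/login/email'

# Same check as login(): 11 digits starting with 1
PHONE_RE = re.compile(r"^1\d{10}$")


def build_login_request(account: str, secret: str, xsrf: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: return (login_url, formdata) for a phone number or an email."""
    if PHONE_RE.match(account):
        return PHONE_URL, {'phone_num': account, 'password': secret, '_xsrf': xsrf}
    if "@" in account:
        return EMAIL_URL, {'email': account, 'password': secret, '_xsrf': xsrf}
    # Neither branch matched, so there is no endpoint to post to
    raise ValueError('account is neither a phone number nor an email address')
```

With this split, login() would only fetch the token, call build_login_request(), and pass the returned pair to requests.post(url, data=form, headers=headers).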
3. Using cookies
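If we then want to do something else with the logged-in account, we need to keep its cookies, which we can simply put into headers.
Since the login code above is wrapped in a function, the following steps are shown in an interactive session: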
>>> import re
>>> import requests
>>> headers = {
...     "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
... }
>>> def get_xsrf():
...     """Return the _xsrf token, which changes on every visit."""
...     index_url = 'https://www.zhihu.com/#signin'
...     index_page = requests.get(index_url, headers=headers)
...     html = index_page.text
...     pattern = r'name="_xsrf" value="(.*?)"'
...     # re.findall returns a list; the token is its first element
...     _xsrf = re.findall(pattern, html)
...     return _xsrf[0]
...
>>> xsrf = get_xsrf()
>>> formdata = {
...     'phone_num': 'your phone number',
...     'password': 'your password',
...     '_xsrf': xsrf
... }
>>> url = 'https://www.zhihu.com/login/phone_num'
>>> r = requests.post(url=url, data=formdata)
>>> r.json()['msg']
'登錄成功'
>>> r.headers['Set-Cookie']
'aliyungf_tc=AQAAANuoyleZZwEADpFq2sqbL1fysy78; Path=/; HttpOnly, q_c1=41d9412dc9d6431cb6849a754f41f7a0|1489474734000|1489474734000; Domain=zhihu.com; expires=Fri, 13 Mar 2020 06:58:54 GMT; Path=/, z_c0="QUJCS0pUaFBYZ2dYQUFBQVlRSlZUYTRoNzFpUE5ScjNaNk5oTXJ3Q29ZVW9lT1JlM245Yk93PT0=|1489474734|79a8a1792039b9f1bb14f7562b6cba8068d95721"; Domain=zhihu.com; expires=Thu, 13 Apr 2017 06:58:54 GMT; httponly; Path=/, _xsrf=; Domain=zhihu.com; expires=Mon, 14 Mar 2016 06:58:54 GMT; Path=/, r_cap_id="ZWEwNTVlMjAxYjliNDcyNTgwZjA4MjQ4Y2ZlNGRjMmU=|1489474734|f422359103c09d0b366fa17960ef06bdb214dbe4"; Domain=zhihu.com; expires=Thu, 13 Apr 2017 06:58:54 GMT; Path=/, cap_id="NTBjZWJjNGI1MjkwNDRhNGI5YmU3N2Q4ZWY0YTJiOTc=|1489474734|451abe100589202b9e67ed860b77dfc4d076e051"; Domain=zhihu.com; expires=Thu, 13 Apr 2017 06:58:54 GMT; Path=/, l_cap_id=; Domain=zhihu.com; expires=Mon, 14 Mar 2016 06:58:54 GMT; Path=/'
>>> headers["Cookie"] = r.headers['Set-Cookie']
>>> headers
{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36', 'Cookie': 'aliyungf_tc=AQAAANuoyleZZwEADpFq2sqbL1fysy78; Path=/; HttpOnly, q_c1=41d9412dc9d6431cb6849a754f41f7a0|1489474734000|1489474734000; Domain=zhihu.com; expires=Fri, 13 Mar 2020 06:58:54 GMT; Path=/, z_c0="QUJCS0pUaFBYZ2dYQUFBQVlRSlZUYTRoNzFpUE5ScjNaNk5oTXJ3Q29ZVW9lT1JlM245Yk93PT0=|1489474734|79a8a1792039b9f1bb14f7562b6cba8068d95721"; Domain=zhihu.com; expires=Thu, 13 Apr 2017 06:58:54 GMT; httponly; Path=/, _xsrf=; Domain=zhihu.com; expires=Mon, 14 Mar 2016 06:58:54 GMT; Path=/, r_cap_id="ZWEwNTVlMjAxYjliNDcyNTgwZjA4MjQ4Y2ZlNGRjMmU=|1489474734|f422359103c09d0b366fa17960ef06bdb214dbe4"; Domain=zhihu.com; expires=Thu, 13 Apr 2017 06:58:54 GMT; Path=/, cap_id="NTBjZWJjNGI1MjkwNDRhNGI5YmU3N2Q4ZWY0YTJiOTc=|1489474734|451abe100589202b9e67ed860b77dfc4d076e051"; Domain=zhihu.com; expires=Thu, 13 Apr 2017 06:58:54 GMT; Path=/, l_cap_id=; Domain=zhihu.com; expires=Mon, 14 Mar 2016 06:58:54 GMT; Path=/'}
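As you can see, the login succeeded (the message 登錄成功 means "login successful"), and the cookies from the login are now stored in headers.
Passing this headers dict to subsequent requests keeps the account logged in.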
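One caveat: the raw Set-Cookie value copied above also carries attributes such as Path, Domain, expires and HttpOnly, and joins several cookies with commas, whereas a Cookie request header is just name=value pairs separated by "; " (RFC 6265). A cleaner sketch builds the header from cookie names and values only. cookie_header is a hypothetical helper, and the values in the example are placeholders.

```python
from typing import Mapping


def cookie_header(cookies: Mapping[str, str]) -> str:
    """Hypothetical helper: format name/value pairs as a Cookie request header value."""
    # Cookie header syntax: name=value pairs joined by "; ", with no attributes
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


if __name__ == "__main__":
    # Placeholder values standing in for the cookies returned by the login response
    print(cookie_header({'q_c1': 'aaa', 'z_c0': 'bbb'}))  # q_c1=aaa; z_c0=bbb
```

With requests, the mapping can come from r.cookies.get_dict(), as in headers["Cookie"] = cookie_header(r.cookies.get_dict()); alternatively, later calls can pass the jar directly with requests.get(url, headers=headers, cookies=r.cookies).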
The Session object in requests offers an even more convenient way to do this.
4. Using a session
import requests
import re

s = requests.Session()
# A Session keeps track of cookies automatically
s.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
})

def get_xsrf():
    """Return the _xsrf token, which changes on every visit."""
    index_url = 'https://www.zhihu.com/#signin'
    # Fetched through the session, so any cookies the page sets are kept
    index_page = s.get(index_url)
    html = index_page.text
    pattern = r'name="_xsrf" value="(.*?)"'
    # re.findall returns a list; the token is its first element
    _xsrf = re.findall(pattern, html)
    return _xsrf[0]

xsrf = get_xsrf()
login_url = 'https://www.zhihu.com/login/phone_num'  # log in with a phone number
formdata = {
    'phone_num': 'your phone number',
    'password': 'your password',
    '_xsrf': xsrf
}
r = s.post(url=login_url, data=formdata)
print(r.json()['msg'])
print(r.headers)
From now on, every request made through s automatically carries these cookies in its headers.
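Under the hood, requests keeps a Session's cookies in a RequestsCookieJar, which subclasses the standard library's http.cookiejar.CookieJar. The sketch below uses that standard-library jar directly to show the mechanism a session relies on: once a cookie is stored, the jar adds a matching Cookie header to later requests for that domain and leaves other domains alone. The cookie name and value are placeholders, and urllib.request stands in for requests only so the example runs without third-party packages; make_cookie, cookie_header_for and make_cookie_opener are hypothetical helper names.

```python
import http.cookiejar
import urllib.request
from typing import Optional


def make_cookie(name: str, value: str, domain: str) -> http.cookiejar.Cookie:
    """Build a simple session cookie that applies to every path on the domain."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith('.'),
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookie_header_for(jar: http.cookiejar.CookieJar, url: str) -> Optional[str]:
    """Return the Cookie header the jar would attach to a request for url, or None."""
    request = urllib.request.Request(url)
    jar.add_cookie_header(request)
    return request.get_header('Cookie')


def make_cookie_opener(jar: http.cookiejar.CookieJar) -> urllib.request.OpenerDirector:
    """Opener that stores Set-Cookie responses in jar and replays them on later requests."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


if __name__ == "__main__":
    jar = http.cookiejar.CookieJar()
    jar.set_cookie(make_cookie('z_c0', 'placeholder-token', '.zhihu.com'))
    print(cookie_header_for(jar, 'https://www.zhihu.com/'))  # z_c0=placeholder-token
    print(cookie_header_for(jar, 'https://example.com/'))     # None: different domain
```

An opener from make_cookie_opener() plays roughly the role that s plays above: responses fetched through it fill the jar, and later requests through it send the stored cookies back.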