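requests can simulate a browser's requests. Compared with urllib, which we used earlier, the requests API is more convenient (under the hood it is a wrapper around urllib3).
Note: requests only downloads the page content; it does not execute JavaScript. Anything loaded by scripts requires us to analyze the target site ourselves and issue further requests.
Official documentation: http://cn.python-requests.org/zh_CN/latest/
Installation: pip3 install requests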
Request methods of the requests module
In the source, the per-method helpers (get, post, and so on) are all built on top of one function:
requests.request(method, url, **kwargs)
The most commonly used are post and get; both are thin wrappers around request.
>>> r = requests.get('https://api.github.com/events')
which is equivalent to requests.request(method='get', url='https://api.github.com/events')
>>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})
which is equivalent to requests.request(method='post', url='http://httpbin.org/post', data={'key': 'value'})
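The wrapper relationship described above can be sketched without touching the network. In this minimal sketch, `request` is a stand-in that only records its arguments (it is not the real `requests.request`), and `get` and `post` forward to it the way the helpers are described as doing.

```python
calls = []  # records every call that reaches the stand-in request()


def request(method, url, **kwargs):
    """Stand-in for requests.request: record the call instead of sending it."""
    calls.append((method, url, kwargs))
    return calls[-1]


def get(url, params=None, **kwargs):
    """GET helper: forwards to request() with the method filled in."""
    return request("get", url, params=params, **kwargs)


def post(url, data=None, json=None, **kwargs):
    """POST helper: forwards to request() with the method filled in."""
    return request("post", url, data=data, json=json, **kwargs)


via_helper = post("http://httpbin.org/post", data={"key": "value"})
via_request = request("post", "http://httpbin.org/post", data={"key": "value"}, json=None)
print(via_helper == via_request)  # True
```

Both calls end up in the same function with the same arguments, which is all the helpers add.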
requests.request in detail
Source of request()
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
        the server's TLS certificate, or a string, in which case it must be a path
        to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
The parameters that appear in the source are examined one by one below.
method and url
These specify the request method and the request URL.
requests.request(method='get', url='http://127.0.0.1:8000/test/')
requests.request(method='post', url='http://127.0.0.1:8000/test/')
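params
requests offers three ways to attach parameters to a request: data, json, and params.
params is placed in the URL query string and is typically used with GET requests; data and json go in the request body and are typically used with POST requests.
params accepts:
- a dictionary
- a string
- bytes (must contain only ASCII characters)
Dictionaries and strings are automatically encoded into the URL.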
When given a dictionary or a string, requests encodes it into the URL automatically:
import requests

wd = 'egon老师'
pn = 1
response = requests.get('https://www.baidu.com/s',
                        params={
                            'wd': wd,
                            'pn': pn
                        },
                        headers={
                            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36',
                        })
print(response.url)
# Output: https://www.baidu.com/s?wd=egon%E8%80%81%E5%B8%88&pn=1
# The URL has been percent-encoded automatically
The code above is equivalent to the following; the encoding applied to params is essentially urlencode:
import requests
from urllib.parse import urlencode

wd = 'egon老师'
encode_res = urlencode({'k': wd}, encoding='utf-8')
keyword = encode_res.split('=')[1]
print(keyword)
# Then splice the encoded keyword into the URL
url = 'https://www.baidu.com/s?wd=%s&pn=1' % keyword
response = requests.get(url,
                        headers={
                            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36',
                        })
print(response.url)
# Output: https://www.baidu.com/s?wd=egon%E8%80%81%E5%B8%88&pn=1
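The manual encoding step can be written more directly with the standard library alone. This is a small sketch, not what requests does internally line for line: `quote` encodes a single value, and `urlencode` builds the whole query string, so the split on `=` is not needed. The keyword is the one whose encoded form appears in the output above.

```python
from urllib.parse import quote, urlencode

wd = "egon老师"

# quote() percent-encodes a single value (UTF-8 by default).
keyword = quote(wd)
print(keyword)  # egon%E8%80%81%E5%B8%88

# urlencode() builds the whole query string from a mapping in one step.
query = urlencode({"wd": wd, "pn": 1})
url = "https://www.baidu.com/s?" + query
print(url)  # https://www.baidu.com/s?wd=egon%E8%80%81%E5%B8%88&pn=1
```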
One more caveat: when params is given as bytes, it must not contain non-ASCII characters. The following call is therefore wrong:
import requests

# re = requests.request(method='get',
#                       url='http://127.0.0.1:8000/test/',
#                       params=bytes("k1=v1&k2=水電費&k3=v3&k3=vv3", encoding='utf8'))
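A standard-library sketch of why the bytes form fails and one way around it. The assumption here is that requests treats bytes params as ASCII text, which matches the restriction stated above; the workaround is to percent-encode first so the bytes contain only ASCII.

```python
from urllib.parse import urlencode

raw = bytes("k1=v1&k2=水電費&k3=v3&k3=vv3", encoding="utf8")

# Non-ASCII bytes cannot be decoded as ASCII, which is where such a value breaks.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)  # False

# Percent-encoding first yields a pure-ASCII query string that is safe as bytes.
pairs = [("k1", "v1"), ("k2", "水電費"), ("k3", "v3"), ("k3", "vv3")]
safe = urlencode(pairs).encode("ascii")
print(safe)
```

A list of pairs is used instead of a dictionary so that the repeated key `k3` survives.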
data
As noted under params, data and json are sent in the request body and are typically used with POST requests.
data accepts a dictionary, a string, bytes, or a file object. The difference between data and json is the request body: a dictionary passed as data is form-encoded as name=alex&age=18, whereas json sends a JSON string such as '{"k1": "v1", "k2": "水電費"}'.
# A dictionary is form-encoded
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 data={'k1': 'v1', 'k2': '水電費'})
# A string is sent as the body unchanged
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 data="k1=v1; k2=v2; k3=v3; k3=v4"
                 )
# A string body with an explicit Content-Type header
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 data="k1=v1;k2=v2;k3=v3;k3=v4",
                 headers={'Content-Type': 'application/x-www-form-urlencoded'}
                 )
# A file object: its content becomes the body (binary mode is the safer choice)
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 data=open('data_file.py', mode='rb'),  # file content: k1=v1;k2=v2;k3=v3;k3=v4
                 headers={'Content-Type': 'application/x-www-form-urlencoded'}
                 )
json
The value passed as json is serialized to a string with json.dumps(...)
and sent in the request body, with the header Content-Type: application/json.
Hallmark: the data travels as a request payload.
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 json={'k1': 'v1', 'k2': '水電費'})
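The difference between data and json can be inspected locally. This sketch uses `requests.Request(...).prepare()`, which builds the request without sending it, so no server needs to be running at the example URL.

```python
import requests

url = "http://127.0.0.1:8000/test/"
payload = {"k1": "v1", "k2": "水電費"}

# prepare() builds the request without sending it, so no server is needed.
form_req = requests.Request("POST", url, data=payload).prepare()
json_req = requests.Request("POST", url, json=payload).prepare()

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)
print(json_req.headers["Content-Type"])  # application/json
print(json_req.body)
```

The form body is a percent-encoded `key=value&key=value` string, while the JSON body is the serialized dictionary.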
headers
Sends request headers to the server.
# An explicit Content-Type is sent as given; here it replaces the application/json default that json= would otherwise set
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 json={'k1': 'v1', 'k2': '水電費'},
                 headers={'Content-Type': 'application/x-www-form-urlencoded'}
                 )
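To see which headers would actually go out, a request can be prepared through a Session without sending it. A minimal sketch; the `User-Agent` and `X-Token` values are made-up placeholders.

```python
import requests

session = requests.Session()
req = requests.Request(
    "GET",
    "http://127.0.0.1:8000/test/",
    headers={"User-Agent": "my-crawler/0.1", "X-Token": "demo"},  # demo values
)

# prepare_request() merges the session's default headers with ours, without sending.
prepared = session.prepare_request(req)
for name, value in prepared.headers.items():
    print(name, "=", value)
```

Headers passed per request override the session defaults of the same name, and header lookup is case-insensitive.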
cookies
# Send cookies to the server
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 data={'k1': 'v1', 'k2': 'v2'},
                 cookies={'cook1': 'value1'},
                 )
# A CookieJar also works (the dictionary form is built on top of it)
from http.cookiejar import CookieJar
from http.cookiejar import Cookie

obj = CookieJar()
obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,
                      discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,
                      port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False)
               )
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 data={'k1': 'v1', 'k2': 'v2'},
                 cookies=obj)
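The cookie handling can be checked offline as well. This sketch prepares a request locally to show the resulting `Cookie` header, then uses the conversion helpers in `requests.utils` to move between a dictionary and a jar.

```python
import requests

url = "http://127.0.0.1:8000/test/"

# prepare() builds the request locally; nothing is sent.
prepared = requests.Request("POST", url, cookies={"cook1": "value1"}).prepare()
print(prepared.headers.get("Cookie"))

# The same conversion is available directly as helpers in requests.utils.
jar = requests.utils.cookiejar_from_dict({"cook1": "value1"})
print(requests.utils.dict_from_cookiejar(jar))
```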
files
Upload a file
file_dict = {
    'f1': open('readme', 'rb')
}
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 files=file_dict)
Upload a file under a custom filename
file_dict = {
    'f1': ('test.txt', open('readme', 'rb'))
}
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 files=file_dict)
Upload string content under a custom filename
file_dict = {
    'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")
}
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 files=file_dict)
Upload string content with a custom filename, content type, and extra headers
file_dict = {
    'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})
}
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 files=file_dict)
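What a files upload looks like on the wire can be inspected without a server. In this sketch an in-memory `io.BytesIO` stands in for the opened file (an assumption made so the example needs nothing on disk), and the 3-tuple form supplies a filename and content type.

```python
import io
import requests

# In-memory stand-in for open('readme', 'rb'), so no file on disk is needed.
fake_file = io.BytesIO(b"hello from readme")
file_dict = {"f1": ("test.txt", fake_file, "text/plain")}

# prepare() encodes the multipart body locally without contacting a server.
prepared = requests.Request(
    "POST", "http://127.0.0.1:8000/test/", files=file_dict
).prepare()

print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...
print(prepared.body.decode("utf-8", errors="replace"))
```

The printed body shows one part per entry in the dictionary, each with its own Content-Disposition line carrying the field name and filename.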
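auth
Handles the authentication that browsers support natively.
How it appears: when you open the site, a dialog pops up asking for a username and password (much like an alert box), and the HTML cannot be fetched until you authenticate. Under the hood, the credentials are simply assembled into a request header:
r.headers['Authorization'] = _basic_auth_str(self.username, self.password)
Most sites do not use this default scheme and implement their own, so we have to write our own counterpart of _basic_auth_str that follows the site's scheme,
then add the resulting string to the request headers: r.headers['Authorization'] = func('.....')
HTTPBasicAuth simply sends the server a request carrying an Authorization: ................. header.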
HTTPBasicAuth
import requests
from requests.auth import HTTPBasicAuth, HTTPDigestAuth

ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
print(ret.text)
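The header that Basic authentication produces is easy to reproduce with the standard library. A minimal sketch: `basic_auth_header` is a name chosen here for illustration, and the scheme is plain Base64 encoding of `username:password`, not encryption.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP Basic auth."""
    credentials = f"{username}:{password}".encode("latin1")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

Because Base64 is trivially reversible, Basic auth only protects credentials when the connection itself is encrypted.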
Other ways to use auth
# ret = requests.get('http://192.168.1.1',
#                    auth=HTTPBasicAuth('admin', 'admin'))
# ret.encoding = 'gbk'
# print(ret.text)
# ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
# print(ret)
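For sites with their own scheme, requests lets you plug in a custom auth object by subclassing `requests.auth.AuthBase`. The sketch below is hypothetical: `TokenAuth` and its "SHA-256 of user:password" rule are invented for illustration and stand in for whatever the target site actually expects.

```python
import hashlib

import requests
from requests.auth import AuthBase


class TokenAuth(AuthBase):
    """Hypothetical scheme: send a SHA-256 digest of 'user:password' as the token."""

    def __init__(self, username, password):
        self.username = username
        self.password = password

    def __call__(self, r):
        # requests calls this with the request being prepared; add the header and return it.
        raw = f"{self.username}:{self.password}".encode("utf-8")
        r.headers["Authorization"] = "Token " + hashlib.sha256(raw).hexdigest()
        return r


# prepare() applies the auth object locally; nothing is sent.
prepared = requests.Request(
    "GET", "http://127.0.0.1:8000/test/", auth=TokenAuth("admin", "admin")
).prepare()
print(prepared.headers["Authorization"])
```

The same object can be passed as `auth=` to `requests.get` or set on a Session.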
timeout
Two forms: float or tuple
timeout=0.1  # a single value applies to both the connect timeout and the read timeout
timeout=(0.1, 0.2)  # 0.1 is the connect timeout, 0.2 is the read timeout
import requests

response = requests.get('https://www.baidu.com',
                        timeout=0.0001)
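The float-or-tuple rule can be stated as a few lines of code. This is a sketch of the rule only, not the library's own implementation; `split_timeout` is a name made up for this example.

```python
def split_timeout(timeout):
    """Return (connect, read) seconds for a float-or-tuple timeout value."""
    if isinstance(timeout, tuple):
        connect, read = timeout
    else:
        # A single number is used for both phases.
        connect = read = timeout
    return connect, read


print(split_timeout(0.1))         # (0.1, 0.1)
print(split_timeout((0.1, 0.2)))  # (0.1, 0.2)
```

When a limit is exceeded, requests raises an exception from the `requests.exceptions.Timeout` family, so a call with a very small timeout is best wrapped in try/except.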
redirects
# allow_redirects=False returns the 3xx response itself instead of following it
ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
print(ret.text)
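With redirects disabled, the next hop has to be worked out by hand from the `Location` header. A minimal sketch: the `Response` object is built by hand to stand in for a server reply, and `/login/` is an invented target.

```python
from urllib.parse import urljoin

import requests


def redirect_target(response):
    """Return the absolute URL a redirect response points to, or None."""
    if not response.is_redirect:
        return None
    return urljoin(response.url, response.headers["Location"])


# Hand-built response standing in for what a server might return.
fake = requests.Response()
fake.status_code = 302
fake.url = "http://127.0.0.1:8000/test/"
fake.headers["Location"] = "/login/"

print(redirect_target(fake))  # http://127.0.0.1:8000/login/
```

In real use the argument would be the object returned by `requests.get(..., allow_redirects=False)`.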
proxies
Proxy settings
import requests

# Choose the proxy by the protocol of the request
proxies = {
    "http": "61.172.249.96:80",
    "https": "http://61.185.219.126:3128",
}
# Choose the proxy by the host the request is sent to
proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}
ret = requests.get("http://www.proxy#/Proxy", proxies=proxies)
print(ret.headers)

# Proxy that requires a username and password
from requests.auth import HTTPProxyAuth

proxyDict = {
    'http': '77.75.105.165',
    'https': '77.75.105.165'
}
auth = HTTPProxyAuth('username', 'mypassword')
r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
print(r.text)

# SOCKS proxies are supported; install with: pip install requests[socks]
import requests

proxies = {
    'http': 'socks5://user:pass@host:port',
    'https': 'socks5://user:pass@host:port'
}
response = requests.get('https://www.12306.cn',
                        proxies=proxies)
print(response.status_code)
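How the two kinds of keys interact can be checked offline with `select_proxy` from `requests.utils`, which returns the proxy that would be chosen for a URL. Treat this as a sketch: the helper is a utility rather than a headline API, and the proxy addresses are placeholders.

```python
from requests.utils import select_proxy

proxies = {
    "http": "http://10.10.1.10:3128",                # any http:// request
    "http://10.20.1.128": "http://10.10.1.10:5323",  # only this host
}

# select_proxy() returns the proxy requests would pick for a URL; nothing is sent.
print(select_proxy("http://10.20.1.128/index", proxies))  # http://10.10.1.10:5323
print(select_proxy("http://example.com/", proxies))       # http://10.10.1.10:3128
print(select_proxy("https://example.com/", proxies))      # None
```

A host-specific key takes precedence over the protocol-wide key, and a URL whose scheme has no entry gets no proxy.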
stream
# stream=True defers downloading the body until it is accessed
ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
print(ret.content)  # accessing .content reads the whole body at once
ret.close()
# from contextlib import closing
# with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
#     # Process the response here.
#     for i in r.iter_content():
#         print(i)
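The point of stream=True is to consume the body in pieces with `iter_content`. The sketch below shows the chunking on a hand-built `Response` whose `raw` attribute is an in-memory buffer; that stand-in is an assumption made so the example runs without a server.

```python
import io

import requests


def read_in_chunks(response, chunk_size=4):
    """Collect the body piece by piece instead of loading it in one go."""
    return [chunk for chunk in response.iter_content(chunk_size=chunk_size) if chunk]


# Hand-built response standing in for requests.get(url, stream=True).
fake = requests.Response()
fake.status_code = 200
fake.raw = io.BytesIO(b"0123456789")

chunks = read_in_chunks(fake)
print(chunks)  # [b'0123', b'4567', b'89']
```

For a large download, each chunk would be written to a file as it arrives rather than collected in a list.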
session
import requests

session = requests.Session()
# 1. Visit any page first to obtain a cookie
i1 = session.get(url="http://dig.chouti.com/help/service")
# 2. Log in, carrying the cookie from step 1; the backend authorizes the gpsd value in that cookie
i2 = session.post(
    url="http://dig.chouti.com/login",
    data={
        'phone': "8615131255089",
        'password': "xxxxxx",
        'oneMonth': ""
    }
)
# 3. Vote, still sending the now-authorized cookie
i3 = session.post(
    url="http://dig.chouti.com/link/vote?linksId=8589623",
)
print(i3.text)
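What makes the flow above work is that a Session stores cookies and attaches them to later requests. A small offline sketch: the cookie value is made up and set by hand, standing in for what an earlier response would have set.

```python
import requests

session = requests.Session()
# Pretend an earlier response set this cookie; the value is made up.
session.cookies.set("gpsd", "demo-cookie-value")

# prepare_request() shows what the session would send next, without sending it.
prepared = session.prepare_request(
    requests.Request("POST", "http://dig.chouti.com/link/vote", params={"linksId": "8589623"})
)
print(prepared.url)                # http://dig.chouti.com/link/vote?linksId=8589623
print(prepared.headers["Cookie"])  # gpsd=demo-cookie-value
```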
Encoding
import requests

response = requests.get('http://www.autohome.com/news')
# response.encoding = 'gbk'  # Autohome returns pages encoded as gb2312, while requests falls back to ISO-8859-1 here; unless the encoding is set to gbk, the Chinese text is garbled
print(response.text)
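The garbling comes purely from decoding bytes with the wrong codec, which can be reproduced with the standard library. In this sketch the sample string is arbitrary; `response.text` performs the same kind of decode on `response.content` using `response.encoding`.

```python
page_bytes = "汽车之家".encode("gbk")  # what a gb2312/gbk server puts on the wire

garbled = page_bytes.decode("iso-8859-1")  # wrong codec: decodes, but into junk
correct = page_bytes.decode("gbk")         # right codec: original text

print(garbled == "汽车之家")  # False
print(correct == "汽车之家")  # True
```

ISO-8859-1 never raises an error because every byte maps to some character, which is why the failure shows up as unreadable text rather than an exception.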