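How to Read Source Code (Using the Python requests Library as an Example)
For any engineer who is just starting to read source code, learning to work step by step, from the surface down to the internals, is not easy, so some experience and methodology are needed as a guide.
This article is a study summary of the PyCon 2016 talk Let's read code: the requests library. Its goal is to study requests, the classic Python library by kennethreitz, in order to learn how to read source code on the one hand and to pick up the details of Pythonic programming on the other.
Starting from setting up a development environment, the article analyzes one unit test of requests in depth. All notes come from the author's own hands-on work.
- The source code is the latest requests master branch.
- The remote environment is Ubuntu 16.04 with Python 3.5.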
Preparation: setting up a local development environment
TIPS: point pip's default index at a mirror in mainland China:
(requests) root@cld-chenjiaxi-test:/home/chenjiaxi/requests/src# vim ~/.pip/pip.conf
[global]
trusted-host=mirrors.aliyun.com
index-url=http://mirrors.aliyun.com/pypi/simple/
Download and install requests, set up the environment, and run the basic tests:
(requests) root@cld-chenjiaxi-test:/home/chenjiaxi/requests/src# make
(requests) root@cld-chenjiaxi-test:/home/chenjiaxi/requests/src# python setup.py test
running test
...
running build_ext
========================================================= test session starts =========================================================
platform linux -- Python 3.5.2, pytest-3.6.3, py-1.5.4, pluggy-0.6.0
rootdir: /home/chenjiaxi/requests/src, inifile: pytest.ini
plugins: httpbin-0.0.7, xdist-1.22.2, mock-1.10.0, forked-0.2, cov-2.5.1
gw0 [533] / gw1 [533] / gw2 [533] / gw3 [533]
scheduling tests via LoadScheduling
...
========================================= 518 passed, 13 skipped, 2 xfailed in 149.33 seconds =========================================
Question: understanding a GET request
Read and understand the following snippet:
>>> import requests
>>> print(requests)
<module 'requests' from '/home/chenjiaxi/requests/src/requests/__init__.py'>
>>> r = requests.get('https://api.github.com/user', auth=('chenjiaxi1993', '<mypassword>'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf-8'
>>> r.encoding
'utf-8'
>>> r.text
'{"login":"chenjiaxi1993",...'
>>> r.json()
{'updated_at': '2018-07-02T14:42:13Z', ...}
Start from the unit tests
A grep for requests.get in tests/test_requests.py turns up a surprising 69 matching lines:
Jchen@iMac-3 ~/requests master ● git grep requests.get tests/test_requests.py | wc -l
69
Pick one of these unit tests and read it closely.
Reading a unit test
tests.test_requests.TestRequests#test_DIGEST_HTTP_200_OK_GET
def test_DIGEST_HTTP_200_OK_GET(self, httpbin):
    for authtype in self.digest_auth_algo:
        # note1
        auth = HTTPDigestAuth('user', 'pass')
        url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')
        # note2
        r = requests.get(url, auth=auth)
        assert r.status_code == 200
        r = requests.get(url)
        assert r.status_code == 401
        print(r.headers['WWW-Authenticate'])
        # note3
        s = requests.session()
        s.auth = HTTPDigestAuth('user', 'pass')
        r = s.get(url)
        assert r.status_code == 200
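The test function above raises three topics or questions:
- Initialization of the auth and url objects -> what are they, and how are they initialized?
- The two test cases -> basic usage of requests.get
- Use of a session -> what is a session?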
HTTPDigestAuth
To explore the first question, start with the class definition of HTTPDigestAuth:
requests.auth.HTTPDigestAuth
class HTTPDigestAuth(AuthBase):
    """Attaches HTTP Digest Authentication to the given Request object."""

    def __init__(self, username, password):
        self.username = username
        self.password = password
        # Keep state in per-thread local storage
        self._thread_local = threading.local()

    def init_per_thread_state(self):
        # Ensure state is initialized just once per-thread
        ...

    def build_digest_header(self, method, url):
        ...

    def handle_redirect(self, r, **kwargs):
        """Reset num_401_calls counter on redirects."""
        ...

    def handle_401(self, r, **kwargs):
        """
        Takes the given response and tries digest-auth, if needed.
        :rtype: requests.Response
        """
        ...
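The comment about per-thread local storage is worth a closer look. Below is a minimal standard-library sketch of the same idea, not the code of HTTPDigestAuth itself: the Counter class and its hit method are hypothetical names used only for illustration. Each thread that touches the object sees its own independent state.

```python
import threading


class Counter:
    """Keeps a separate call count for every thread that uses it."""

    def __init__(self):
        self._thread_local = threading.local()

    def init_per_thread_state(self):
        # Attributes set on a threading.local object are invisible to other
        # threads, so each thread initializes its own state on first use.
        if not hasattr(self._thread_local, "init"):
            self._thread_local.init = True
            self._thread_local.calls = 0

    def hit(self):
        self.init_per_thread_state()
        self._thread_local.calls += 1
        return self._thread_local.calls


counter = Counter()
counter.hit()
counter.hit()  # the main thread has now counted 2

seen_in_worker = []
worker = threading.Thread(target=lambda: seen_in_worker.append(counter.hit()))
worker.start()
worker.join()

main_count = counter.hit()
print(main_count, seen_in_worker)  # 3 [1]
```

The worker thread starts counting from zero even though it shares the same Counter instance, which is why one auth object can be used safely from several threads.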
See http://docs.python-requests.org/en/master/user/authentication/ for more on HTTPDigestAuth.
Digest authentication
Digest access authentication is one of the agreed-upon methods a web server can use to negotiate credentials, such as username or password, with a user's web browser. This can be used to confirm the identity of a user before sending sensitive information, such as online banking transaction history. It applies a hash function to a password before sending it over the network, which is safer than basic access authentication, which sends plaintext. Technically, digest authentication is an application of MD5 cryptographic hashing with usage of nonce values to prevent replay attacks. It uses the HTTP protocol.
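Digest authentication
An authentication method that prompts the user for a user name and password (also called credentials) and hashes them together with other data before sending them over the network.
Source: Wikipedia
At this point we know that HTTPDigestAuth can be understood as a way of authenticating with a user name and password, in which a hash is sent instead of the plaintext password.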
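To make the hashing concrete, here is a minimal sketch of how a digest response is computed for the MD5, qop=auth case described in RFC 2617. It uses only hashlib. The names md5_hex and digest_response are illustrative and are not part of the requests API; the real HTTPDigestAuth.build_digest_header covers more cases, such as the other algorithms exercised by the test.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, password, realm, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the 'response' field of a Digest Authorization header."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked for
    # The server-supplied nonce and the client counter make every response
    # different, which is what defeats replay attacks.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Input values taken from the worked example in RFC 2617.
response = digest_response(
    username="Mufasa",
    password="Circle Of Life",
    realm="testrealm@host.com",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
)
print(response)
```

Only the final hash travels over the network; the password itself is never sent.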
httpbin
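The second question is to understand the httpbin function. In the code, however, httpbin is a parameter passed in from the layer above, and the test itself reveals nothing more about it.
Things to try:
- Consult the requests documentation
- If no relevant documentation turns up, investigate with a debugger
Consulting the requests documentation
Searching http://docs.python-requests.org/en/master/search/?q=httpbin&check_keywords=yes&area=default leads to the httpbin documentation: https://httpbin.org/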
httpbin: A simple HTTP Request & Response Service.
Try httpbin:
>>> import requests
>>> resp = requests.post('http://httpbin.org/post',data={'name':"chenjiaxi"})
>>> resp.json()
{'args': {}, 'form': {'name': 'chenjiaxi'}, 'files': {}, 'url': 'http://httpbin.org/post', 'json': None, 'data': '', 'headers': {'Connection': 'close', 'User-Agent': 'python-requests/2.19.1', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'Content-Length': '16', 'Content-Type': 'application/x-www-form-urlencoded', 'Accept': '*/*'}, 'origin': '42.186.112.21'}
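httpbin is an HTTP request and response service that echoes back what it receives, and requests uses it as the server side of its tests. In the test above, the httpbin argument is a pytest fixture (the httpbin-0.0.7 plugin appears in the plugin list of the test run) that builds URLs for a locally running httpbin instance.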
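In the test, httpbin(...) is called with path segments and returns a URL. The sketch below shows one way such a URL builder could work, assuming it simply joins the segments onto the base URL of a test server. make_url_builder and the address are hypothetical and chosen for illustration; the real fixture lives in the requests test suite and the pytest plugin it depends on.

```python
from urllib.parse import urljoin


def make_url_builder(base_url):
    """Return a function that joins path segments onto base_url."""
    base = base_url.rstrip("/") + "/"  # exactly one trailing slash

    def build(*segments):
        return urljoin(base, "/".join(segments))

    return build


# Hypothetical address of a local test server.
httpbin = make_url_builder("http://127.0.0.1:40631")
url = httpbin("digest-auth", "auth", "user", "pass", "MD5", "never")
print(url)
```

Returning a closure keeps the test code short: each test only names the endpoint it needs and never repeats the host or port.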
Debugging with pdb
pdb can be used to debug Python code. Add a breakpoint with pdb.set_trace():
def test_DIGEST_HTTP_200_OK_GET(self, httpbin):
    for authtype in self.digest_auth_algo:
        auth = HTTPDigestAuth('user', 'pass')
        url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')
        import pdb
        pdb.set_trace()
        ...
Run the program. Execution stops at the breakpoint, where the current value of the url variable can be inspected:
Testing started at 17:49 ...
ssh://chenjiaxi@xxx/home/chenjiaxi/requests/opt/requests/bin/python -u /home/chenjiaxi/.pycharm_helpers/pycharm/_jb_pytest_runner.py --target tests/test_requests.py::TestRequests.test_DIGEST_HTTP_200_OK_GET
Launching py.test with arguments tests/test_requests.py::TestRequests::test_DIGEST_HTTP_200_OK_GET in /home/chenjiaxi/requests/src
============================= test session starts ==============================
platform linux -- Python 3.5.2, pytest-3.6.3, py-1.5.4, pluggy-0.6.0
rootdir: /home/chenjiaxi/requests/src, inifile: pytest.ini
plugins: xdist-1.22.2, mock-1.10.0, httpbin-0.0.7, forked-0.2, cov-2.5.1
collected 1 item
tests/test_requests.py
>>>>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
> /home/chenjiaxi/requests/src/tests/test_requests.py(595)test_DIGEST_HTTP_200_OK_GET()
-> r = requests.get(url, auth=auth)
(Pdb) url
'http://127.0.0.1:40631/digest-auth/auth/user/pass/MD5/never'
Use the list command to view the code around the current line:
(Pdb) list
list
590 url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')
591
592 import pdb
593 pdb.set_trace()
594
595 -> r = requests.get(url, auth=auth)
596 assert r.status_code == 200
597
598 r = requests.get(url)
599 assert r.status_code == 401
600 print(r.headers['WWW-Authenticate'])
Use the c command to continue execution past the breakpoint:
(Pdb) c
c
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 200 37
Digest nonce="6eeb7626165a8ffdd8fcac8c608a2350", realm="me@kennethreitz.com", qop="auth", stale=FALSE, algorithm=MD5, opaque="04b7cffdd42f6a3575401089dab14b16"
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 200 37
>>>>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
> /home/chenjiaxi/requests/src/tests/test_requests.py(593)test_DIGEST_HTTP_200_OK_GET()
-> pdb.set_trace()
(Pdb) c
c
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 200 37
Digest nonce="aa4ba69f64892c28a60434d7cc476c59a7b2c4444b0f92fa68f7eb52b3caa7f2", realm="me@kennethreitz.com", qop="auth", stale=FALSE, algorithm=SHA-256, opaque="303b8f4fdd8360cbb9663099eb5c4bf6f91b9c48ce69f4a5e2d19aec9532b4a4"
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 200 37
>>>>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
> /home/chenjiaxi/requests/src/tests/test_requests.py(595)test_DIGEST_HTTP_200_OK_GET()
-> r = requests.get(url, auth=auth)
(Pdb) c
c
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 200 37
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 401 0
Digest nonce="5b703ded2588e07187c583973274a037ed48add03f41e21f594affaf2fa359de63bbb200d727e48a27687d4d104927196f2691a4fcdf65a7d453c9422750aba2", realm="me@kennethreitz.com", qop="auth", stale=FALSE, algorithm=SHA-512, opaque="4cf39ce7c546b8da0125193c5c9539ee071ac882ccf8080431402a8e51f2c8a952b9034c4a5828f56a3187de570cef49d0c587ae901b260fe4310540ae477da6"
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 200 37
. [100%]
========================== 1 passed in 22.09 seconds ===========================
Process finished with exit code 0
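With pdb the whole run of the unit test can be observed. Day-to-day debugging is usually done in PyCharm, which is intuitive and convenient, but knowing how to debug with pdb is also essential for server-side programming.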
requests.get()
With the initialization of auth and url covered, move on to the code of the two test cases:
r = requests.get(url, auth=auth)
assert r.status_code == 200
r = requests.get(url)
assert r.status_code == 401
print(r.headers['WWW-Authenticate'])
The question for this part is: what happens inside requests.get?
requests.api.get
def get(url, params=None, **kwargs):
    r"""Sends a GET request.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """
    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)
- Sets a default in the optional kwargs so that redirects are allowed
- Returns an object of type Response, which is the return value of the call to request()
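The pattern here is a thin wrapper that fills in a default and then delegates. A self-contained sketch, using a stand-in for the real request function (fake_request is hypothetical; it only records its arguments and performs no HTTP):

```python
def fake_request(method, url, **kwargs):
    """Stand-in for requests.api.request: report what it was called with."""
    return {"method": method, "url": url, **kwargs}


def get(url, params=None, **kwargs):
    # setdefault only writes the key when the caller did not pass it,
    # so an explicit allow_redirects=False is respected.
    kwargs.setdefault("allow_redirects", True)
    return fake_request("get", url, params=params, **kwargs)


default_call = get("http://example.com")
explicit_call = get("http://example.com", allow_redirects=False)
print(default_call["allow_redirects"], explicit_call["allow_redirects"])  # True False
```

Using setdefault instead of a plain assignment is what lets the caller override the default.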
Question: what does the request() function return?
request()
requests.api.request
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.
    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
    ...
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    Usage::
      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      <Response [200]>
    """
    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
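It performs one HTTP request based on the arguments passed in and returns a Response.
This function is documented almost completely, with a detailed explanation of every parameter and a simple usage example, and it also shows a classic use of with to manage a resource object.
Managing an object with with ties its lifetime to the enclosing block, which suits objects that wrap limited resources (HTTP connections, database connections, file descriptors, and so on) particularly well.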
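A minimal sketch of what the with statement guarantees, using a hypothetical FakeConnection class in place of a real session or socket: the cleanup in __exit__ runs both when the block finishes normally and when it raises.

```python
class FakeConnection:
    """A stand-in for a limited resource such as a socket."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and when the block raises.
        self.close()
        return False  # do not swallow exceptions

    def close(self):
        self.closed = True


with FakeConnection() as conn:
    inside = conn.closed  # still open inside the block

try:
    with FakeConnection() as failing:
        raise RuntimeError("request failed")
except RuntimeError:
    pass

print(inside, conn.closed, failing.closed)  # False True True
```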
OK, at this point we can see that api.request() is actually carried out by session.request(), so the next question is:
What is a session?
sessions
requests.sessions.Session
class Session(SessionRedirectMixin):
    """A Requests session.
    Provides cookie persistence, connection-pooling, and configuration.
    Basic Usage::
      >>> import requests
      >>> s = requests.Session()
      >>> s.get('http://httpbin.org/get')
      <Response [200]>
    Or as a context manager::
      >>> with requests.Session() as s:
      >>>     s.get('http://httpbin.org/get')
      <Response [200]>
    """
From the documentation: http://docs.python-requests.org/en/master/user/advanced/
The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3’s connection pooling. So if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
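Session: reuses the TCP connection across repeated exchanges between the same two endpoints, which improves performance
- Persists parameters and cookies
- Uses the urllib3 connection pool
- Can supply default data for request objects
Note: although every request call is ultimately carried out by a session, parameters passed at the level of an individual request apply to that request only and are not persisted on the session.
(The author found this hard to grasp at first and plans to read up on it further.)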
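A small offline sketch of that distinction, assuming the requests package is installed. It only prepares a request and never sends it, so no network access is needed. The header names X-Session and X-Once are made up for illustration.

```python
import requests

session = requests.Session()
session.headers.update({"X-Session": "kept"})  # stored on the session

# A header supplied for this one request only.
req = requests.Request(
    "GET",
    "http://httpbin.org/headers",
    headers={"X-Once": "this request only"},
)
prepared = session.prepare_request(req)  # merges session and request settings

print(prepared.headers["X-Session"], "|", prepared.headers["X-Once"])
print("X-Once" in session.headers)  # the session itself is unchanged
```

The prepared request carries both headers, but only the session-level one will be there for the next request made through the same session.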
Session.request()
With a basic understanding of the Session class, look next at the implementation of the request() method that Session provides:
requests.sessions.Session#request
def request(self, method, url,
        params=None, data=None, headers=None, cookies=None, files=None,
        auth=None, timeout=None, allow_redirects=True, proxies=None,
        hooks=None, stream=None, verify=None, cert=None, json=None):
    """Constructs a :class:`Request <Request>`, prepares it and sends it.
    Returns :class:`Response <Response>` object.
    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query
        string for the :class:`Request`.
    :param data: (optional) Dictionary, bytes, or file-like object to send
        in the body of the :class:`Request`.
    :param json: (optional) json to send in the body of the
        :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the
        :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the
        :class:`Request`.
    :param files: (optional) Dictionary of ``'filename': file-like-objects``
        for multipart encoding upload.
    :param auth: (optional) Auth tuple or callable to enable
        Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send
        data before giving up, as a float, or a :ref:`(connect timeout,
        read timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Set to True by default.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol or protocol and
        hostname to the URL of the proxy.
    :param stream: (optional) whether to immediately download the response
        content. Defaults to ``False``.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
        the server's TLS certificate, or a string, in which case it must be a path
        to a CA bundle to use. Defaults to ``True``.
    :param cert: (optional) if String, path to ssl client cert file (.pem).
        If Tuple, ('cert', 'key') pair.
    :rtype: requests.Response
    """
    # Create the Request.
    req = Request(
        method=method.upper(),
        url=url,
        headers=headers,
        files=files,
        data=data or {},
        json=json,
        params=params or {},
        auth=auth,
        cookies=cookies,
        hooks=hooks,
    )
    prep = self.prepare_request(req)
    proxies = proxies or {}
    settings = self.merge_environment_settings(
        prep.url, proxies, stream, verify, cert
    )
    # Send the request.
    send_kwargs = {
        'timeout': timeout,
        'allow_redirects': allow_redirects,
    }
    send_kwargs.update(settings)
    resp = self.send(prep, **send_kwargs)
    return resp
This breaks down into four steps:
- Create the Request object req
- Create the prepared request object prep
- Send the request with send
- Return the response from send to the user
Which raises these questions:
- What is a Request?
- What is prepare_request?
- How is the request sent?
- What is resp?
requests.models.Request
class Request(RequestHooksMixin):
    """A user-created :class:`Request <Request>` object.
    Used to prepare a :class:`PreparedRequest <PreparedRequest>`, which is sent to the server.
    :param method: HTTP method to use.
    :param url: URL to send.
    :param headers: dictionary of headers to send.
    :param files: dictionary of {filename: fileobject} files to multipart upload.
    :param data: the body to attach to the request. If a dictionary is provided, form-encoding will take place.
    :param json: json for the body to attach to the request (if files or data is not specified).
    :param params: dictionary of URL parameters to append to the URL.
    :param auth: Auth handler or (user, pass) tuple.
    :param cookies: dictionary or CookieJar of cookies to attach to this request.
    :param hooks: dictionary of callback hooks, for internal usage.
    Usage::
      >>> import requests
      >>> req = requests.Request('GET', 'http://httpbin.org/get')
      >>> req.prepare()
      <PreparedRequest [GET]>
    """
A request built from the arguments the user passes in, used to prepare the PreparedRequest that is actually sent.
prepare_request()
def prepare_request(self, request):
    """Constructs a :class:`PreparedRequest <PreparedRequest>` for
    transmission and returns it. The :class:`PreparedRequest` has settings
    merged from the :class:`Request <Request>` instance and those of the
    :class:`Session`.
    :param request: :class:`Request` instance to prepare with this
        session's settings.
    :rtype: requests.PreparedRequest
    """
    ...
    p = PreparedRequest()
    p.prepare(
        method=request.method.upper(),
        url=request.url,
        files=request.files,
        data=request.data,
        json=request.json,
        headers=merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict),
        params=merge_setting(request.params, self.params),
        auth=merge_setting(auth, self.auth),
        cookies=merged_cookies,
        hooks=merge_hooks(request.hooks, self.hooks),
    )
    return p
- Create the PreparedRequest object p
- Call p.prepare(), then return p
Questions:
- What is a PreparedRequest?
- What happens inside p.prepare()?
PreparedRequest
requests.models.PreparedRequest
class PreparedRequest(RequestEncodingMixin, RequestHooksMixin):
    """The fully mutable :class:`PreparedRequest <PreparedRequest>` object,
    containing the exact bytes that will be sent to the server.
    Generated from either a :class:`Request <Request>` object or manually.
    Usage::
      >>> import requests
      >>> req = requests.Request('GET', 'http://httpbin.org/get')
      >>> r = req.prepare()
      <PreparedRequest [GET]>
      >>> s = requests.Session()
      >>> s.send(r)
      <Response [200]>
    """
It contains the exact bytes to be transmitted and is the object the session actually sends to the server.
def prepare(self,
        method=None, url=None, headers=None, files=None, data=None,
        params=None, auth=None, cookies=None, hooks=None, json=None):
    """Prepares the entire request with the given parameters."""
    self.prepare_method(method)
    self.prepare_url(url, params)
    self.prepare_headers(headers)
    self.prepare_cookies(cookies)
    self.prepare_body(data, files, json)
    self.prepare_auth(auth, url)
    # Note that prepare_auth must be last to enable authentication schemes
    # such as OAuth to work on a fully prepared request.
    # This MUST go after prepare_auth. Authenticators could add a hook
    self.prepare_hooks(hooks)
This series of steps completes the preparation of the whole request.
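The effect of these steps can be inspected without sending anything. The sketch below assumes the requests package is installed and works entirely offline; the parameter values are arbitrary examples.

```python
import requests

req = requests.Request(
    "post",
    "http://httpbin.org/post",
    params={"page": "1"},
    data={"name": "chenjiaxi"},
)
prepared = req.prepare()

print(prepared.method)                   # upper-cased by prepare_method
print(prepared.url)                      # params encoded into the query string
print(prepared.headers["Content-Type"])  # set by prepare_body for form data
print(prepared.body)                     # the form-encoded body
```

Each attribute of the prepared object corresponds to one of the prepare_* calls, which makes the PreparedRequest a convenient place to check what will actually go over the wire.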
send
With the object to send prepared, Session.send() is called to send it to the server:
requests.sessions.Session#send
def send(self, request, **kwargs):
    """Send a given PreparedRequest.
    :rtype: requests.Response
    """
    # Set defaults that the hooks can utilize to ensure they always have
    # the correct parameters to reproduce the previous request.
    ...
    # Get the appropriate adapter to use
    adapter = self.get_adapter(url=request.url)
    # Start time (approximately) of the request
    start = preferred_clock()
    # Send the request
    r = adapter.send(request, **kwargs)
    # Total elapsed time of the request (approximately)
    elapsed = preferred_clock() - start
    r.elapsed = timedelta(seconds=elapsed)
    # Response manipulation hooks
    r = dispatch_hook('response', hooks, r, **kwargs)
    ...
    return r
Here we can see that send is actually implemented through an adapter, which raises new questions:
- Why use an adapter?
- What is an adapter?
- How is an adapter implemented?
transport adapters
From the official documentation: http://docs.python-requests.org/en/latest/user/advanced/?highlight=adapter
Transport Adapters provide a mechanism to define interaction methods for an HTTP service. In particular, they allow you to apply per-service configuration.
Requests ships with a single Transport Adapter, the HTTPAdapter. This adapter provides the default Requests interaction with HTTP and HTTPS using the powerful urllib3 library.
Requests enables users to create and use their own Transport Adapters that provide specific functionality.
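- They provide a mechanism for defining how to communicate with an HTTP service
- HTTPAdapter, built on urllib3, is used by default
- Users can define their own adapter without any change to the send mechanism
The use of adapters here is a very good abstraction and a model of programming to an interface. It is also a good example through which to learn more about the Adapter design pattern.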
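The idea can be sketched with the standard library alone. MiniSession and EchoAdapter below are hypothetical names and a simplified illustration of the mount and get_adapter idea, not the code of requests: the session picks an adapter by URL prefix and calls its send method without knowing how the adapter works.

```python
class EchoAdapter:
    """Hypothetical adapter: anything with a send(request) method qualifies."""

    def __init__(self, name):
        self.name = name

    def send(self, request):
        return f"{self.name} handled {request}"


class MiniSession:
    """Chooses an adapter by URL prefix, most specific prefix first."""

    def __init__(self):
        self.adapters = {}

    def mount(self, prefix, adapter):
        self.adapters[prefix] = adapter

    def get_adapter(self, url):
        # Longer prefixes are more specific, so try them first.
        for prefix in sorted(self.adapters, key=len, reverse=True):
            if url.lower().startswith(prefix.lower()):
                return self.adapters[prefix]
        raise ValueError(f"No adapter found for {url!r}")

    def send(self, url):
        # The session never needs to know how the adapter does its job.
        return self.get_adapter(url).send(url)


session = MiniSession()
session.mount("http://", EchoAdapter("default"))
session.mount("http://api.example.com/", EchoAdapter("api"))

api_result = session.send("http://api.example.com/users")
other_result = session.send("http://other.example.com/")
print(api_result)
print(other_result)
```

Swapping in a different transport only requires mounting another object with a send method, which is the per-service configuration the documentation describes.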
HTTPAdapter
Next, read the code of the default HTTPAdapter to see how an adapter is defined:
requests.adapters.HTTPAdapter
class HTTPAdapter(BaseAdapter):
    """The built-in HTTP Adapter for urllib3.
    Provides a general-case interface for Requests sessions to contact HTTP and
    HTTPS urls by implementing the Transport Adapter interface. This class will
    usually be created by the :class:`Session <Session>` class under the
    covers.
    :param pool_connections: The number of urllib3 connection pools to cache.
    :param pool_maxsize: The maximum number of connections to save in the pool.
    :param max_retries: The maximum number of retries each connection
        should attempt. Note, this applies only to failed DNS lookups, socket
        connections and connection timeouts, never to requests where data has
        made it to the server. By default, Requests does not retry failed
        connections. If you need granular control over the conditions under
        which we retry a request, import urllib3's ``Retry`` class and pass
        that instead.
    :param pool_block: Whether the connection pool should block for connections.
    Usage::
      >>> import requests
      >>> s = requests.Session()
      >>> a = requests.adapters.HTTPAdapter(max_retries=3)
      >>> s.mount('http://', a)
    """
HTTPAdapter.send()
def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
    """Sends PreparedRequest object. Returns Response object.
    :param request: The :class:`PreparedRequest <PreparedRequest>` being sent.
    :param stream: (optional) Whether to stream the request content.
    :param timeout: (optional) How long to wait for the server to send
        data before giving up, as a float, or a :ref:`(connect timeout,
        read timeout) <timeouts>` tuple.
    :type timeout: float or tuple or urllib3 Timeout object
    :param verify: (optional) Either a boolean, in which case it controls whether
        we verify the server's TLS certificate, or a string, in which case it
        must be a path to a CA bundle to use
    :param cert: (optional) Any user-provided SSL certificate to be trusted.
    :param proxies: (optional) The proxies dictionary to apply to the request.
    :rtype: requests.Response
    """
    # Connection establish
    conn = self.get_connection(request.url, proxies)
    ...
    chunked = not (request.body is None or 'Content-Length' in request.headers)
    # Timeout mechanism
    ...
    try:
        if not chunked:
            resp = conn.urlopen(
                ...
            )
        # Send the request.
        else:
            if hasattr(conn, 'proxy_pool'):
                conn = conn.proxy_pool
            low_conn = conn._get_conn(timeout=DEFAULT_POOL_TIMEOUT)
            try:
                low_conn.putrequest(request.method,
                                    url,
                                    skip_accept_encoding=True)
                for header, value in request.headers.items():
                    low_conn.putheader(header, value)
                low_conn.endheaders()
                for i in request.body:
                    low_conn.send(hex(len(i))[2:].encode('utf-8'))
                    low_conn.send(b'\r\n')
                    low_conn.send(i)
                    low_conn.send(b'\r\n')
                low_conn.send(b'0\r\n\r\n')
                # Receive the response from the server
                try:
                    # For Python 2.7+ versions, use buffering of HTTP
                    # responses
                    r = low_conn.getresponse(buffering=True)
                except TypeError:
                    # For compatibility with Python 2.6 versions and back
                    r = low_conn.getresponse()
                resp = HTTPResponse.from_httplib(
                    r,
                    pool=conn,
                    connection=low_conn,
                    preload_content=False,
                    decode_content=False
                )
            except:
                # If we hit any problems here, clean up the connection.
                # Then, reraise so that we can handle the actual exception.
                low_conn.close()
                raise
    # Exception Handling
    except (ProtocolError, socket.error) as err:
        raise ConnectionError(err, request=request)
    ....
    return self.build_response(request, resp)
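Skipping connection setup, the timeout mechanism, and exception handling, look only at the part that actually sends the request. For a non-chunked request the work is handed to conn.urlopen; for a chunked request (the else branch) the steps are:
- Get a connection from the connection pool maintained by urllib3
- Write the request line with putrequest
- Add the request headers with putheader
- Serialize request.body chunk by chunk
- Send the request
- Receive the response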
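The loop over request.body in the code above is writing HTTP chunked transfer encoding by hand: each chunk is preceded by its length in hexadecimal and followed by CRLF, and a zero-length chunk ends the body. A standalone sketch of the same framing, collected into a bytes object instead of being written to a socket (encode_chunked is a hypothetical helper name):

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for chunk in chunks:
        out += hex(len(chunk))[2:].encode("utf-8")  # chunk size in hexadecimal
        out += b"\r\n"
        out += chunk
        out += b"\r\n"
    out += b"0\r\n\r\n"  # a zero-length chunk marks the end of the body
    return bytes(out)


body = encode_chunked([b"hello", b"requests library"])
print(body)  # b'5\r\nhello\r\n10\r\nrequests library\r\n0\r\n\r\n'
```

The second chunk is 16 bytes long, so its size line is 10 in hexadecimal.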
Finally, build_response is called to build a requests.Response object from the urllib3 response and return it to the user. That completes one requests.get() call.
Blocking and non-blocking
While reading the official documentation I came across the section Blocking Or Non-Blocking?, excerpted below:
With the default Transport Adapter in place, Requests does not provide any kind of non-blocking IO. The Response.content property will block until the entire response has been downloaded. If you require more granularity, the streaming features of the library (see Streaming Requests) allow you to retrieve smaller quantities of the response at a time. However, these calls will still block.
If you are concerned about the use of blocking IO, there are lots of projects out there that combine Requests with one of Python’s asynchronicity frameworks. Some excellent examples are requests-threads, grequests, and requests-futures.
requests is blocking by default. When making synchronous HTTP requests with long IO latency, grequests, which is built on gevent, can be used to issue the requests from coroutines.
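To see why overlapping blocking calls helps, here is a sketch that uses a thread pool from the standard library rather than gevent coroutines, so it is a different technique from grequests with a similar effect. slow_fetch is a hypothetical stand-in for a blocking call such as requests.get and only sleeps; no network access is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fetch(url):
    """Stand-in for a blocking call such as requests.get(url)."""
    time.sleep(0.2)  # pretend to wait on the network
    return f"response from {url}"


urls = [f"http://example.com/{i}" for i in range(5)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    # map preserves input order even though the calls overlap in time.
    results = list(pool.map(slow_fetch, urls))
elapsed = time.perf_counter() - start

print(results[0])
print(f"{elapsed:.2f}s for {len(urls)} calls")
```

Each call still blocks its own thread, but the waits overlap, so the total time is typically close to that of a single call rather than the sum of all five.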
Summary
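Based on the analysis above, one requests.get() call can be summarized as the following flow:
requests.get() -> requests.api.request() -> Session.request() -> Session.prepare_request() -> Session.send() -> HTTPAdapter.send() -> HTTPAdapter.build_response()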
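This study also shows that real open source is more than code: it includes documentation and a community as well. kennethreitz has also open-sourced a guide to writing Pythonic code, The Hitchhiker's Guide to Python!, and there is an open-source set of notes on reading the requests source on GitHub, read_requests, that is worth consulting.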