How to Read Source Code (Using the Python requests Library as an Example)


requests: http library for humans


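For any engineer who is just starting to read source code, it is not easy to learn to read and dissect it step by step, working from the surface inward. Some experience and methodology to guide the process helps.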

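This article is my study notes on the PyCon 2016 talk "Let's read code: the requests library". The main goal is to study requests, the classic Python library by Kenneth Reitz (kennethreitz). I want to learn a method for reading source code and, along the way, appreciate the details of Pythonic programming.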

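The article starts by setting up a development environment and then dissects one of the requests unit tests in depth. Every note here comes from steps I actually carried out.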

  • The source code is the latest requests master branch (at the time of writing).
  • The remote environment is Ubuntu 16.04 with Python 3.5.


Preparation: Setting Up the Local Development Environment

Tip: point pip's default index at a mirror in China:

(requests) root@cld-chenjiaxi-test:/home/chenjiaxi/requests/src# vim ~/.pip/pip.conf

[global]
trusted-host=mirrors.aliyun.com
index-url=http://mirrors.aliyun.com/pypi/simple/

Download and install requests, set up the environment, and pass the basic tests:

(requests) root@cld-chenjiaxi-test:/home/chenjiaxi/requests/src# make

(requests) root@cld-chenjiaxi-test:/home/chenjiaxi/requests/src# python setup.py test
running test
...
running build_ext
========================================================= test session starts =========================================================
platform linux -- Python 3.5.2, pytest-3.6.3, py-1.5.4, pluggy-0.6.0
rootdir: /home/chenjiaxi/requests/src, inifile: pytest.ini
plugins: httpbin-0.0.7, xdist-1.22.2, mock-1.10.0, forked-0.2, cov-2.5.1
gw0 [533] / gw1 [533] / gw2 [533] / gw3 [533]
scheduling tests via LoadScheduling
...
========================================= 518 passed, 13 skipped, 2 xfailed in 149.33 seconds =========================================

Problem: Understanding a GET Request

Understand the following snippet:

>>> import requests
>>> print(requests)
<module 'requests' from '/home/chenjiaxi/requests/src/requests/__init__.py'>
>>> r = requests.get('https://api.github.com/user', auth=('chenjiaxi1993', '<mypassword>'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf-8'
>>> r.encoding
'utf-8'
>>> r.text
'{"login":"chenjiaxi1993",...'
>>> r.json()
{'updated_at': '2018-07-02T14:42:13Z', ...}

Starting from the Unit Tests

git grep finds as many as 69 lines that mention requests.get in tests/test_requests.py:

Jchen@iMac-3  ~/requests   master ●  git grep requests.get tests/test_requests.py | wc -l
      69

Pick one of these unit tests and read it in depth.

Reading a Unit Test

tests.test_requests.TestRequests#test_DIGEST_HTTP_200_OK_GET

    def test_DIGEST_HTTP_200_OK_GET(self, httpbin):

        for authtype in self.digest_auth_algo:
            # note1
            auth = HTTPDigestAuth('user', 'pass')
            url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')

            # note2
            r = requests.get(url, auth=auth)
            assert r.status_code == 200

            r = requests.get(url)
            assert r.status_code == 401
            print(r.headers['WWW-Authenticate'])

            # note3
            s = requests.session()
            s.auth = HTTPDigestAuth('user', 'pass')
            r = s.get(url)
            assert r.status_code == 200

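The test function above raises three topics or questions:

  1. Initialization of the auth and url objects -> What are they, and how are they initialized?
  2. The two test cases -> Basic usage of requests.get
  3. Using a session -> What is a session?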

HTTPDigestAuth

To explore the first question, start with the class definition of HTTPDigestAuth:

requests.auth.HTTPDigestAuth

class HTTPDigestAuth(AuthBase):
    """Attaches HTTP Digest Authentication to the given Request object."""

    def __init__(self, username, password):
        self.username = username
        self.password = password
        # Keep state in per-thread local storage
        self._thread_local = threading.local()

    def init_per_thread_state(self):
        # Ensure state is initialized just once per-thread
        ...  # body elided

    def build_digest_header(self, method, url):
        ...  # body elided

    def handle_redirect(self, r, **kwargs):
        """Reset num_401_calls counter on redirects."""
        ...  # body elided

    def handle_401(self, r, **kwargs):
        """
        Takes the given response and tries digest-auth, if needed.

        :rtype: requests.Response
        """
        ...  # body elided

See http://docs.python-requests.org/en/master/user/authentication/ for more on HTTPDigestAuth.

Digest authentication
Digest access authentication is one of the agreed-upon methods a web server can use to negotiate credentials, such as username or password, with a user's web browser. This can be used to confirm the identity of a user before sending sensitive information, such as online banking transaction history. It applies a hash function to a password before sending it over the network, which is safer than basic access authentication, which sends plaintext. Technically, digest authentication is an application of MD5 cryptographic hashing with usage of nonce values to prevent replay attacks. It uses the HTTP protocol.
Digest access authentication (Chinese Wikipedia): an authentication method that prompts the user for a username and password (the credentials) and hashes them together with other data before sending them over the network.
Source: Wikipedia

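So HTTPDigestAuth is a way of authenticating with a username and password. Unlike Basic authentication, the password is never sent in plaintext. Instead, the client sends a hash computed from the credentials and a nonce supplied by the server.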
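The hash calculation can be made concrete with a minimal sketch of the digest "response" value for qop="auth", following RFC 2617: response = H(H(A1):nonce:nc:cnonce:qop:H(A2)), with A1 = username:realm:password and A2 = method:uri. This is not requests' build_digest_header. The function name and the algorithm table are my own assumptions. The table covers the MD5, SHA-256 and SHA-512 names that the httpbin digest endpoints in this test use. The sample values come from the worked example in RFC 2617.

```python
import hashlib

# Hypothetical mapping from the algorithm names seen in a WWW-Authenticate
# header to hashlib constructor names.
HASHES = {"MD5": "md5", "SHA-256": "sha256", "SHA-512": "sha512"}


def digest_response(username, password, realm, method, uri,
                    nonce, nc, cnonce, qop="auth", algorithm="MD5"):
    """Compute the Digest 'response' value for qop=auth (RFC 2617 style)."""
    def h(text):
        return hashlib.new(HASHES[algorithm], text.encode("utf-8")).hexdigest()

    ha1 = h(f"{username}:{realm}:{password}")  # hashes the secret; never sent as-is
    ha2 = h(f"{method}:{uri}")                 # hashes what is being requested
    # The server nonce plus the client's nc/cnonce make every response unique,
    # which is what defeats replaying a captured Authorization header.
    return h(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Worked example from RFC 2617, section 3.5.
print(digest_response("Mufasa", "Circle Of Life", "testrealm@host.com", "GET",
                      "/dir/index.html", "dcd98b7102dd2f0e8b11d0f600bfb0c093",
                      "00000001", "0a4f113b"))
```

The client never reveals the password. The server, which knows the password (or HA1), recomputes the same value and compares.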

httpbin

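The second question is what the httpbin function is. In the test code, httpbin is just a parameter passed in from outside, and the test itself offers no further information.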

Options to try:

  • Check the requests documentation
  • If no documentation turns up, use a debugger

Checking the requests Documentation

Searching the requests documentation at http://docs.python-requests.org/en/master/search/?q=httpbin&check_keywords=yes&area=default
leads to the httpbin documentation: https://httpbin.org/

httpbin: A simple HTTP Request & Response Service.

Trying httpbin:

>>> import requests
>>> resp = requests.post('http://httpbin.org/post',data={'name':"chenjiaxi"})
>>> resp.json()
{'args': {}, 'form': {'name': 'chenjiaxi'}, 'files': {}, 'url': 'http://httpbin.org/post', 'json': None, 'data': '', 'headers': {'Connection': 'close', 'User-Agent': 'python-requests/2.19.1', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'Content-Length': '16', 'Content-Type': 'application/x-www-form-urlencoded', 'Accept': '*/*'}, 'origin': '42.186.112.21'}

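So httpbin is a simple service that echoes HTTP requests back. It is not a component of requests itself. In the test suite, the httpbin argument is a pytest fixture. The plugins line of the test output lists httpbin-0.0.7, the pytest-httpbin plugin, which runs a local httpbin server. The fixture returns a function that builds URLs against that server.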
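As I recall, requests' tests/conftest.py wraps the pytest-httpbin fixture so that httpbin(*parts) joins path segments onto the local server's base URL. That is why the URL inspected under pdb starts with 127.0.0.1. The sketch below assumes exactly that behavior. make_url_builder is a hypothetical name, not the real fixture code.

```python
from urllib.parse import urljoin


def make_url_builder(base_url):
    """Hypothetical re-creation of the URL builder the `httpbin` fixture returns."""
    # Force a trailing slash so urljoin appends segments instead of
    # replacing the last path component.
    root = base_url.rstrip("/") + "/"

    def build(*segments):
        return urljoin(root, "/".join(segments))

    return build


httpbin = make_url_builder("http://127.0.0.1:40631")
print(httpbin("digest-auth", "auth", "user", "pass", "MD5", "never"))
```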

Debugging with pdb

pdb lets you debug Python code. Add a breakpoint with pdb.set_trace():

    def test_DIGEST_HTTP_200_OK_GET(self, httpbin):

        for authtype in self.digest_auth_algo:
            auth = HTTPDigestAuth('user', 'pass')
            url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')

            import pdb
            pdb.set_trace()

            ...

Run the test. Execution stops at the breakpoint, where you can inspect the current value of url:

Testing started at 17:49 ...
ssh://chenjiaxi@xxx/home/chenjiaxi/requests/opt/requests/bin/python -u /home/chenjiaxi/.pycharm_helpers/pycharm/_jb_pytest_runner.py --target tests/test_requests.py::TestRequests.test_DIGEST_HTTP_200_OK_GET
Launching py.test with arguments tests/test_requests.py::TestRequests::test_DIGEST_HTTP_200_OK_GET in /home/chenjiaxi/requests/src

============================= test session starts ==============================
platform linux -- Python 3.5.2, pytest-3.6.3, py-1.5.4, pluggy-0.6.0
rootdir: /home/chenjiaxi/requests/src, inifile: pytest.ini
plugins: xdist-1.22.2, mock-1.10.0, httpbin-0.0.7, forked-0.2, cov-2.5.1
collected 1 item                                                               

tests/test_requests.py 
>>>>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
> /home/chenjiaxi/requests/src/tests/test_requests.py(595)test_DIGEST_HTTP_200_OK_GET()
-> r = requests.get(url, auth=auth)
(Pdb) url
'http://127.0.0.1:40631/digest-auth/auth/user/pass/MD5/never'

The list command shows the code around the current line:

(Pdb) list
list
590                 url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')
591     
592                 import pdb
593                 pdb.set_trace()
594     
595  ->             r = requests.get(url, auth=auth)
596                 assert r.status_code == 200
597     
598                 r = requests.get(url)
599                 assert r.status_code == 401
600                 print(r.headers['WWW-Authenticate'])

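The c (continue) command resumes execution until the next breakpoint. The breakpoint sits inside the for loop, so it is hit once per digest algorithm (MD5, SHA-256, SHA-512):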

(Pdb) c
c
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 200 37
Digest nonce="6eeb7626165a8ffdd8fcac8c608a2350", realm="me@kennethreitz.com", qop="auth", stale=FALSE, algorithm=MD5, opaque="04b7cffdd42f6a3575401089dab14b16"
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:10] "GET /digest-auth/auth/user/pass/MD5/never HTTP/1.1" 200 37

>>>>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
> /home/chenjiaxi/requests/src/tests/test_requests.py(593)test_DIGEST_HTTP_200_OK_GET()
-> pdb.set_trace()
(Pdb) c
c
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 200 37
Digest nonce="aa4ba69f64892c28a60434d7cc476c59a7b2c4444b0f92fa68f7eb52b3caa7f2", realm="me@kennethreitz.com", qop="auth", stale=FALSE, algorithm=SHA-256, opaque="303b8f4fdd8360cbb9663099eb5c4bf6f91b9c48ce69f4a5e2d19aec9532b4a4"
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:19] "GET /digest-auth/auth/user/pass/SHA-256/never HTTP/1.1" 200 37

>>>>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
> /home/chenjiaxi/requests/src/tests/test_requests.py(595)test_DIGEST_HTTP_200_OK_GET()
-> r = requests.get(url, auth=auth)
(Pdb) c
c
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 200 37
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 401 0
Digest nonce="5b703ded2588e07187c583973274a037ed48add03f41e21f594affaf2fa359de63bbb200d727e48a27687d4d104927196f2691a4fcdf65a7d453c9422750aba2", realm="me@kennethreitz.com", qop="auth", stale=FALSE, algorithm=SHA-512, opaque="4cf39ce7c546b8da0125193c5c9539ee071ac882ccf8080431402a8e51f2c8a952b9034c4a5828f56a3187de570cef49d0c587ae901b260fe4310540ae477da6"
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 401 0
127.0.0.1 - - [14/Jul/2018 20:16:22] "GET /digest-auth/auth/user/pass/SHA-512/never HTTP/1.1" 200 37
.                                                 [100%]

========================== 1 passed in 22.09 seconds ===========================
Process finished with exit code 0

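With pdb you can watch the whole unit test run. The server log also shows the digest handshake. Every authenticated request appears as a 401 followed by a 200. The first attempt receives a challenge (the Digest nonce=... WWW-Authenticate header printed above), and HTTPDigestAuth then retries with an Authorization header. I normally debug in PyCharm, which is visual and convenient, but knowing how to debug with pdb is essential for server-side programming.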

requests.get()

With the initialization of auth and url covered, we move on to the code of the two test cases:

            r = requests.get(url, auth=auth)
            assert r.status_code == 200

            r = requests.get(url)
            assert r.status_code == 401
            print(r.headers['WWW-Authenticate'])

The question for this part is: what happens inside requests.get?

requests.api.get

def get(url, params=None, **kwargs):
    r"""Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    kwargs.setdefault('allow_redirects', True)
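    return request('get', url, params=params, **kwargs)

  1. Sets a default for the optional keyword argument allow_redirects (True, so redirects are followed) unless the caller passed one
  2. Returns a Response object, which is the return value of request()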
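setdefault only fills in allow_redirects when the caller did not pass it, so an explicit allow_redirects=False still wins. Here is a tiny sketch of the pattern. fake_request is a hypothetical stand-in for requests.api.request that just echoes its arguments.

```python
def fake_request(method, url, **kwargs):
    """Hypothetical stand-in for requests.api.request: echoes its inputs."""
    return {"method": method, "url": url, **kwargs}


def get(url, params=None, **kwargs):
    # Only sets allow_redirects if the caller did not supply it.
    kwargs.setdefault("allow_redirects", True)
    return fake_request("get", url, params=params, **kwargs)


print(get("http://httpbin.org/get"))
print(get("http://httpbin.org/get", allow_redirects=False))
```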

Question:

What does request() return?

request()

requests.api.request

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
    ...
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

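request() performs one HTTP request from the given arguments and returns a Response.
This function has near-complete documentation. It explains every parameter in detail, gives a simple usage example, and shows the classic use of with to manage a resource object.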

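Managing an object with with ties its lifetime to the context block. This suits objects that hold limited resources, such as HTTP connections, database connections, and file descriptors. Even though request() returns from inside the with block, the session is still closed on the way out.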
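The cleanup guarantee comes from __exit__, which runs on the way out of the block whether it returns normally or raises. Here is a minimal sketch. Resource is a hypothetical class standing in for a Session. As far as I know, Session.__exit__ calls close() to shut down its adapters.

```python
class Resource:
    """Hypothetical resource standing in for a Session and its pooled sockets."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions raised inside the block


def use_and_return(res):
    with res:
        # The return value is computed first, then __exit__ runs.
        return "response"


r = Resource()
print(use_and_return(r), r.closed)
```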

OK, so api.request() is actually carried out by session.request(). Which raises the question:

What is a session?

sessions

requests.sessions.Session

class Session(SessionRedirectMixin):
    """A Requests session.

    Provides cookie persistence, connection-pooling, and configuration.

    Basic Usage::

      >>> import requests
      >>> s = requests.Session()
      >>> s.get('http://httpbin.org/get')
      <Response [200]>

    Or as a context manager::

      >>> with requests.Session() as s:
      >>>     s.get('http://httpbin.org/get')
      <Response [200]>
    """

From the documentation: http://docs.python-requests.org/en/master/user/advanced/

The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3’s connection pooling. So if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).

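Session: reuses the underlying TCP connection across multiple requests to the same host, which improves performance.

  • Persists parameters and cookies across requests
  • Uses urllib3's connection pooling
  • Can provide default data to the request methods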

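Note: every request is ultimately carried out by a session, but method-level parameters are not persisted. According to the same documentation page, dictionaries passed to a request method are merged with the session-level values for that request only. Method-level values override session values. They are not kept across requests, even when using a session.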

Session.request()

With a basic understanding of the Session class, look at how its request() method is implemented:

requests.sessions.Session#request

    def request(self, method, url,
            params=None, data=None, headers=None, cookies=None, files=None,
            auth=None, timeout=None, allow_redirects=True, proxies=None,
            hooks=None, stream=None, verify=None, cert=None, json=None):
        """Constructs a :class:`Request <Request>`, prepares it and sends it.
        Returns :class:`Response <Response>` object.

        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query
            string for the :class:`Request`.
        :param data: (optional) Dictionary, bytes, or file-like object to send
            in the body of the :class:`Request`.
        :param json: (optional) json to send in the body of the
            :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the
            :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the
            :class:`Request`.
        :param files: (optional) Dictionary of ``'filename': file-like-objects``
            for multipart encoding upload.
        :param auth: (optional) Auth tuple or callable to enable
            Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How long to wait for the server to send
            data before giving up, as a float, or a :ref:`(connect timeout,
            read timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Set to True by default.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol or protocol and
            hostname to the URL of the proxy.
        :param stream: (optional) whether to immediately download the response
            content. Defaults to ``False``.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
        :param cert: (optional) if String, path to ssl client cert file (.pem).
            If Tuple, ('cert', 'key') pair.
        :rtype: requests.Response
        """
        # Create the Request.
        req = Request(
            method=method.upper(),
            url=url,
            headers=headers,
            files=files,
            data=data or {},
            json=json,
            params=params or {},
            auth=auth,
            cookies=cookies,
            hooks=hooks,
        )
        prep = self.prepare_request(req)

        proxies = proxies or {}

        settings = self.merge_environment_settings(
            prep.url, proxies, stream, verify, cert
        )

        # Send the request.
        send_kwargs = {
            'timeout': timeout,
            'allow_redirects': allow_redirects,
        }
        send_kwargs.update(settings)
        resp = self.send(prep, **send_kwargs)

        return resp

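This breaks down into four steps:

  • Create a Request object req
  • Turn it into a PreparedRequest prep with prepare_request, then merge environment settings (proxies, stream, verify, cert)
  • Send prep with self.send
  • Return the Response from send to the user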
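The four steps can be mirrored with a few toy classes. Everything below (the Toy* dataclasses, MiniSession, the fake 200 response) is hypothetical. It only shows the shape of the flow: build a Request, turn it into a PreparedRequest, send it, and hand back a Response.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    """What the user asked for, as given (cf. requests.models.Request)."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)


@dataclass
class ToyPreparedRequest:
    """The normalized request that would actually go on the wire."""
    method: str
    url: str
    headers: dict


@dataclass
class ToyResponse:
    status_code: int
    request: ToyPreparedRequest


class MiniSession:
    def prepare_request(self, req):
        # Normalization happens here, e.g. the method is upper-cased.
        return ToyPreparedRequest(req.method.upper(), req.url, dict(req.headers))

    def send(self, prep):
        # A real session would hand prep to a transport adapter; this toy always "succeeds".
        return ToyResponse(200, prep)

    def request(self, method, url, headers=None):
        req = ToyRequest(method, url, headers or {})  # 1. create the Request
        prep = self.prepare_request(req)               # 2. prepare it
        resp = self.send(prep)                         # 3. send it
        return resp                                    # 4. return the Response


r = MiniSession().request("get", "http://httpbin.org/get")
print(r.status_code, r.request.method)
```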

Which raises more questions:

  1. What is a Request?
  2. What does prepare_request do?
  3. How is the request sent?
  4. What is resp?

requests.models.Request

class Request(RequestHooksMixin):
    """A user-created :class:`Request <Request>` object.

    Used to prepare a :class:`PreparedRequest <PreparedRequest>`, which is sent to the server.

    :param method: HTTP method to use.
    :param url: URL to send.
    :param headers: dictionary of headers to send.
    :param files: dictionary of {filename: fileobject} files to multipart upload.
    :param data: the body to attach to the request. If a dictionary is provided, form-encoding will take place.
    :param json: json for the body to attach to the request (if files or data is not specified).
    :param params: dictionary of URL parameters to append to the URL.
    :param auth: Auth handler or (user, pass) tuple.
    :param cookies: dictionary or CookieJar of cookies to attach to this request.
    :param hooks: dictionary of callback hooks, for internal usage.

    Usage::

      >>> import requests
      >>> req = requests.Request('GET', 'http://httpbin.org/get')
      >>> req.prepare()
      <PreparedRequest [GET]>
    """

A Request is built from the parameters the user passes in. It is used to prepare the PreparedRequest that is actually sent.

prepare_request()

    def prepare_request(self, request):
        """Constructs a :class:`PreparedRequest <PreparedRequest>` for
        transmission and returns it. The :class:`PreparedRequest` has settings
        merged from the :class:`Request <Request>` instance and those of the
        :class:`Session`.

        :param request: :class:`Request` instance to prepare with this
            session's settings.
        :rtype: requests.PreparedRequest
        """
        ...
        p = PreparedRequest()
        p.prepare(
            method=request.method.upper(),
            url=request.url,
            files=request.files,
            data=request.data,
            json=request.json,
            headers=merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict),
            params=merge_setting(request.params, self.params),
            auth=merge_setting(auth, self.auth),
            cookies=merged_cookies,
            hooks=merge_hooks(request.hooks, self.hooks),
        )
        return p

  • Create a PreparedRequest object p
  • Call p.prepare() with settings merged from the Request and the Session, then return p
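prepare_request combines each per-request value with the session's default through merge_setting. Its source is not shown above. The following simplified sketch is based on my reading of requests/sessions.py and drops the dict_class argument, so treat the details as assumptions. The function builds a new dict, which is consistent with method-level values never being written back into the session.

```python
from collections.abc import Mapping


def merge_setting(request_setting, session_setting):
    """Simplified sketch: combine a per-request value with a session default."""
    if session_setting is None:
        return request_setting
    if request_setting is None:
        return session_setting
    # Non-dict settings (e.g. verify=False): the request-level value simply wins.
    if not (isinstance(request_setting, Mapping) and isinstance(session_setting, Mapping)):
        return request_setting
    merged = dict(session_setting)  # copy, so the session's dict is untouched
    merged.update(request_setting)  # request-level keys override session keys
    # Setting a key to None at request level removes the session default.
    return {k: v for k, v in merged.items() if v is not None}


session_headers = {"User-Agent": "demo", "Accept": "*/*"}
print(merge_setting({"Accept": None, "X-Once": "1"}, session_headers))
```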

Which raises two questions:

  1. What is a PreparedRequest?
  2. What happens inside p.prepare()?

PreparedRequest

requests.models.PreparedRequest

class PreparedRequest(RequestEncodingMixin, RequestHooksMixin):
    """The fully mutable :class:`PreparedRequest <PreparedRequest>` object,
    containing the exact bytes that will be sent to the server.

    Generated from either a :class:`Request <Request>` object or manually.

    Usage::

      >>> import requests
      >>> req = requests.Request('GET', 'http://httpbin.org/get')
      >>> r = req.prepare()
      <PreparedRequest [GET]>

      >>> s = requests.Session()
      >>> s.send(r)
      <Response [200]>
    """

A PreparedRequest contains the exact bytes to be transmitted. It is the object the session actually sends to the server.

    def prepare(self,
            method=None, url=None, headers=None, files=None, data=None,
            params=None, auth=None, cookies=None, hooks=None, json=None):
        """Prepares the entire request with the given parameters."""

        self.prepare_method(method)
        self.prepare_url(url, params)
        self.prepare_headers(headers)
        self.prepare_cookies(cookies)
        self.prepare_body(data, files, json)
        self.prepare_auth(auth, url)

        # Note that prepare_auth must be last to enable authentication schemes
        # such as OAuth to work on a fully prepared request.

        # This MUST go after prepare_auth. Authenticators could add a hook
        self.prepare_hooks(hooks)

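This series of steps completes the preparation of the whole request. The order matters. prepare_auth runs after the body is prepared so that authentication schemes such as OAuth see a fully prepared request. prepare_hooks comes after prepare_auth because authenticators may add hooks.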
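prepare_url() is where params end up in the query string. The sketch below is a heavily simplified, hypothetical version built on urllib.parse. It appends URL-encoded params to any existing query and turns an empty path into "/". As far as I know, the real method also validates the scheme and IDNA-encodes host names, which this sketch skips.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def prepare_url(url, params=None):
    """Hypothetical simplified version: merge params into the URL's query string."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    if not path:
        path = "/"
    # doseq=True turns {"b": [2, 3]} into "b=2&b=3".
    encoded = urlencode(params or {}, doseq=True)
    if encoded:
        query = f"{query}&{encoded}" if query else encoded
    return urlunsplit((scheme, netloc, path, query, fragment))


print(prepare_url("http://httpbin.org/get", {"name": "chenjiaxi"}))
print(prepare_url("http://httpbin.org/get?a=1", {"b": [2, 3]}))
```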

send

With the request prepared, Session.send() sends it to the server:

requests.sessions.Session#send

    def send(self, request, **kwargs):
        """Send a given PreparedRequest.

        :rtype: requests.Response
        """
        # Set defaults that the hooks can utilize to ensure they always have
        # the correct parameters to reproduce the previous request.
        ...  

        # Get the appropriate adapter to use
        adapter = self.get_adapter(url=request.url)

        # Start time (approximately) of the request
        start = preferred_clock()

        # Send the request
        r = adapter.send(request, **kwargs)

        # Total elapsed time of the request (approximately)
        elapsed = preferred_clock() - start
        r.elapsed = timedelta(seconds=elapsed)

        # Response manipulation hooks
        r = dispatch_hook('response', hooks, r, **kwargs)

        ...
        return r
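Response hooks are what make the digest handshake work. As far as I can tell from requests/auth.py, HTTPDigestAuth.__call__ registers handle_401 as a "response" hook. dispatch_hook then runs it on each response, which lets it return a retried response in place of the 401. Below is a simplified sketch of that dispatch logic. retry_on_401 is a hypothetical hook that uses plain dicts as fake responses.

```python
def dispatch_hook(key, hooks, hook_data, **kwargs):
    """Run every hook registered under `key`; a hook may return a replacement object."""
    hooks = (hooks or {}).get(key)
    if not hooks:
        return hook_data
    if callable(hooks):  # a single function is accepted as well as a list
        hooks = [hooks]
    for hook in hooks:
        result = hook(hook_data, **kwargs)
        if result is not None:  # returning None means "keep the current object"
            hook_data = result
    return hook_data


def retry_on_401(resp, **kwargs):
    """Hypothetical hook in the spirit of handle_401: 'retry' a 401 and return the new response."""
    if resp["status"] == 401:
        return {"status": 200, "retried": True}
    return None


print(dispatch_hook("response", {"response": [retry_on_401]}, {"status": 401}))
```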

Here we see that send actually delegates to an adapter, which raises new questions:

  1. Why use an adapter?
  2. What is an adapter?
  3. How is an adapter implemented?

transport adapters

From the official documentation: http://docs.python-requests.org/en/latest/user/advanced/?highlight=adapter

Transport Adapters provide a mechanism to define interaction methods for an HTTP service. In particular, they allow you to apply per-service configuration.
Requests ships with a single Transport Adapter, the HTTPAdapter. This adapter provides the default Requests interaction with HTTP and HTTPS using the powerful urllib3 library.
Requests enables users to create and use their own Transport Adapters that provide specific functionality.

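  • Provides a mechanism for defining how to interact with an HTTP service, including per-service configuration
  • The default is HTTPAdapter, built on urllib3
  • Users can write their own adapters without any change to the send mechanism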

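The use of adapters here is a very clean abstraction and a textbook case of programming to an interface. It is also a good example for learning more about the Adapter design pattern.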
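A toy version of the mount/lookup idea shows why send never has to change. Adapters are registered under URL prefixes, and the longest matching prefix wins. The class names and the ValueError are hypothetical. In requests itself, Session.mount and Session.get_adapter play these roles, and get_adapter raises InvalidSchema when nothing matches. This is not their code.

```python
class BaseAdapter:
    """Hypothetical interface: every transport adapter implements send()."""

    def send(self, request):
        raise NotImplementedError


class EchoAdapter(BaseAdapter):
    """Toy adapter that reports which adapter handled the request."""

    def __init__(self, name):
        self.name = name

    def send(self, request):
        return f"{self.name} handled {request}"


class AdapterSession:
    def __init__(self):
        self.adapters = {}  # URL prefix -> adapter

    def mount(self, prefix, adapter):
        self.adapters[prefix] = adapter
        # Keep longer (more specific) prefixes first so they are matched first.
        self.adapters = dict(
            sorted(self.adapters.items(), key=lambda item: len(item[0]), reverse=True)
        )

    def get_adapter(self, url):
        for prefix, adapter in self.adapters.items():
            if url.lower().startswith(prefix.lower()):
                return adapter
        raise ValueError(f"no adapter found for {url!r}")

    def send(self, url):
        # The sending code is identical no matter which adapter is chosen.
        return self.get_adapter(url).send(url)


s = AdapterSession()
s.mount("https://", EchoAdapter("default"))
s.mount("https://api.github.com", EchoAdapter("github"))
print(s.send("https://api.github.com/user"))
print(s.send("https://httpbin.org/get"))
```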

HTTPAdapter

Next, look at the code of the default HTTPAdapter to see how an adapter is defined:

requests.adapters.HTTPAdapter

class HTTPAdapter(BaseAdapter):
    """The built-in HTTP Adapter for urllib3.

    Provides a general-case interface for Requests sessions to contact HTTP and
    HTTPS urls by implementing the Transport Adapter interface. This class will
    usually be created by the :class:`Session <Session>` class under the
    covers.

    :param pool_connections: The number of urllib3 connection pools to cache.
    :param pool_maxsize: The maximum number of connections to save in the pool.
    :param max_retries: The maximum number of retries each connection
        should attempt. Note, this applies only to failed DNS lookups, socket
        connections and connection timeouts, never to requests where data has
        made it to the server. By default, Requests does not retry failed
        connections. If you need granular control over the conditions under
        which we retry a request, import urllib3's ``Retry`` class and pass
        that instead.
    :param pool_block: Whether the connection pool should block for connections.

    Usage::

      >>> import requests
      >>> s = requests.Session()
      >>> a = requests.adapters.HTTPAdapter(max_retries=3)
      >>> s.mount('http://', a)
    """

HTTPAdapter.send()

    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        """Sends PreparedRequest object. Returns Response object.

        :param request: The :class:`PreparedRequest <PreparedRequest>` being sent.
        :param stream: (optional) Whether to stream the request content.
        :param timeout: (optional) How long to wait for the server to send
            data before giving up, as a float, or a :ref:`(connect timeout,
            read timeout) <timeouts>` tuple.
        :type timeout: float or tuple or urllib3 Timeout object
        :param verify: (optional) Either a boolean, in which case it controls whether
            we verify the server's TLS certificate, or a string, in which case it
            must be a path to a CA bundle to use
        :param cert: (optional) Any user-provided SSL certificate to be trusted.
        :param proxies: (optional) The proxies dictionary to apply to the request.
        :rtype: requests.Response
        """

        # Connection establish
        conn = self.get_connection(request.url, proxies)

        ...
        chunked = not (request.body is None or 'Content-Length' in request.headers)

        # Timeout mechanism
        ...

        try:
            if not chunked:
                resp = conn.urlopen(
                    ...
                )

            # Send the request.
            else:
                if hasattr(conn, 'proxy_pool'):
                    conn = conn.proxy_pool

                low_conn = conn._get_conn(timeout=DEFAULT_POOL_TIMEOUT)

                try:
                    low_conn.putrequest(request.method,
                                        url,
                                        skip_accept_encoding=True)

                    for header, value in request.headers.items():
                        low_conn.putheader(header, value)

                    low_conn.endheaders()

                    for i in request.body:
                        low_conn.send(hex(len(i))[2:].encode('utf-8'))
                        low_conn.send(b'\r\n')
                        low_conn.send(i)
                        low_conn.send(b'\r\n')
                    low_conn.send(b'0\r\n\r\n')

                    # Receive the response from the server
                    try:
                        # For Python 2.7+ versions, use buffering of HTTP
                        # responses
                        r = low_conn.getresponse(buffering=True)
                    except TypeError:
                        # For compatibility with Python 2.6 versions and back
                        r = low_conn.getresponse()

                    resp = HTTPResponse.from_httplib(
                        r,
                        pool=conn,
                        connection=low_conn,
                        preload_content=False,
                        decode_content=False
                    )
                except:
                    # If we hit any problems here, clean up the connection.
                    # Then, reraise so that we can handle the actual exception.
                    low_conn.close()
                    raise
                    
        # Exception Handling
        except (ProtocolError, socket.error) as err:
            raise ConnectionError(err, request=request)

        ...

        return self.build_response(request, resp)

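Skip connection setup, the timeout mechanism, and exception handling, and look only at the part that actually sends the request. This is the chunked branch, used when there is a body without a Content-Length header. Otherwise conn.urlopen does the whole job.

  1. Get a connection from the connection pool maintained by urllib3
  2. Send the request line (method and URL) with putrequest
  3. Add the request headers with putheader, then finish them with endheaders
  4. Write request.body in chunked transfer encoding: each chunk is prefixed with its length in hex
  5. Send the terminating zero-length chunk (0\r\n\r\n) to end the request
  6. Receive the response with getresponse and wrap it in a urllib3 HTTPResponse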
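The chunked loop in HTTPAdapter.send() follows HTTP/1.1 chunked transfer coding. A standalone sketch of the same framing makes the byte layout easy to check. The function name is hypothetical. Unlike the loop in the excerpt, it skips empty chunks, because a zero-length chunk would be read as the end of the body.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings with HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would signal end-of-body too early
        out += f"{len(chunk):x}".encode("ascii")  # size in hex, same as hex(len(i))[2:]
        out += b"\r\n" + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk plus the empty line that ends the body
    return bytes(out)


print(encode_chunked([b"hello", b"world!"]))
```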

Finally, build_response builds a requests.Response from the urllib3 response and returns it to the user. That completes one requests.get() call.

Blocking vs. Non-Blocking

While reading the official documentation I came across the section "Blocking Or Non-Blocking?", quoted below:

With the default Transport Adapter in place, Requests does not provide any kind of non-blocking IO. The Response.content property will block until the entire response has been downloaded. If you require more granularity, the streaming features of the library (see Streaming Requests) allow you to retrieve smaller quantities of the response at a time. However, these calls will still block.
If you are concerned about the use of blocking IO, there are lots of projects out there that combine Requests with one of Python’s asynchronicity frameworks. Some excellent examples are requests-threads, grequests, and requests-futures.

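requests is blocking by default. When synchronous HTTP requests spend a long time waiting on IO, you can use grequests, which uses gevent to run requests as coroutines. The documentation also mentions requests-threads and requests-futures.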
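To overlap a handful of blocking calls, the standard library's concurrent.futures is often enough. As I understand it, requests-futures takes a similar thread-pool approach. In this sketch, fake_fetch is a hypothetical stand-in that sleeps instead of calling requests.get, so it runs offline. Note that this uses threads, not the gevent coroutines that grequests uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Hypothetical stand-in for a blocking call such as requests.get(url)."""
    time.sleep(0.2)  # simulate waiting on the network
    return f"200 {url}"


def fetch_all(urls, max_workers=4):
    # Each blocking call runs in a worker thread, so the waits overlap;
    # pool.map still returns results in the same order as `urls`.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_fetch, urls))


start = time.perf_counter()
print(fetch_all(["/get", "/ip", "/headers"]))
# With 4 workers the three sleeps overlap, so this should be near one sleep, not three.
print(f"elapsed: {time.perf_counter() - start:.2f}s")
```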

Summary

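Putting the analysis together, one requests.get() call flows as follows:

requests.get() → requests.api.request() → Session.request() → Request → Session.prepare_request() → PreparedRequest.prepare() → Session.send() → Session.get_adapter() → HTTPAdapter.send() → urllib3 → HTTPAdapter.build_response() → Response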

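This study also shows that real open source is more than code. It also includes documentation and a community. Kenneth Reitz has published a guide to writing Pythonic code, The Hitchhiker’s Guide to Python!. On GitHub there is also read_requests, a set of notes on reading the requests source, which is worth consulting.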

© Copyright belongs to the author. For reprints or content collaboration, please contact the author.