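- Learn about GET and POST requests: use requests or urllib to send a GET request to https://www.baidu.com/ and print the result it returns.
- Find out what happens when the same request is sent with the network disconnected, and learn about the status codes a request returns.
- Learn what request headers are and how to add them.
(Study blog: https://desmonday.github.io/2019/02/28/python%E7%88%AC%E8%99%AB%E5%AD%A6%E4%B9%A0-day1/#more)
1. GET and POST requests
1.1 Making HTTP requests with requests (recommended)
requests is a third-party library and must be installed before use (for example with pip install requests).
The parameters of the get and post functions are as follows: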
in:
help(requests.get)
help(requests.post)
out:
get(url, params=None, **kwargs)
Sends a GET request.
:param url: URL for the new :class:`Request` object.
:param params: (optional) Dictionary, list of tuples or bytes to send
in the body of the :class:`Request`.
:param \*\*kwargs: Optional arguments that ``request`` takes.
:return: :class:`Response <Response>` object
:rtype: requests.Response
post(url, data=None, json=None, **kwargs)
Sends a POST request.
:param url: URL for the new :class:`Request` object.
:param data: (optional) Dictionary, list of tuples, bytes, or file-like
object to send in the body of the :class:`Request`.
:param json: (optional) json data to send in the body of the :class:`Request`.
:param \*\*kwargs: Optional arguments that ``request`` takes.
:return: :class:`Response <Response>` object
:rtype: requests.Response
1.1.1 A simple request and response example:
import requests
# GET request
r = requests.get('https://www.baidu.com')
print(r.content)
# POST request
postdata = {'key': 'value'}
r = requests.post('https://www.baidu.com/login', data=postdata)
print(r.content)
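1.1.2 Response and encoding
import requests
# Using a GET request as an example
r = requests.get('https://www.baidu.com')
print('content:')
print(r.content)  # raw bytes of the response body
print('text:')
print(r.text)  # body decoded to str using r.encoding
print('encoding:')
print(r.encoding)
r.encoding = 'utf-8'  # override the encoding used to build r.text
print('new text:')
print(r.text)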
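The difference between r.content and r.text comes down to decoding: content holds the raw bytes, and text is those bytes decoded with r.encoding. The sketch below reproduces that step in plain Python without any network access. The sample string and the helper name decode_body are illustrative assumptions, not part of requests. ISO-8859-1 stands in for the "wrong" codec because requests is known to fall back to it for text responses whose headers declare no charset.

```python
# Stand-in for r.content: the UTF-8 bytes of a short Chinese string (assumed sample).
raw = "百度一下".encode("utf-8")


def decode_body(body: bytes, encoding: str) -> str:
    """Decode response bytes the way r.text does once an encoding is chosen."""
    return body.decode(encoding, errors="replace")


garbled = decode_body(raw, "iso-8859-1")  # wrong codec: one character per byte
fixed = decode_body(raw, "utf-8")         # right codec: original text restored

print(repr(garbled))
print(fixed)
```

Setting r.encoding = 'utf-8' before reading r.text corresponds to choosing the second call here.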
1.2 Making HTTP requests with urllib
Help on module urllib.request in urllib:
NAME
urllib.request - An extensible library for opening URLs using a variety of protocols
DESCRIPTION
The simplest way to use this module is to call the urlopen function,
which accepts a string containing a URL or a Request object (described
below). It opens the URL and returns the results as file-like
object; the returned object has some extra methods described below.
The OpenerDirector manages a collection of Handler objects that do
all the actual work. Each Handler implements a particular protocol or
option. The OpenerDirector is a composite object that invokes the
Handlers needed to open the requested URL. For example, the
HTTPHandler performs HTTP GET and POST requests and deals with
non-error returns. The HTTPRedirectHandler automatically deals with
HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
deals with digest authentication.
urlopen(url, data=None) -- Basic usage is the same as original
urllib. pass the url and optionally data to post to an HTTP URL, and
get a file-like object back. One difference is that you can also pass
a Request instance instead of URL. Raises a URLError (subclass of
OSError); for HTTP errors, raises an HTTPError, which can also be
treated as a valid response.
build_opener -- Function that creates a new OpenerDirector instance.
Will install the default handlers. Accepts one or more Handlers as
arguments, either instances or Handler classes that it will
instantiate. If one of the argument is a subclass of the default
handler, the argument will be installed instead of the default.
install_opener -- Installs a new opener as the default opener.
objects of interest:
OpenerDirector -- Sets up the User Agent as the Python-urllib client and manages
the Handler classes, while dealing with requests and responses.
Request -- An object that encapsulates the state of a request. The
state can be as simple as the URL. It can also include extra HTTP
headers, e.g. a User-Agent.
BaseHandler --
internals:
BaseHandler and parent
_call_chain conventions
Example usage:
import urllib.request
# set up authentication info
authinfo = urllib.request.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
uri='https://mahler:8092/site-updates.py',
user='klem',
passwd='geheim$parole')
proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
urllib.request.CacheFTPHandler)
# install it
urllib.request.install_opener(opener)
f = urllib.request.urlopen('http://www.python.org/')
1.2.1 A simple request and response example
import urllib.request
# GET request
f = urllib.request.urlopen('https://www.baidu.com')
firstline = f.readline()  # read the first line of the HTML page
print(firstline)
# POST request: supplying data makes urlopen send a POST instead of a GET
req = urllib.request.Request(url='https://www.baidu.com',
                             data=b'The first day of Web Crawler')
resp = urllib.request.urlopen(req)
body = resp.read()  # keep the request and the response body in separate names
print(body)
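With urllib, whether a request goes out as GET or POST depends on whether data is supplied. The following sketch builds both kinds of Request object and inspects them without sending anything. The helper names build_get and build_post are hypothetical, and the key/value pair is the placeholder form data used earlier in this post.

```python
import urllib.parse
import urllib.request


def build_get(url, params):
    """GET: parameters are URL-encoded into the query string."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(f"{url}?{query}")


def build_post(url, form):
    """POST: parameters are URL-encoded and carried as bytes in the body."""
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url, data=body)


get_req = build_get("https://www.baidu.com/", {"key": "value"})
post_req = build_post("https://www.baidu.com/login", {"key": "value"})

# Nothing is sent here; the Request objects are only inspected.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Passing either object to urllib.request.urlopen would then perform the actual request.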
2. Status codes returned by a request
2.1 HTTP response status codes in detail
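A status code is a three-digit number whose first digit gives the class of the response: 1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error. As a rough sketch, the standard library's http.HTTPStatus enum can be used to look up the reason phrase for a code. The helper name describe is an assumption made for illustration.

```python
from http import HTTPStatus

# First digit of the status code -> response class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return 'code phrase (class)' for an HTTP status code."""
    category = CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that are not registered in HTTPStatus have no phrase.
        phrase = "unregistered"
    return f"{code} {phrase} ({category})"


for code in (200, 302, 404, 500):
    print(describe(code))
```

With requests, the code to pass in would come from r.status_code; with urllib, from the status attribute of the object returned by urlopen.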
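2.2 Result of sending a request while the network is disconnected
Using requests as an example. With the network down, no response (and therefore no status code) comes back; requests raises an exception such as requests.exceptions.ConnectionError instead.
import requests
# GET request
r = requests.get('https://www.baidu.com')
print(r.content)
# POST request
postdata = {'key': 'value'}
r = requests.post('https://www.baidu.com/login', data=postdata)
print(r.content)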
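A script that should survive a dropped connection needs to catch the failure rather than assume a response exists. The sketch below uses urllib from the standard library instead of requests, and imitates a lost connection by targeting a local port that nothing is listening on. The helper names fetch_status and unused_local_url are hypothetical, and proxies are disabled only so the local test address is contacted directly.

```python
import socket
import urllib.error
import urllib.request


def fetch_status(url, timeout=3):
    """Return (status_code, None) on any HTTP response, or (None, reason) on failure."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code, None
    except urllib.error.URLError as err:
        # No HTTP response at all: DNS failure, refused connection, no network.
        return None, err.reason


def unused_local_url():
    """Build a URL for a local port that was just released, so connecting fails."""
    sock = socket.socket()
    sock.bind(("127.0.0.1", 0))
    port = sock.getsockname()[1]
    sock.close()
    return f"http://127.0.0.1:{port}/"


status, reason = fetch_status(unused_local_url())
print("status:", status)
print("reason:", reason)
```

HTTPError is caught before URLError because it is a subclass of it; the order distinguishes "the server replied with an error code" from "no reply arrived".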
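3. Request headers
3.1 What are request headers
Put simply, request headers tell the server what kind of information the client expects back, for example which content formats and languages it accepts, along with details about the client itself such as its User-Agent.
(Reference blog: Common request and response headers)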
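To make the idea concrete, the sketch below starts a throwaway local server that replies with the User-Agent header it received, then queries it twice with urllib: once with default headers and once with a browser-like value. The class EchoUserAgent, the helper seen_user_agent, and the shortened Mozilla/5.0 string are all assumptions made for illustration; no real site is contacted.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoUserAgent(BaseHTTPRequestHandler):
    """Reply to any GET with the User-Agent header the client sent."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def seen_user_agent(headers=None):
    """Send one GET to a local echo server and return the User-Agent it saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoUserAgent)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        req = urllib.request.Request(url, headers=headers or {})
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(req, timeout=5) as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


print(seen_user_agent())                               # urllib's default identity
print(seen_user_agent({"User-Agent": "Mozilla/5.0"}))  # overridden by the caller
```

The first call shows that a script announces itself as a Python client by default, which is one reason a site may treat it differently from a browser.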
3.2 How to add request headers
When crawling, a website may block access if the request carries no browser-like headers. In that case we add request headers so the request imitates a browser. Request headers are added in Python as follows (using requests as an example).
import requests
# Request headers
headers = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1.6) ",
           "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
           "Accept-Language": "en-us",
           "Connection": "keep-alive",
           "Accept-Charset": "GB2312,utf-8;q=0.7,*;q=0.7"}
r = requests.post("http://baike.baidu.com/item/哆啦A夢", headers=headers, allow_redirects=False)  # allow_redirects=False: do not follow redirects automatically
r.encoding = 'UTF-8'
print(r.url)
print(r.headers)  # response headers
print(r.request.headers)  # request headers
Output: