urlopen(url, data, timeout)
The first argument, url, is the URL to open; the second, data, is the data to send when accessing that URL; the third, timeout, sets the timeout in seconds.
import urllib2
response = urllib2.urlopen("http://www.baidu.com")
print response.read()
Recommended form (building a request usually means adding more information, so construct a Request object first; the server then answers that request with a response)
import urllib2
request = urllib2.Request("http://www.baidu.com")
response = urllib2.urlopen(request)
print response.read()
POST
import urllib
import urllib2
values = {"username": "916859921@qq.com", "password":"XXX"}
data = urllib.urlencode(values)
url = "https://passport.csdn.net/account/login?from=http://my.csdn.net/my/mycsdn"
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
print response.read()
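As a rough offline sketch of what the POST example prepares, the snippet below encodes the form fields and checks which HTTP method the Request will use, without contacting any server. The notes use Python 2; the fallback imports are an assumption added here so the sketch also runs on Python 3. The URL and credentials are placeholders, and a list of pairs is used instead of a dict only to keep the field order fixed.

```python
try:  # Python 2, as used in these notes
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3 moved both names
    from urllib.parse import urlencode
    from urllib.request import Request

# Placeholder values, for illustration only.
values = [("username", "user@example.com"), ("password", "XXX")]

# urlencode percent-escapes each value and joins the pairs with "&".
# Encoding to bytes is required on Python 3 and harmless on Python 2.
data = urlencode(values).encode("ascii")

request = Request("http://www.example.com/login", data)
print(data)
print(request.get_method())  # a request that carries data is sent as POST
```

Passing data as the second argument is what turns the request into a POST; the URL itself is left unchanged.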
GET
import urllib
import urllib2
values = {"username": "916859921@qq.com", "password":"XXX"}
data = urllib.urlencode(values)
url = "http://passport.csdn.net/account/login"
geturl = url + "?" + data
# Do not pass data here: a Request with data is sent as POST, not GET.
request = urllib2.Request(geturl)
response = urllib2.urlopen(request)
print response.read()
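The following is a minimal offline sketch of the GET case: the parameters go into the query string, and no data argument is given to Request. The parameter names and the URL are hypothetical, and the fallback imports are an assumption so the snippet runs on Python 2 or 3.

```python
try:  # Python 2
    from urllib import urlencode
    from urllib2 import Request
    from urlparse import urlsplit, parse_qs
except ImportError:  # Python 3
    from urllib.parse import urlencode, urlsplit, parse_qs
    from urllib.request import Request

# Hypothetical query parameters.
params = [("username", "user@example.com"), ("page", "2")]
geturl = "http://www.example.com/search" + "?" + urlencode(params)

request = Request(geturl)  # no data argument, so the method stays GET

# Parse the query string back to confirm the values survive the round trip.
query = parse_qs(urlsplit(request.get_full_url()).query)
print(geturl)
print(request.get_method())
print(query["username"])
```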
Headers
import urllib
import urllib2
url = "http://www.server.com/login"
user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
values = {"username":"916859921@qq.com", "password":"XXX"}
headers = {"User-Agent": user_agent}
data = urllib.urlencode(values)
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
page = response.read()
Anti-hotlinking: some servers check whether the Referer in the request headers points to their own site and refuse to respond if it does not. Add a Referer to headers to get past this check.
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
    "Referer": "http://www.zhihu.com/articles"
}
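To check that the headers are actually attached before sending anything, a Request can be inspected offline. This is only a sketch: the User-Agent string is shortened for readability, the Accept-Language header is an extra added for illustration, and the fallback import is an assumption so it runs on Python 2 or 3.

```python
try:  # Python 2
    from urllib2 import Request
except ImportError:  # Python 3
    from urllib.request import Request

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)",  # shortened
    "Referer": "http://www.zhihu.com/articles",
}
request = Request("http://www.zhihu.com/articles", headers=headers)

# Headers can also be added one at a time after construction.
request.add_header("Accept-Language", "en")

# Request stores header names via str.capitalize(), e.g. "User-agent".
print(request.get_header("User-agent"))
print(request.get_header("Referer"))
print(sorted(name for name, _ in request.header_items()))
```

Note the capitalization when looking headers up: get_header("User-Agent") would return None because the stored key is "User-agent".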
Proxy
Suppose a website tracks how many requests an IP address makes within a given period; if there are too many, it blocks that IP. Workaround: set up several proxy servers and switch to a different one at intervals.
import urllib2
enable_proxy = True
proxy_handler = urllib2.ProxyHandler({"http": "http://some-proxy.com:8080"})
null_proxy_handler = urllib2.ProxyHandler({})
if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)
urllib2.install_opener(opener)
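The idea of switching proxies at intervals could be sketched as a simple round-robin over a list. The proxy addresses and the next_opener helper below are hypothetical, not part of urllib2, and the fallback import is an assumption so the snippet runs on Python 2 or 3. Nothing here opens a connection.

```python
import itertools

try:  # Python 2
    from urllib2 import ProxyHandler, build_opener
except ImportError:  # Python 3
    from urllib.request import ProxyHandler, build_opener

# Hypothetical proxy addresses; replace with proxies you actually control.
PROXIES = [
    "http://proxy-a.example.com:8080",
    "http://proxy-b.example.com:8080",
]
_proxy_cycle = itertools.cycle(PROXIES)


def next_opener():
    """Return (proxy, opener) for the next proxy in round-robin order."""
    proxy = next(_proxy_cycle)
    opener = build_opener(ProxyHandler({"http": proxy}))
    return proxy, opener


# Three calls wrap around the two-element list.
used = [next_opener()[0] for _ in range(3)]
print(used)
```

Calling opener.open(url) on the returned opener, instead of install_opener, keeps the choice of proxy local to each request rather than changing the global default.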
Timeout
import urllib2
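# Without data, timeout must be passed by keyword.
response = urllib2.urlopen("http://www.baidu.com", timeout=10)
# With data (the urlencoded body from the POST example), timeout can be the third positional argument.
response = urllib2.urlopen("http://www.baidu.com", data, 10)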
URLError
import urllib2
request = urllib2.Request("http://www.xxxxx.com")
try:
    urllib2.urlopen(request)
except urllib2.URLError as e:
    print e.reason
    # [Errno 11004] getaddrinfo failed
HTTPError
HTTPError is a subclass of URLError
import urllib2
request = urllib2.Request("http://blog.csdn.net/cqcre")
try:
    urllib2.urlopen(request)
except urllib2.HTTPError as e:
    print e.code
    print e.reason
import urllib2
request = urllib2.Request("http://blog.csdn.net/cqcre")
try:
    urllib2.urlopen(request)
# Catch HTTPError first: it is a subclass of URLError.
except urllib2.HTTPError as e:
    print e.code
except urllib2.URLError as e:
    print e.reason
else:
    print "OK"
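The order of the except clauses matters because HTTPError is a subclass of URLError, so a URLError handler placed first would also catch HTTP errors. The sketch below illustrates this offline with hand-built exceptions; describe_error is a hypothetical helper, the URL is a placeholder, and the fallback import is an assumption so the snippet runs on Python 2 or 3.

```python
import io

try:  # Python 2
    from urllib2 import HTTPError, URLError
except ImportError:  # Python 3
    from urllib.error import HTTPError, URLError


def describe_error(exc):
    """Hypothetical helper: turn a urllib2 exception into a short message."""
    if isinstance(exc, HTTPError):  # test the subclass before its parent
        return "HTTP error %d: %s" % (exc.code, exc.reason)
    if isinstance(exc, URLError):
        return "URL error: %s" % (exc.reason,)
    return "unexpected error: %r" % (exc,)


# Exceptions built by hand so nothing touches the network.
not_found = HTTPError("http://www.example.com/missing", 404, "Not Found",
                      None, io.BytesIO())
dns_failure = URLError("getaddrinfo failed")

print(issubclass(HTTPError, URLError))
print(describe_error(not_found))
print(describe_error(dns_failure))
```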