Server/client operations with the socket module
If you want to write a server/client communication program in Python, the standard library's socket module is the tool of choice; it supports multiple network protocols: TCP/IP, UDP/IP, ICMP, and so on.
- The most basic building block of networking is the socket, whose job is to set up a channel for passing information between two machines or processes.
- A socket program involves two endpoints: a server and a client. The program creates a server socket and lets it wait for client connections, so that it listens on a given IP address and port.
- Handling the client socket is usually somewhat easier than handling the server socket, because the server must be ready to accept a client connection at any time and may have to handle several connections at once.
- The client only needs to be configured with the server's IP address and port to do its job.
A socket object has two methods for transferring data: send and recv.
Call send() with the data to transmit (in Python 3 this must be a bytes object, so encode a str first), and call recv() with the maximum number of bytes you want to read in one call.
For full details of the socket module, see the Python standard library reference: socket — Low-level networking interface
Docstring:
This module provides socket operations and some related functions.
On Unix, it supports IP (Internet Protocol) and Unix domain sockets.
On other systems, it only supports IP. Functions specific for a
socket are available as methods of the socket object.
The socket.socket() function creates a socket:
import socket
# Create a TCP/IP socket
socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Create a UDP/IP socket
socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
<socket.socket fd=388, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0>
Commonly used socket functions and methods
Function | Description |
---|---|
socket() | Create a new socket object for the given protocol type (TCP/UDP) |
connect() | The active side establishes a connection to the passive side, initiating the TCP three-way handshake (actively initiates a connection to a TCP server; raises an exception on error) |
bind() | Bind an address (host and port) to the socket (TCP/UDP) |
listen() | Turn the socket into a passive (listening) socket and set the maximum number of queued TCP connections |
accept() | Return the next completed connection on a TCP server; passively accepts a TCP client connection and blocks (execution hangs here until a client connects) |
connect_ex() | Like connect(), but returns an error code on failure instead of raising an exception |
recv() | Receive TCP data |
send() | Send TCP data |
sendall() | Send TCP data in full (keeps sending until all data has been transmitted) |
recvfrom() | Receive UDP data |
sendto() | Send UDP data |
socketpair() | create a pair of new socket objects [*] |
fromfd() | create a socket object from an open file descriptor [*] |
fromshare() | create a socket object from data received from socket.share() [*] |
gethostname() | return the current hostname |
gethostbyname() | map a hostname to its IP number |
gethostbyaddr() | map an IP number or hostname to DNS info |
getservbyname() | map a service name and a protocol name to a port number |
getprotobyname() | map a protocol name (e.g. 'tcp') to a number |
ntohs(), ntohl() | convert 16, 32 bit int from network to host byte order |
htons(), htonl() | convert 16, 32 bit int from host to network byte order |
inet_aton() | convert IP addr string (123.45.67.89) to 32-bit packed format |
inet_ntoa() | convert 32-bit packed format IP to string (123.45.67.89) |
socket.getdefaulttimeout() | get the default timeout value |
socket.setdefaulttimeout() | set the default timeout value |
create_connection() | connects to an address, with an optional timeout and optional source address |
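Several of the helper functions in the table can be tried straight from an interactive session. The snippet below is only a sketch; the hostname and resolved addresses will of course differ on your machine.
import socket

print(socket.gethostname())                 # name of the local machine
print(socket.gethostbyname('localhost'))    # resolve a hostname, e.g. '127.0.0.1'
print(socket.getservbyname('http', 'tcp'))  # well-known port of a service: 80
print(socket.htons(0x1234))                 # host -> network byte order, 16 bit
packed = socket.inet_aton('123.45.67.89')   # dotted-quad string -> 4-byte packed form
print(socket.inet_ntoa(packed))             # and back again: '123.45.67.89'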
In hardware terms, a port refers to a set of registers in an interface circuit used to hold data, control and status information; the corresponding ports are called the data port, the control port and the status port. In network programming, a port number simply identifies which service or process on a host a connection is addressed to.
To check which ports are in use on your own machine (Windows), a small Python sketch for checking a single port from code follows this list:
- Switch to the desktop, press Win+X and choose "Command Prompt (Admin)";
- If you pick plain "Command Prompt" instead, some of the later steps may fail for lack of privileges;
- Once the console is open, start with a plain "netstat" command to see basic statistics; the part after the colon in each address is the port;
- "netstat -nao" adds a rightmost PID column, so you can terminate a process directly by its ID;
- "netstat -nab" displays detailed information about network connections, port usage and the owning programs;
- When you spot an abnormal port or program, you can end its process tree and then investigate further;
- If you need to continuously monitor and control port usage, you will need third-party software; tools of this kind (such as 聚生網(wǎng)管) give a quick, visual way to watch and manage ports.
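From Python itself, a quick (if rough) way to see whether a TCP port on a host is accepting connections is connect_ex() from the table above, which returns 0 on success instead of raising. A minimal sketch, assuming all you want is a yes/no answer for one port:
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0   # 0 means the connect succeeded

print(port_open('127.0.0.1', 13014))   # True only if something is listening there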
# A socket server
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # create a TCP socket
sock.bind(('localhost', 13014))                            # bind the host and port
sock.listen(5)                                             # start listening, backlog of 5
while True:
    connection, address = sock.accept()                    # block until a client connects
    print('client ip is ')                                 # print the client address
    print(address)
    try:
        connection.settimeout(5)                           # per-connection timeout in seconds
        buf = connection.recv(1024)                        # read at most 1024 bytes
        if buf == b'1':                                    # recv() returns bytes in Python 3
            connection.send(b'welcome to python server!')  # send the reply as bytes
        else:
            connection.send(b'please go out!')             # send the reply as bytes
    except socket.timeout:                                 # the exception lives on the module, not the socket object
        print('time out')
    connection.close()
# A socket client
import socket
import time

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create a TCP socket
sock.connect(('localhost', 13014))                        # connect to the server's address and port
time.sleep(2)                                             # pause briefly
sock.send(b'1')                                           # send bytes, not str, in Python 3
print(sock.recv(1024))                                    # print up to 1024 bytes of the reply
sock.close()
The urllib module
From Python 3 onward, urllib2 no longer exists as a separate module (import urllib2 fails with a "no such module" error); it has been merged into urllib as urllib.request and urllib.error.
The urllib package is split into urllib.request, urllib.parse, urllib.error and urllib.response.
For example:
- urllib2.urlopen() became urllib.request.urlopen()
- urllib2.Request() became urllib.request.Request()
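A quick sketch of the Python 3 spelling of the same calls; the URL and the User-Agent header here are only placeholders.
import urllib.request

# was urllib2.Request(): build a request carrying an extra HTTP header
req = urllib.request.Request('http://www.python.org/',
                             headers={'User-Agent': 'Mozilla/5.0'})
# was urllib2.urlopen(): open the request and read the response
with urllib.request.urlopen(req) as resp:
    print(resp.getcode(), len(resp.read()))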
urllib.request
Type: module
Docstring:
An extensible library for opening URLs using a variety of protocols
The simplest way to use this module is to call the urlopen function,
which accepts a string containing a URL or a Request object (described
below). It opens the URL and returns the results as file-like
object; the returned object has some extra methods described below.
The OpenerDirector manages a collection of Handler objects that do
all the actual work. Each Handler implements a particular protocol or
option. The OpenerDirector is a composite object that invokes the
Handlers needed to open the requested URL. For example, the
HTTPHandler performs HTTP GET and POST requests and deals with
non-error returns. The HTTPRedirectHandler automatically deals with
HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
deals with digest authentication.
urlopen(url, data=None) -- Basic usage is the same as original
urllib. pass the url and optionally data to post to an HTTP URL, and
get a file-like object back. One difference is that you can also pass
a Request instance instead of URL. Raises a URLError (subclass of
OSError); for HTTP errors, raises an HTTPError, which can also be
treated as a valid response.
build_opener -- Function that creates a new OpenerDirector instance.
Will install the default handlers. Accepts one or more Handlers as
arguments, either instances or Handler classes that it will
instantiate. If one of the argument is a subclass of the default
handler, the argument will be installed instead of the default.
install_opener -- Installs a new opener as the default opener.
objects of interest:
OpenerDirector -- Sets up the User Agent as the Python-urllib client and manages
the Handler classes, while dealing with requests and responses.
Request -- An object that encapsulates the state of a request. The
state can be as simple as the URL. It can also include extra HTTP
headers, e.g. a User-Agent.
BaseHandler --
internals:
BaseHandler and parent
_call_chain conventions
Example usage:
import urllib.request
# set up authentication info
authinfo = urllib.request.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
                      uri='https://mahler:8092/site-updates.py',
                      user='klem',
                      passwd='geheim$parole')
proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)
# install it
urllib.request.install_opener(opener)
f = urllib.request.urlopen('http://www.python.org/')
urllib.request.urlopen
Signature: urllib.request.urlopen(url, data=None, timeout=<object object at 0x000002BE3FA59760>, *, cafile=None, capath=None, cadefault=False, context=None)
Docstring:
Open the URL url, which can be either a string or a Request object.
*data* must be an object specifying additional data to be sent to
the server, or None if no such data is needed. See Request for
details.
urllib.request module uses HTTP/1.1 and includes a "Connection:close"
header in its HTTP requests.
The optional *timeout* parameter specifies a timeout in seconds for
blocking operations like the connection attempt (if not specified, the
global default timeout setting will be used). This only works for HTTP,
HTTPS and FTP connections.
If *context* is specified, it must be a ssl.SSLContext instance describing
the various SSL options. See HTTPSConnection for more details.
The optional *cafile* and *capath* parameters specify a set of trusted CA
certificates for HTTPS requests. cafile should point to a single file
containing a bundle of CA certificates, whereas capath should point to a
directory of hashed certificate files. More information can be found in
ssl.SSLContext.load_verify_locations().
The *cadefault* parameter is ignored.
This function always returns an object which can work as a context
manager and has methods such as
* geturl() - return the URL of the resource retrieved, commonly used to
determine if a redirect was followed
* info() - return the meta-information of the page, such as headers, in the
form of an email.message_from_string() instance (see Quick Reference to
HTTP Headers)
* getcode() - return the HTTP status code of the response. Raises URLError
on errors.
For HTTP and HTTPS URLs, this function returns a http.client.HTTPResponse
object slightly modified. In addition to the three new methods above, the
msg attribute contains the same information as the reason attribute ---
the reason phrase returned by the server --- instead of the response
headers as it is specified in the documentation for HTTPResponse.
For FTP, file, and data URLs and requests explicitly handled by legacy
URLopener and FancyURLopener classes, this function returns a
urllib.response.addinfourl object.
Note that None may be returned if no handler handles the request (though
the default installed global OpenerDirector uses UnknownHandler to ensure
this never happens).
In addition, if proxy settings are detected (for example, when a *_proxy
environment variable like http_proxy is set), ProxyHandler is default
installed and makes sure the requests are handled through the proxy.
Type: function
urlopen creates a file-like object that represents the remote URL; you then work with this object just as you would with a local file in order to fetch the remote data.
- The url parameter is the path to the remote data, normally a web address. For more involved operations, such as setting HTTP headers, create a Request instance and pass it in place of the URL string;
- The data parameter is data submitted to url with a POST request; it must already be URL-encoded;
- timeout is an optional timeout value.

Return value: a file-like object.
The object provides read(), readline(), readlines(), fileno() and close(), which are used exactly as on an ordinary file object, plus the extra methods listed below.
Method | Description |
---|---|
read([bytes]) | Read everything from the object, or at most bytes bytes |
readline() | Read a single line of text as a bytes string |
readlines() | Read all input lines and return them as a list |
fileno() | Return the integer file descriptor |
close() | Close the connection |
info() | Return the headers sent back by the remote server as a message object (http.client.HTTPMessage in Python 3; older docs refer to mimetools.Message) |
geturl() | Return the real URL ("real" because, for URLs that are redirected, this is the URL after redirection) |
getcode() | Return the HTTP response code as an integer |
Fetching an HTML page
import urllib.request

response = urllib.request.urlopen('http://www.cnblogs.com/linxiyue/p/3537486.html')
response.getcode()
200
response.geturl()
'http://www.cnblogs.com/linxiyue/p/3537486.html'
urllib.request.urlretrieve
Signature: urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)
Docstring:
Retrieve a URL into a temporary location on disk.
Requires a URL argument. If a filename is passed, it is used as
the temporary file location. The reporthook argument should be
a callable that accepts a block number, a read size, and the
total file size of the URL target. The data argument should be
valid URL encoded data.
If a filename is passed and the URL points to a local resource,
the result is a copy from local file to new file.
Returns a tuple containing the path to the newly created
data file as well as the resulting HTTPMessage object.
Type: function
Parameters:
- filename: the local path to save to (if not given, urllib generates a temporary file to hold the data);
- reporthook: a callback function, invoked when the connection is established and again after each data block is transferred; it can be used to display download progress;
- data: data to POST to the server. The function returns a two-element tuple (filename, headers), where filename is the local path the data was saved to and headers is the server's response headers.
Example:
Fetch the HTML of the Bing page and save it locally as E://bing_images.html; a second example then shows download progress while retrieving an image.
import urllib.request
url = 'https://cn.bing.com/images/trending?form=Z9LH'
local = 'e://bing_images.html'
urllib.request.urlretrieve(url, local)
('e://bing_images.html', <http.client.HTTPMessage at 0x1e788975940>)
# Example of downloading a file with urlretrieve(), displaying download progress
import urllib.request
import os

def Schedule(a, b, c):
    '''
    Progress callback for urlretrieve()
    @a: number of blocks transferred so far
    @b: size of each block in bytes
    @c: total size of the remote file in bytes
    '''
    per = 100.0 * a * b / c
    if per > 100:
        per = 100
    print('%.2f%%' % per)

url = 'http://pic.7y7.com/Uploads/Former/20154/2015040338924433_0_0_water.jpg'
local = os.path.join(r'E:\圖片', 'water.jpg')
urllib.request.urlretrieve(url, local, Schedule)
0.00%
5.64%
11.28%
16.92%
22.56%
28.20%
33.84%
39.48%
45.12%
50.76%
56.40%
62.05%
67.69%
73.33%
78.97%
84.61%
90.25%
95.89%
100.00%
('E:\\圖片\\water.jpg', <http.client.HTTPMessage at 0x1e7882a0048>)
As the exercises above show, urlopen() makes it easy to fetch the HTML of a remote page; you can then pick out the data you need with Python regular expressions and download it locally with urlretrieve(). For remote URLs that restrict access or limit the number of connections, you can connect through a proxy, and if the remote data is large and single-threaded downloading is too slow, you can download with multiple threads. That, in essence, is the fabled web crawler.
urllib.request.quote
Signature: urllib.request.quote(string, safe='/', encoding=None, errors=None)
Docstring:
quote('abc def') -> 'abc%20def'
Each part of a URL, e.g. the path info, the query, etc., has a
different set of reserved characters that must be quoted.
RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists
the following reserved characters.
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
Each of these characters is reserved in some component of a URL,
but not necessarily in all of them.
By default, the quote function is intended for quoting the path
section of a URL. Thus, it will not encode '/'. This character
is reserved, but in typical usage the quote function is being
called on a path where the existing slash characters are used as
reserved characters.
string and safe may be either str or bytes objects. encoding and errors
must not be specified if string is a bytes object.
The optional encoding and errors parameters specify how to deal with
non-ASCII characters, as accepted by the str.encode method.
By default, encoding='utf-8' (characters are encoded with UTF-8), and
errors='strict' (unsupported characters raise a UnicodeEncodeError).
Type: function
urllib.request.unquote
Signature: urllib.request.unquote(string, encoding='utf-8', errors='replace')
Docstring:
Replace %xx escapes by their single-character equivalent. The optional
encoding and errors parameters specify how to decode percent-encoded
sequences into Unicode characters, as accepted by the bytes.decode()
method.
By default, percent-encoded sequences are decoded with UTF-8, and invalid
sequences are replaced by a placeholder character.
unquote('abc%20def') -> 'abc def'.
Type: function
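A short, illustrative round trip through quote() and unquote(); the Chinese query text is just an example.
from urllib.parse import quote, unquote   # also reachable as urllib.request.quote/unquote

q = quote('python 教程')             # spaces and non-ASCII characters are percent-encoded
print(q)                             # python%20%E6%95%99%E7%A8%8B
print(unquote(q))                    # back to: python 教程
print(quote('/a b/c', safe='/'))     # '/' is left alone because it is in safe: /a%20b/c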
urllib.request.urlopen(url).read().decode
Signature: resp.decode(encoding='utf-8', errors='strict')
Docstring:
Decode the bytes using the codec registered for encoding.
encoding
The encoding with which to decode the bytes.
errors
The error handling scheme to use for the handling of decoding errors.
The default is 'strict' meaning that decoding errors raise a
UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registered with codecs.register_error that
can handle UnicodeDecodeErrors.
Type: builtin_function_or_method
urllib.parse.urlencode
Signature: urllib.parse.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=<function quote_plus at 0x000002BE403499D8>)
Docstring:
Encode a dict or sequence of two-element tuples into a URL query string.
If any values in the query arg are sequences and doseq is true, each
sequence element is converted to a separate parameter.
If the query arg is a sequence of two-element tuples, the order of the
parameters in the output will match the order of parameters in the
input.
The components of a query arg may each be either a string or a bytes type.
The safe, encoding, and errors parameters are passed down to the function
specified by quote_via (encoding and errors only if a component is a str).
Type: function
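A minimal urlencode() example; the parameter names and URL are placeholders, and the encoded bytes form is what urlopen() expects as POST data.
import urllib.parse

params = urllib.parse.urlencode({'q': 'python urllib', 'page': 2})
print(params)                                   # q=python+urllib&page=2
url = 'http://example.com/search?' + params     # use as a GET query string
data = params.encode('ascii')                    # or encode to bytes for urlopen(url, data=data)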
import urllib.request
from urllib.error import HTTPError

try:
    web = urllib.request.urlopen('http://www.python.org/')
    resp = web.read()
except HTTPError as e:
    resp = e.read()
web
<http.client.HTTPResponse at 0x2be438c1e48>
Proxies
import urllib.request

enable_proxy = True
# ProxyHandler objects: one with a proxy configured and one without
proxy_handler = urllib.request.ProxyHandler({'http': 'http://proxy-host:8080'})  # replace proxy-host:8080 with your proxy
null_proxy_handler = urllib.request.ProxyHandler({})
# build an opener from one of the handlers; the if decides whether the proxy is used
if enable_proxy:
    opener = urllib.request.build_opener(proxy_handler)
else:
    opener = urllib.request.build_opener(null_proxy_handler)
# install the opener as the global opener used by urlopen
urllib.request.install_opener(opener)
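Once install_opener() has run, ordinary urlopen() calls are routed through the installed opener; for instance (the URL is only an example):
response = urllib.request.urlopen('http://www.python.org/')  # goes through the proxy when enable_proxy is True
print(response.getcode())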
UDP programming & TCP programming
See the book Python程序設(shè)計與實(shí)現(xiàn).