1. URL (Uniform Resource Locator): the address of a standard resource on the Internet, commonly called a "web address".
2. Python 3.x no longer has a urllib2 library; only urllib remains.
3. URL encoding is also called percent-encoding.
     1.1. The urlopen function is the usual way to open a URL.
     1.2. Building an opener with the build_opener function is the more advanced way to open a page.
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
1. The urllib.request module uses HTTP/1.1 and includes a Connection:close header in its HTTP requests.
3. For HTTP and HTTPS URLs, this function returns an http.client.HTTPResponse object (slightly modified), which offers the following methods:
- The object is file-like, so the usual file methods can be used (read, readline, fileno, close)
- geturl(): returns the URL of the request
- getcode(): returns the HTTP status code of the response; 200 means the request succeeded, 404 means the requested resource was not found
- info(): returns an http.client.HTTPMessage object holding the headers sent back by the remote server
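
As a rough illustration of the response methods listed above, the sketch below starts a throwaway HTTP server on localhost so that nothing depends on an external site. The handler class `HelloHandler`, the response body, and the use of an empty `ProxyHandler` (so proxy settings in the environment are ignored) are assumptions made for this example, not part of the original notes. It opens the URL through an opener created with `build_opener`; `urlopen(url)` would return the same kind of response object.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

# An empty ProxyHandler disables proxy auto-detection for this opener.
opener = build_opener(ProxyHandler({}))
with opener.open(url, timeout=5) as resp:
    status = resp.getcode()                     # HTTP status code
    final_url = resp.geturl()                   # URL that was actually retrieved
    content_type = resp.info()["Content-Type"]  # header lookup on the message
    body = resp.read()                          # file-like read()

server.shutdown()
server.server_close()
print(status, final_url, content_type, body)
```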
>>> from urllib import parse
>>> url = r'https://docs.python.org/3.5/search.html?q=parse&check_keywords=yes&area=default'
>>> parseResult = parse.urlparse(url)
>>> parseResult
ParseResult(scheme='https', netloc='docs.python.org', path='/3.5/search.html', params='', query='q=parse&check_keywords=yes&area=default', fragment='')
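
The ParseResult shown above is a named tuple, so its parts can be read by attribute or unpacked like an ordinary 6-tuple. The following is a small sketch of that; the second URL (with a user name and port) is made up for the example.

```python
from urllib.parse import urlparse

url = "https://docs.python.org/3.5/search.html?q=parse&check_keywords=yes&area=default"
result = urlparse(url)

# Access by attribute name.
scheme = result.scheme   # 'https'
netloc = result.netloc   # 'docs.python.org'
path = result.path       # '/3.5/search.html'
query = result.query     # 'q=parse&check_keywords=yes&area=default'

# Or unpack all six components at once.
s, n, p, params, q, fragment = result

# Convenience attributes split netloc further (hypothetical URL).
other = urlparse("http://user@example.com:8080/index.html")
print(other.username, other.hostname, other.port)  # user example.com 8080
```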
urljoin(base, url, allow_fragments=True)
        Join a base URL and a possibly relative URL to form an absolute
        interpretation of the latter.
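
A short sketch of how urljoin resolves different kinds of references against a base URL. The example.com and other.org addresses are placeholders chosen for illustration.

```python
from urllib.parse import urljoin

base = "http://example.com/a/b/c.html"

sibling = urljoin(base, "d.html")                 # replaces the last path segment
parent = urljoin(base, "../d.html")               # goes up one directory first
rooted = urljoin(base, "/x.html")                 # starts again from the site root
absolute = urljoin(base, "https://other.org/y")   # an absolute URL wins outright

print(sibling)   # http://example.com/a/b/d.html
print(parent)    # http://example.com/a/d.html
print(rooted)    # http://example.com/x.html
print(absolute)  # https://other.org/y
```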
>>> unparsed_url=parse.urlunparse((scheme,netloc,path,'','',''))
>>> unparsed_url
>>> for mod in modlist:
- Parses a query that is given as a string argument.
>>> param_dict=parse.parse_qs(parseResult.query)
>>> param_dict
{'area': ['default'], 'check_keywords': ['yes'], 'q': ['parse']}
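
Every value in the dictionary above is a list because a key may appear more than once in a query string. The sketch below, using a made-up query string, shows repeated keys, the effect of keep_blank_values, and the list-of-pairs form returned by parse_qsl.

```python
from urllib.parse import parse_qs, parse_qsl

qs = "q=parse&tag=a&tag=b&empty="

# Repeated keys are collected into one list; blank values are dropped by default.
default = parse_qs(qs)

# keep_blank_values=True retains 'empty' with an empty string as its value.
with_blanks = parse_qs(qs, keep_blank_values=True)

# parse_qsl keeps the original order and returns (name, value) pairs.
pairs = parse_qsl(qs)

print(default)      # {'q': ['parse'], 'tag': ['a', 'b']}
print(with_blanks)  # {'q': ['parse'], 'tag': ['a', 'b'], 'empty': ['']}
print(pairs)        # [('q', 'parse'), ('tag', 'a'), ('tag', 'b')]
```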
5.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus)
>>> from urllib import parse
>>> query={'name':'walker','age':99}
>>> parse.urlencode(query)
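
A minimal sketch of what urlencode produces for the dictionary above, plus the doseq and quote_via options. The extra inputs (the tag list, the pair sequence, and the "a b" value) are invented for the example.

```python
from urllib.parse import quote, urlencode

query = {"name": "walker", "age": 99}
encoded = urlencode(query)

# With doseq=True each element of a sequence value becomes its own parameter.
tags = urlencode({"tag": ["a", "b"]}, doseq=True)

# A sequence of two-element tuples keeps the order in which it was given.
ordered = urlencode([("b", 2), ("a", 1)])

# The default quote_via is quote_plus, so a space becomes '+';
# passing quote instead yields '%20'.
plus_style = urlencode({"q": "a b"})
percent_style = urlencode({"q": "a b"}, quote_via=quote)

print(encoded)        # name=walker&age=99
print(tags)           # tag=a&tag=b
print(ordered)        # b=2&a=1
print(plus_style)     # q=a+b
print(percent_style)  # q=a%20b
```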
5.urllib.parse.quote(string, safe='/', encoding=None, errors=None)
>>> from urllib import parse
6.unquote(string, encoding='utf-8', errors='replace')
>>> parse.unquote_plus('1+2')
'1 2'
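
The difference between the plain and the "plus" variants of quote and unquote is easiest to see side by side. This is a small sketch with made-up input strings.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# quote leaves '/' alone by default and writes a space as %20.
path_quoted = quote("/docs/my file.txt")       # '/docs/my%20file.txt'
all_quoted = quote("/docs/my file.txt", safe="")

# quote_plus writes a space as '+' and escapes a literal '+'.
form_quoted = quote_plus("1 + 2")              # '1+%2B+2'

# Non-ASCII text is encoded as UTF-8 before being percent-escaped.
chinese = quote("中文")                         # '%E4%B8%AD%E6%96%87'

# unquote keeps '+' as it is; unquote_plus turns it into a space.
plain = unquote("1+2")                         # '1+2'
spaced = unquote_plus("1+2")                   # '1 2'
decoded = unquote("%E4%B8%AD%E6%96%87")        # '中文'

print(path_quoted, all_quoted, form_quoted, chinese, plain, spaced, decoded)
```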
>>> from urllib import robotparser
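
The notes stop at the import, so here is a minimal sketch of one way robotparser can be used. The robots.txt rules and the example.com URLs are invented for illustration; the rules are fed in with parse() so that no network access is needed (set_url() followed by read() would fetch a real robots.txt instead).

```python
from urllib import robotparser

# Hypothetical robots.txt content, given as a list of lines.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# can_fetch(useragent, url) reports whether the rules allow the request.
blocked = rp.can_fetch("*", "http://example.com/private/data.html")
allowed = rp.can_fetch("*", "http://example.com/index.html")

print(blocked)  # False
print(allowed)  # True
```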
    parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')
        Parse a query given as a string argument.
        Arguments:
        qs: percent-encoded query string to be parsed
        keep_blank_values: flag indicating whether blank values in
            percent-encoded queries should be treated as blank strings.
            A true value indicates that blanks should be retained as
            blank strings. The default false value indicates that
            blank values are to be ignored and treated as if they were
            not included.
        strict_parsing: flag indicating what to do with parsing errors.
            If false (the default), errors are silently ignored.
            If true, errors raise a ValueError exception.
        encoding and errors: specify how to decode percent-encoded sequences
            into Unicode characters, as accepted by the bytes.decode() method.
    parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')
        Parse a query given as a string argument.
        Arguments:
        qs: percent-encoded query string to be parsed
        keep_blank_values: flag indicating whether blank values in
            percent-encoded queries should be treated as blank strings. A
            true value indicates that blanks should be retained as blank
            strings. The default false value indicates that blank values
            are to be ignored and treated as if they were not included.
        strict_parsing: flag indicating what to do with parsing errors. If
            false (the default), errors are silently ignored. If true,
            errors raise a ValueError exception.
        encoding and errors: specify how to decode percent-encoded sequences
            into Unicode characters, as accepted by the bytes.decode() method.
        Returns a list, as G-d intended.
    quote(string, safe='/', encoding=None, errors=None)
        quote('abc def') -> 'abc%20def'
        Each part of a URL, e.g. the path info, the query, etc., has a
        different set of reserved characters that must be quoted.
        RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists
        the following reserved characters.
        reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                      "$" | ","
        Each of these characters is reserved in some component of a URL,
        but not necessarily in all of them.
        By default, the quote function is intended for quoting the path
        section of a URL. Thus, it will not encode '/'. This character
        is reserved, but in typical usage the quote function is being
        called on a path where the existing slash characters are used as
        reserved characters.
        string and safe may be either str or bytes objects. encoding and errors
        must not be specified if string is a bytes object.
        The optional encoding and errors parameters specify how to deal with
        non-ASCII characters, as accepted by the str.encode method.
        By default, encoding='utf-8' (characters are encoded with UTF-8), and
        errors='strict' (unsupported characters raise a UnicodeEncodeError).
    quote_from_bytes(bs, safe='/')
        Like quote(), but accepts a bytes object rather than a str, and does
        not perform string-to-bytes encoding. It always returns an ASCII string.
        quote_from_bytes(b'abc def?') -> 'abc%20def%3f'
    quote_plus(string, safe='', encoding=None, errors=None)
        Like quote(), but also replace ' ' with '+', as required for quoting
        HTML form values. Plus signs in the original string are escaped unless
        they are included in safe. It also does not have safe default to '/'.
    unquote(string, encoding='utf-8', errors='replace')
        Replace %xx escapes by their single-character equivalent. The optional
        encoding and errors parameters specify how to decode percent-encoded
        sequences into Unicode characters, as accepted by the bytes.decode()
        method.
        By default, percent-encoded sequences are decoded with UTF-8, and invalid
        sequences are replaced by a placeholder character.
        unquote('abc%20def') -> 'abc def'.
    unquote_plus(string, encoding='utf-8', errors='replace')
        Like unquote(), but also replace plus signs by spaces, as required for
        unquoting HTML form values.
        unquote_plus('%7e/abc+def') -> '~/abc def'
    unquote_to_bytes(string)
        unquote_to_bytes('abc%20def') -> b'abc def'.
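
The bytes-oriented variants skip the str/bytes conversion step, which can be handy when the data is not valid text. A small sketch follows; note that when actually run, quote_from_bytes emits uppercase hex digits (%3F), even though the docstring above prints %3f.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# bytes in, ASCII str out
quoted = quote_from_bytes(b"abc def?")

# str in, raw bytes out (no decoding to text is attempted)
raw = unquote_to_bytes("abc%20def")
utf8_bytes = unquote_to_bytes("%E4%B8%AD")

print(quoted)                      # abc%20def%3F
print(raw)                         # b'abc def'
print(utf8_bytes.decode("utf-8"))  # 中
```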
    urldefrag(url)
        Removes any existing fragment from URL.
        Returns a tuple of the defragmented URL and the fragment. If
        the URL contained no fragments, the second element is the
        empty string.
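
A quick sketch of urldefrag on a URL with and without a fragment. The example.com addresses are placeholders.

```python
from urllib.parse import urldefrag

# The result can be unpacked like a 2-tuple.
clean_url, fragment = urldefrag("http://example.com/page.html#section-2")

# Without a fragment, the second element is the empty string.
same_url, no_fragment = urldefrag("http://example.com/page.html")

print(clean_url, fragment)           # http://example.com/page.html section-2
print(same_url, repr(no_fragment))   # http://example.com/page.html ''
```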
    urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus)
        Encode a dict or sequence of two-element tuples into a URL query string.
        If any values in the query arg are sequences and doseq is true, each
        sequence element is converted to a separate parameter.
        If the query arg is a sequence of two-element tuples, the order of the
        parameters in the output will match the order of parameters in the
        input.
        The components of a query arg may each be either a string or a bytes type.
        The safe, encoding, and errors parameters are passed down to the function
        specified by quote_via (encoding and errors only if a component is a str).
    urljoin(base, url, allow_fragments=True)
        Join a base URL and a possibly relative URL to form an absolute
        interpretation of the latter.
    urlparse(url, scheme='', allow_fragments=True)
        Parse a URL into 6 components:
        <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
        Return a 6-tuple: (scheme, netloc, path, params, query, fragment).
        Note that we don't break the components up in smaller bits
        (e.g. netloc is a single string) and we don't expand % escapes.
    urlsplit(url, scheme='', allow_fragments=True)
        Parse a URL into 5 components:
        <scheme>://<netloc>/<path>?<query>#<fragment>
        Return a 5-tuple: (scheme, netloc, path, query, fragment).
        Note that we don't break the components up in smaller bits
        (e.g. netloc is a single string) and we don't expand % escapes.
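
The only difference between the two parsers is the params component. The sketch below, using a made-up URL with a ";type=a" parameter on its last path segment, shows where that piece ends up in each result.

```python
from urllib.parse import urlparse, urlsplit

url = "http://example.com/path;type=a?x=1#frag"

parsed = urlparse(url)   # 6 components, params split out of the path
split = urlsplit(url)    # 5 components, params stay inside the path

print(parsed.path, parsed.params)  # /path type=a
print(split.path)                  # /path;type=a
print(len(parsed), len(split))     # 6 5
```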
    urlunparse(components)
        Put a parsed URL back together again. This may result in a
        slightly different, but equivalent URL, if the URL that was parsed
        originally had redundant delimiters, e.g. a ? with an empty query
        (the draft states that these are equivalent).
    urlunsplit(components)
        Combine the elements of a tuple as returned by urlsplit() into a
        complete URL as a string. The data argument can be any five-item iterable.
        This may result in a slightly different, but equivalent URL, if the URL that
        was parsed originally had unnecessary delimiters (for example, a ? with an
        empty query; the RFC states that these are equivalent).
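
A rough sketch of the round trip described above: parse a URL, optionally change one component, and put it back together. The URLs are placeholders, and the dropped trailing "?" reflects the default behaviour the docstrings describe.

```python
from urllib.parse import urlparse, urlsplit, urlunparse, urlunsplit

url = "https://docs.python.org/3.5/search.html?q=parse#top"

# A plain round trip gives back the same string.
rebuilt = urlunparse(urlparse(url))
rebuilt_split = urlunsplit(urlsplit(url))

# Results are named tuples, so _replace() can swap out one component.
changed = urlunparse(urlparse(url)._replace(query="q=quote", fragment=""))

# Any six-item sequence works for urlunparse, not only a ParseResult.
built = urlunparse(("http", "example.com", "/index.html", "", "x=1", ""))

# A redundant delimiter (a '?' with an empty query) is not preserved.
trimmed = urlunparse(urlparse("http://example.com/path?"))

print(rebuilt == url, rebuilt_split == url)
print(changed)
print(built)
print(trimmed)
```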
    __all__ = ['urlparse', 'urlunparse', 'urljoin', 'urldefrag', 'urlsplit...
