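In Python 3.6 the urlparse functionality has to be imported from the urllib package first. This differs from Python 2.7, where urlparse was a standalone top-level module.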
from urllib import parse
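If the same script has to run on both interpreters, one common approach is a guarded import. This is only a minimal sketch; the example.com URL is a placeholder, not something from the scraped site.

```python
try:
    # Python 3: the URL helpers live in urllib.parse
    from urllib import parse
except ImportError:
    # Python 2.7: the same helpers are in the top-level urlparse module
    import urlparse as parse

# Placeholder URLs, used only to show that the import works
print(parse.urljoin("http://example.com/a/b.html", "c.jpg"))
```

Either way, the rest of the code can call parse.urljoin without caring which interpreter is in use.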
I found that the links to the images I needed to fetch looked like this:
- /shtml/sxwb/20180608/images/b_page_01.jpg
This path is not a complete URL, so the image cannot be accessed through it directly. This is where the urljoin function of the parse module comes in.
- Definition: def urljoin(base, url, allow_fragments=True)
This assumes I have already scraped the link of the article the image belongs to, which serves as the base:
base = 'http://epaper.sxrb.com/shtml/sxwb/20180608/749257.shtml'
url = '/shtml/sxwb/20180608/images/b_page_01.jpg'
img_url = parse.urljoin(base, url)
This gives the complete form of the image URL.
Output:
url=/shtml/sxwb/20180608/images/b_page_09.jpg
img_url =http://epaper.sxrb.com/shtml/sxwb/20180608/images/b_page_09.jpg
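The steps above can be put together in a small runnable sketch. The base URL and the first path come from this post; the helper name resolve_image_urls and the other two path forms are assumptions added for illustration, showing how urljoin treats a directory-relative path and an already absolute URL.

```python
from urllib.parse import urljoin

# Article URL from the post, used as the base for resolution
base = "http://epaper.sxrb.com/shtml/sxwb/20180608/749257.shtml"


def resolve_image_urls(base_url, paths):
    """Hypothetical helper: resolve each scraped path against base_url."""
    return [urljoin(base_url, p) for p in paths]


paths = [
    # Root-relative path, the form found in the post
    "/shtml/sxwb/20180608/images/b_page_01.jpg",
    # Assumed variant: path relative to the article's own directory
    "images/b_page_01.jpg",
    # Assumed variant: already absolute, urljoin returns it unchanged
    "http://epaper.sxrb.com/shtml/sxwb/20180608/images/b_page_01.jpg",
]

for path, full in zip(paths, resolve_image_urls(base, paths)):
    print(path, "->", full)
```

With this base, all three inputs resolve to the same absolute URL, so the scraped value can be passed to urljoin without first checking which form it takes.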