Let the picture do the talking. Source code attached.
If you're interested, copy it and change the `print(alt + " : " + url)` line to do whatever you like.
import urllib.request  # sends HTTP requests and fetches the page content
from bs4 import BeautifulSoup  # parses the fetched HTML
import time  # throttles the request rate

# 1. Fetch a page and work out the pattern of the data you want
# 2. Using that pattern, extract the data in bulk with BeautifulSoup
# 3. Using that pattern, walk through every page of the site
urlPreFix = "https://www.sex.com/"
targetUrl = urlPreFix
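# Extracts the target resource URLs from one page
def geturls(target):
    target = BeautifulSoup(target, 'html.parser')
    # Lazy-loaded images use a placeholder src and keep the real URL in data-src
    for img in target.find_all('img', src='/images/t.png'):
        alt = img.attrs['alt']
        url = img.attrs['data-src']
        print(alt + " : " + url)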
counter = 1
while counter < 57:
    print('Current URL: ' + targetUrl)
    with urllib.request.urlopen(targetUrl) as html:
        geturls(html)
    counter += 1
    # urlPreFix already ends with '/', so append only the query string
    targetUrl = urlPreFix + "?page=" + str(counter)
    print('Sleeping for 5 seconds...')
    time.sleep(5)
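
As one idea for what to do instead of printing, here is a minimal sketch that writes the `alt`/`url` pairs to CSV and builds the page URLs in one place. The helper names `page_url` and `save_rows` are made up for this example, and `https://example.com/` is only a placeholder base URL; the sketch uses nothing beyond the standard library and assumes the pairs have already been collected into a list.

```python
import csv
import io
from urllib.parse import urlencode


def page_url(base, page):
    """Return the URL of a 1-based page number (hypothetical helper).

    Page 1 is the bare base URL; later pages add a ?page=N query string.
    """
    if page <= 1:
        return base
    # Strip any trailing slash first so the result never contains "//?"
    return base.rstrip("/") + "/?" + urlencode({"page": page})


def save_rows(rows, out):
    """Write (alt, url) pairs as CSV to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["alt", "url"])  # header row
    writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder data standing in for what the scraper would collect
    rows = [("sample alt", "https://example.com/a.jpg")]
    buf = io.StringIO()
    save_rows(rows, buf)
    print(page_url("https://example.com/", 2))
    print(buf.getvalue())
```

To write to disk rather than memory, pass a file opened with `open("images.csv", "w", newline="", encoding="utf-8")` in place of the `StringIO` buffer.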