1. Basics
- Fetching dynamically loaded data from a web page
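On We Heart It the gallery loads more entries as you scroll, but each batch is also reachable as an ordinary page through the `scrolling=true&page=N` query string, so the "dynamic" data can be fetched with plain GET requests. A minimal sketch of building those paginated URLs (the range is just an example):

```python
# Build the paginated URLs that the infinite-scroll page fetches in the background.
base = 'http://weheartit.com/inspirations/taylorswift?scrolling=true&page={}'
urls = [base.format(page) for page in range(1, 4)]
print(urls[0])  # → http://weheartit.com/inspirations/taylorswift?scrolling=true&page=1
```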
2. Writing the scraper myself
```python
from bs4 import BeautifulSoup
import requests
import time
import urllib.request

# proxies = {"http": "http://139.162.8.118"}
# Route urllib.request traffic through a local HTTP proxy.
proxy_support = urllib.request.ProxyHandler({'http': '127.0.0.1:8787'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

def get_page(url, page, data=None):
    wb_data = requests.get(url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    imgs = soup.select('img.entry-thumbnail')
    if data is None:
        for i in range(len(imgs)):
            img_link = imgs[i]
            img = img_link.get('src')
            str_num = str(i)
            print(img)
            # download(img, str_num)
            pic = urllib.request.urlopen(img)
            name = '/Users/aipengya/Downloads/pictures_test/' + '(' + str(page) + ')' + str_num + '.jpg'
            f = open(name, 'wb')
            f.write(pic.read())
            f.close()
    print("Done!")

def get_more_pages(start, end):
    for page in range(start, end):
        url = 'http://weheartit.com/inspirations/taylorswift?scrolling=true&page={}'.format(page)
        get_page(url, page)
        time.sleep(2)

# def download(img, str_num):
#     file_name = path + str_num + '.jpg'
#     img_data = urllib.request.urlopen(img).read()
#     f = open(file_name, 'wb')
#     f.write(img_data)
#     f.close()

get_more_pages(1, 3)

# CSS selector for the thumbnails, copied from the browser inspector:
# main-container > div > div > div > div > div > a > img
```
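The parsing step in `get_page` can be checked offline: BeautifulSoup's `select` with the `img.entry-thumbnail` selector pulls each thumbnail's `src` out of the markup. A small sketch against a made-up HTML fragment imitating the site's structure:

```python
from bs4 import BeautifulSoup

# A made-up fragment imitating We Heart It's thumbnail markup.
html = '''
<div class="entry">
  <a href="/entry/1"><img class="entry-thumbnail" src="http://example.com/a.jpg"></a>
  <a href="/entry/2"><img class="entry-thumbnail" src="http://example.com/b.jpg"></a>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')  # html.parser avoids the lxml dependency
urls = [img.get('src') for img in soup.select('img.entry-thumbnail')]
print(urls)  # → ['http://example.com/a.jpg', 'http://example.com/b.jpg']
```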
3. Reflections and takeaways
- The proxy should also be set inside the code itself, though I'm not yet sure how to do that.
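The open question above — setting the proxy in code — can be sketched two ways, assuming the same local proxy address used in the script (127.0.0.1:8787): pass a `proxies` dict to `requests` (e.g. on a `Session`), and install a `ProxyHandler`-based opener for `urllib.request`. Note that the script above only installs the urllib opener, so its `requests.get` calls bypass the proxy.

```python
import urllib.request
import requests

# The local proxy address from the script above.
PROXY = "http://127.0.0.1:8787"

# For requests: attach the proxy to a Session so every call through it is proxied.
session = requests.Session()
session.proxies = {"http": PROXY, "https": PROXY}
# resp = session.get(url)  # would now go through the proxy

# For urllib.request: install a global opener that routes through the proxy,
# affecting all later urllib.request.urlopen calls.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
)
urllib.request.install_opener(opener)

print(session.proxies["http"])  # → http://127.0.0.1:8787
```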
最后編輯于 :
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者