Beautiful Soup is a powerful library for scraping information from websites. As an example, here is a script that downloads images from Ugirls (尤果網, ugirls.com).

    from bs4 import BeautifulSoup
    import requests
    import shutil

    # Listing pages 2 through 42 (range() excludes its end value)
    for page in range(2, 43):
        listing = requests.get('https://www.ugirls.com/Content/Page-{}.html'.format(page))
        listing_soup = BeautifulSoup(listing.text, 'html.parser')
        # Each magazine entry on a listing page links to a detail page
        for link in listing_soup.find_all("a", {"class": "magazine_item_wrap"}):
            girl_url = link['href']
            cont_page = requests.get(girl_url)
            cont_soup = BeautifulSoup(cont_page.text, 'html.parser')
            divs = cont_soup.find_all("div", {"class": "yang auto"})
            if not divs:
                continue  # no image container found on this page
            imgs = divs[0].find_all('img')
            # Save the first three images, named after their alt text
            for img in imgs[0:3]:
                name = img['alt']
                url = img['src']
                response = requests.get(url, stream=True)
                with open('/home/ws/PycharmProjects/untitled/作業/code/picture/{}.jpg'.format(name), 'wb') as out_file:
                    shutil.copyfileobj(response.raw, out_file)

- First, import the three modules `BeautifulSoup`, `requests`, and `shutil`; all of them are used below.
- The site has 42 pages, so we need a small loop over the listing pages, starting from page 2.
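To make the page range concrete, here is a minimal sketch of how the listing URLs could be generated. The helper name `listing_urls` and its parameters are made up for illustration; the URL pattern is the one used in the script, and the count of 42 pages is taken from the text above. Note that `range` excludes its end value, so reaching page 42 requires an end value of 43.

```python
PAGE_URL = 'https://www.ugirls.com/Content/Page-{}.html'


def listing_urls(first_page=2, last_page=42):
    """Return the listing-page URLs from first_page to last_page, inclusive.

    Hypothetical helper for illustration; it is not part of the original script.
    """
    # range() stops before its end value, hence last_page + 1
    return [PAGE_URL.format(n) for n in range(first_page, last_page + 1)]


urls = listing_urls()
print(len(urls))  # number of listing pages that would be requested
print(urls[0])
print(urls[-1])
```

Building the list first also makes it easy to try the scraper on a single page before looping over all of them.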
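```
with open('/home/ws/PycharmProjects/untitled/作業/code/picture/{}.jpg'.format(name), 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
```
This is where you choose the folder the images are saved to. It is easiest to keep that folder next to the script you are running; if the directory does not exist, `open()` raises an error.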
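One way to avoid that error is to create the folder before writing to it. The sketch below is an illustration under stated assumptions, not the original script: the `save_image` helper and the `picture` folder name are hypothetical, and an in-memory `io.BytesIO` stream stands in for `response.raw` so the example runs without network access.

```python
import io
import os
import shutil
import tempfile


def save_image(stream, folder, name):
    """Copy a binary stream to <folder>/<name>.jpg, creating the folder if needed.

    Hypothetical helper for illustration; it is not part of the original script.
    """
    os.makedirs(folder, exist_ok=True)  # open() fails if the folder is missing
    path = os.path.join(folder, '{}.jpg'.format(name))
    with open(path, 'wb') as out_file:
        shutil.copyfileobj(stream, out_file)
    return path


# Stand-in for response.raw: any binary file-like object works here
fake_image = io.BytesIO(b'not a real jpeg, just sample bytes')
base_dir = tempfile.mkdtemp()  # temporary location used only for this demo
saved_path = save_image(fake_image, os.path.join(base_dir, 'picture'), 'example')
print(saved_path)
```

In the scraper itself, `response.raw` would be passed as `stream` and the image's `alt` text as `name`.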
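That's the basic code. It is fairly simple, so there is not much more to explain.
For more on how to use Beautiful Soup, have a look at the documentation linked below; after reading it you should know how to scrape photos yourself.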
https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html
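
As a quick offline illustration of the `find_all` calls used in the script, the sketch below parses a small hand-written HTML string. The markup is invented for this example and only mimics the class names the script looks for; the real pages may be structured differently.

```python
from bs4 import BeautifulSoup

# Invented sample markup; only the class names are taken from the script
html = """
<a class="magazine_item_wrap" href="https://example.com/a.html">A</a>
<a class="other" href="https://example.com/skip.html">skip</a>
<div class="yang auto">
  <img alt="first" src="https://example.com/1.jpg">
  <img alt="second" src="https://example.com/2.jpg">
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Links whose class attribute includes magazine_item_wrap
links = [a['href'] for a in soup.find_all('a', {'class': 'magazine_item_wrap'})]

# Images inside the first div whose class attribute is "yang auto"
container = soup.find_all('div', {'class': 'yang auto'})[0]
images = [(img['alt'], img['src']) for img in container.find_all('img')]

print(links)
print(images)
```

Trying selectors on a saved or hand-written snippet like this is a cheap way to check them before sending any requests to the site.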