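1. Basics
Parsing a web page with BeautifulSoup:
Steps:
- Step 1: Parse the page
BeautifulSoup(html, 'lxml')
- Step 2: Describe where the content you want to scrape is located
Soup.select( )
- Step 3: Extract the needed information from the tags
Soup.select(???)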
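The three steps above can be sketched end to end on a tiny page. This is only an illustration under stated assumptions: the HTML snippet, the `div.item` class and the variable names are made up, and the standard-library `'html.parser'` is swapped in for `'lxml'` so the sketch runs without the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, invented for this illustration only.
html = """
<html><body>
  <div class="item">
    <img src="img/pic1.png">
    <h4><a href="#">First item</a></h4>
  </div>
  <div class="item">
    <img src="img/pic2.png">
    <h4><a href="#">Second item</a></h4>
  </div>
</body></html>
"""

# Step 1: parse the page (assumption: 'html.parser' instead of 'lxml').
soup = BeautifulSoup(html, 'html.parser')

# Step 2: describe where the wanted elements are, using CSS selectors.
images = soup.select('div.item > img')
titles = soup.select('div.item > h4 > a')

# Step 3: pull the needed information out of each matched tag.
srcs = [img.get('src') for img in images]      # attribute value
names = [a.get_text() for a in titles]         # text inside the tag

print(srcs)
print(names)
```

`select()` always returns a list of tags, so the actual extraction in Step 3 is done per tag with `get()` for attributes and `get_text()` for text.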
2. Writing the program myself
- The Result:
(Screenshot: Paste_Image.png)
- The Code:
from bs4 import BeautifulSoup

path = '/Users/huoqi/Documents/pythonlearn/combating/week1/1_2/homework1_2/1_2_homework_required/index.html'

with open(path, 'r') as wb_data:
    # print(wb_data)
    Soup = BeautifulSoup(wb_data, 'lxml')
    # print(Soup)

    # One selector per field; each call returns a list of matching tags.
    images = Soup.select('body > div > div > div.col-md-9 > div > div > div > img')
    titles = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
    prices = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right')
    views = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right')
    stars = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)')
    # print(images, titles, prices, views, stars)

    # Combine the parallel lists into one record per item.
    for image, title, price, view, star in zip(images, titles, prices, views, stars):
        data = {
            'image': image.get('src'),
            'title': title.get_text(),
            'price': price.get_text(),
            'view': view.get_text(),
            # The star rating is the number of filled-star spans.
            'star': len(star.find_all('span', class_="glyphicon glyphicon-star"))
        }
        print(data)
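One thing worth checking in code like this: `zip()` stops at the shortest list, so if one selector matches fewer tags than the others, the extra items are dropped without any error. A minimal sketch with made-up lists (not data from the real page) shows the effect and a simple length check:

```python
# Made-up stand-ins for the lists returned by the selectors.
images = ['a.png', 'b.png', 'c.png']
titles = ['First', 'Second', 'Third']
prices = ['$1', '$2']  # one price is missing on purpose

# zip() pairs items by position and stops at the shortest input.
rows = list(zip(images, titles, prices))
print(len(rows))

# Comparing the lengths first makes a selector mismatch visible.
lengths = {'images': len(images), 'titles': len(titles), 'prices': len(prices)}
mismatch = len(set(lengths.values())) != 1
if mismatch:
    print('selector result counts differ:', lengths)
```

Printing the lengths before the loop is a cheap way to notice that one of the selectors needs adjusting.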
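3. Reflections and summary
- The len() function returns the number of elements in a list.
- Paths produced by Copy selector should be compared with each other carefully before use.
- I do not yet fully understand how to modify these paths; I am still thinking about it.
KEEP FIGHTING!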
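As a possible way to think about the last two points, here is a small sketch. The markup is hypothetical, loosely modelled on the structure the selectors above imply, and `'html.parser'` is assumed in place of `'lxml'`. It compares a path pinned to one item with a generalized one, and counts stars with len().

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this illustration.
html = """
<div class="row">
  <div class="thumbnail">
    <img src="a.png">
    <div class="ratings">
      <p class="pull-right">15 reviews</p>
      <p>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star-empty"></span>
      </p>
    </div>
  </div>
  <div class="thumbnail">
    <img src="b.png">
    <div class="ratings">
      <p class="pull-right">7 reviews</p>
      <p>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star-empty"></span>
        <span class="glyphicon glyphicon-star-empty"></span>
      </p>
    </div>
  </div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# A copied path usually pins one element with a positional filter.
one_image = soup.select('div.row > div:nth-of-type(1) > img')
# Dropping the positional filter matches the same element in every item.
all_images = soup.select('div.row > div > img')
print(len(one_image), len(all_images))

# The second <p> in each ratings block holds the star icons;
# len() of the filled-star spans gives the rating.
star_blocks = soup.select('div.ratings > p:nth-of-type(2)')
star_counts = [
    len(p.find_all('span', class_='glyphicon glyphicon-star'))
    for p in star_blocks
]
print(star_counts)
```

Comparing the copied paths of two different items and removing the part that differs (typically the positional filter) is one way to arrive at a selector that covers all of them.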