Test results:
{'review_number': '65 reviews', 'star': 5, 'title': 'EarPod', 'image': 'img/pic_0000_073a9256d9624c92a05dc680fc28865f.jpg', 'price': '$24.99'}
{'review_number': '12 reviews', 'star': 4, 'title': 'New Pocket', 'image': 'img/pic_0005_828148335519990171_c234285520ff.jpg', 'price': '$64.99'}
{'review_number': '31 reviews', 'star': 4, 'title': 'New sunglasses', 'image': 'img/pic_0006_949802399717918904_339a16e02268.jpg', 'price': '$74.99'}
{'review_number': '6 reviews', 'star': 3, 'title': 'Art Cup', 'image': 'img/pic_0008_975641865984412951_ade7a767cfc8.jpg', 'price': '$84.99'}
{'review_number': '18 reviews', 'star': 4, 'title': 'iphone gamepad', 'image': 'img/pic_0001_160243060888837960_1c3bcd26f5fe.jpg', 'price': '$94.99'}
{'review_number': '18 reviews', 'star': 4, 'title': 'Best Bed', 'image': 'img/pic_0002_556261037783915561_bf22b24b9e4e.jpg', 'price': '$214.5'}
{'review_number': '35 reviews', 'star': 4, 'title': 'iWatch', 'image': 'img/pic_0011_1032030741401174813_4e43d182fce7.jpg', 'price': '$500'}
{'review_number': '8 reviews', 'star': 4, 'title': 'Park tickets', 'image': 'img/pic_0010_1027323963916688311_09cc2d7648d9.jpg', 'price': '$15.5'}
Code used:
from bs4 import BeautifulSoup

data = []
path = '/Users/lihai/Desktop/Plan-for-combating-master/week1/1_2/1_2answer_of_homework/1_2_homework_required/index.html'

# Parse the local HTML file (the 'lxml' parser must be installed)
with open(path, 'r') as f:
    Soup = BeautifulSoup(f.read(), 'lxml')

# Each selector returns one element per product, in page order
pics = Soup.select('body > div > div > div.col-md-9 > div > div > div > img')
price = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right')
titles = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
review = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right')
stars = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)')

# Combine the parallel lists into one dict per product
for pic, pri, title, rev, star in zip(pics, price, titles, review, stars):
    info = {
        'price': pri.get_text(),
        'image': pic.get('src'),
        'title': title.get_text(),
        'review_number': rev.get_text(),
        # Star rating = number of star spans inside the second <p>
        'star': len(star.find_all("span", class_='glyphicon glyphicon-star'))
    }
    data.append(info)

for d in data:
    print(d)
Reflections:
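This exercise gave me a deeper understanding of HTML selectors and was also a chance to review the DOM tree. My approach was to think the problem through on my own first and then try writing the code; when I had been stuck for a long time, I referred to the teacher's code. For example, p:nth-of-type(2) came from the teacher's code. I then looked it up on W3C and learned that it selects the second element of that type among its siblings, that is, a p that is the second p child of its parent. Comparing this with the page's HTML structure, I saw that the stars sit in the second p tag, and it suddenly made sense.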
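The behavior of p:nth-of-type(2) can be checked on a small snippet. The sketch below is only an illustration: the markup is a hypothetical fragment modeled on the structure described above (a ratings div whose first p holds the review count and whose second p holds the star spans), not the real homework page, and it uses the built-in html.parser instead of lxml so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble one product's ratings block.
html = """
<div class="ratings">
  <p class="pull-right">18 reviews</p>
  <p>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
  </p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# nth-of-type counts only siblings with the same tag name (here: <p>),
# so (1) is the review paragraph and (2) is the star paragraph.
first_p = soup.select("div.ratings > p:nth-of-type(1)")[0]
second_p = soup.select("div.ratings > p:nth-of-type(2)")[0]

review_text = first_p.get_text(strip=True)

# The class selector matches whole class tokens, so "glyphicon-star"
# does not match spans whose class is "glyphicon-star-empty".
star_count = len(second_p.select("span.glyphicon-star"))

print(review_text, star_count)
```

Counting with the span.glyphicon-star selector is one alternative to find_all with the full class string; both rely on filled and empty stars carrying different class names, which is an assumption about the page.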