Scraped results
image : img/pic_0000_073a9256d9624c92a05dc680fc28865f.jpg
price : $24.99
view : 65 reviews
describe : See more snippets like this online store item at web store
score : 5
title : EarPod
image : img/pic_0005_828148335519990171_c234285520ff.jpg
price : $64.99
view : 12 reviews
describe : This is a short description. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
score : 4
title : New Pocket
image : img/pic_0006_949802399717918904_339a16e02268.jpg
price : $74.99
view : 31 reviews
describe : This is a short description. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
score : 4
title : New sunglasses
image : img/pic_0008_975641865984412951_ade7a767cfc8.jpg
price : $84.99
view : 6 reviews
describe : This is a short description. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
score : 3
title : Art Cup
image : img/pic_0001_160243060888837960_1c3bcd26f5fe.jpg
price : $94.99
view : 18 reviews
describe : This is a short description. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
score : 4
title : iphone gamepad
image : img/pic_0002_556261037783915561_bf22b24b9e4e.jpg
price : $214.5
view : 18 reviews
describe : This is a short description. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
score : 4
title : Best Bed
image : img/pic_0011_1032030741401174813_4e43d182fce7.jpg
price : $500
view : 35 reviews
describe : This is a short description. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
score : 4
title : iWatch
image : img/pic_0010_1027323963916688311_09cc2d7648d9.jpg
price : $15.5
view : 8 reviews
describe : This is a short description. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
score : 4
title : Park tickets
Source code
from bs4 import BeautifulSoup

# Parse the local page with the lxml parser
with open('./index.html', 'r') as wbdata:
    soup = BeautifulSoup(wbdata, 'lxml')

# Each select() returns one list per field, in document order
images = soup.select('div > div.col-md-9 > div > div > div > img')
titles = soup.select('div.caption > h4:nth-of-type(2) > a')
prices = soup.select('div.caption > h4.pull-right')
describes = soup.select('div.caption > p')
views = soup.select('div.ratings > p.pull-right')
scores = soup.select('div > div.ratings > p:nth-of-type(2)')

# Combine the parallel lists into one dict per product
info = []
for title, image, price, describe, view, score in zip(titles, images, prices, describes, views, scores):
    data = {
        'title': title.get_text(),
        'image': image.get('src'),
        'price': price.get_text(),
        'describe': describe.get_text(),
        'view': view.get_text(),
        # The score is the number of filled star icons
        'score': len(score.find_all('span', 'glyphicon glyphicon-star'))
    }
    info.append(data)

# Print every field of every product, with blank lines between products
for i in info:
    for a in i:
        print(a, ':', i[a])
    print('\n')
Notes
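1. Beautiful Soup (in the version used here) does not support the nth-child syntax, so replace it with nth-of-type, or drop that part of the selector.
2. Try not to pass the full selector to soup.select(); a shorter one is usually enough.
3. Learn to work through your own collection of past mistakes and the documentation by yourself.
4. Read the debug messages patiently.
5. To read information from under a tag, use get() for an attribute value (such as src) and find_all() to collect matching child tags (such as the star spans counted for the score).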
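
Notes 1, 2 and 5 can be tried out on a small inline page. The sketch below is only an illustration: the HTML string is hypothetical markup modelled on one product card, not the original index.html, and it uses the built-in html.parser instead of lxml so that it runs without extra dependencies beyond bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on one product card (an assumption,
# not the real index.html used in the post).
HTML = """
<div class="thumbnail">
  <img src="img/sample.jpg">
  <div class="caption">
    <h4 class="pull-right">$24.99</h4>
    <h4><a href="#">EarPod</a></h4>
    <p>A short description.</p>
  </div>
  <div class="ratings">
    <p class="pull-right">65 reviews</p>
    <p>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
    </p>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Notes 1 and 2: a short selector using nth-of-type instead of nth-child.
title = soup.select("div.caption > h4:nth-of-type(2) > a")[0].get_text()

# Note 5: get() reads an attribute value from a tag...
image_src = soup.select("div.thumbnail > img")[0].get("src")

# ...while find_all() collects child tags; counting the filled stars gives the score.
stars = soup.select("div.ratings > p:nth-of-type(2)")[0]
score = len(stars.find_all("span", class_="glyphicon glyphicon-star"))

print(title, image_src, score)
```

Passing the full class string to find_all() matches only spans whose class attribute is exactly "glyphicon glyphicon-star", so the empty-star icon is not counted.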