Python in Practice, Week 1 Final Assignment: 58同城 (58.com)
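Requirements:
1. Scrape the listings on http://bj.58.com/pbdn/0/ (excluding Zhuanzhuan and promoted listings), then fetch the details of each item, such as category, title, post time, price, condition, area, and view count.
2. Pay attention to how the view count is obtained.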
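The exclusion rule in requirement 1 can be sketched as a small predicate over a link's HTML attributes. This is a minimal sketch, assuming promoted listings carry a `data-addtype` attribute and Zhuanzhuan listings have `zhuanzhuan` in their URL, which are the two conditions the code in this post checks. The function name `is_regular_listing` and the sample URLs are hypothetical.

```python
def is_regular_listing(attrs):
    """Return True for an ordinary listing link, given its attributes as a dict."""
    href = attrs.get('href', '')
    # Assumption: promoted listings are marked with a data-addtype attribute.
    if 'data-addtype' in attrs:
        return False
    # Assumption: Zhuanzhuan listings have 'zhuanzhuan' somewhere in the URL.
    if 'zhuanzhuan' in href:
        return False
    # A link without an href cannot be followed, so drop it as well.
    return bool(href)


# Hypothetical attribute dicts, shaped like bs4's `tag.attrs`.
samples = [
    {'href': 'http://bj.58.com/pingbandiannao/1x.shtml'},
    {'href': 'http://bj.58.com/pingbandiannao/2x.shtml', 'data-addtype': '1'},
    {'href': 'http://zhuanzhuan.58.com/detail/3z.shtml'},
]
kept = [a['href'] for a in samples if is_regular_listing(a)]
print(kept)
```

Keeping the rule in one function makes it easy to check against a few hand-written cases before running it on the live page.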
Output:
(screenshot: Paste_Image.png)
Code:
from bs4 import BeautifulSoup
import requests
import time
# Listing page to scrape
url = 'http://bj.58.com/pbdn/0/'
# Detail-page URLs collected by get_url
url_links = []
# One dict per scraped listing
data = []
def get_url(url):
    msg = requests.get(url)
    soup = BeautifulSoup(msg.text, 'lxml')
    links = soup.select('td.t a.t')
    for link in links:
        href = link.attrs.get('href', '')
        # Skip promoted listings, which carry a data-addtype attribute
        if 'data-addtype' in link.attrs:
            continue
        # Skip Zhuanzhuan listings; `in` (like str.find) does not raise, unlike str.index
        if 'zhuanzhuan' in href:
            continue
        url_links.append(href)
def get_msginfo(url):
    print("msginfo:" + url)
    msg = requests.get(url)
    soup = BeautifulSoup(msg.text, 'lxml')
    # Category (breadcrumb text split into words)
    category = soup.select('#header > div.breadCrumb.f12')[0].text.split()
    # Title
    title = soup.select('#content > div.person_add_top.no_ident_top > div.per_ad_left > div.col_sub.mainTitle > h1')[0].text
    # Post time
    ftime = soup.select('#index_show > ul.mtit_con_left.fl > li.time')[0].text
    # Price
    price = soup.select('ul > li > div.su_con > span.price')[0].text
    # Condition
    purity = soup.select('ul > li > div.su_con > span')[1].text.strip()
    # Area (not present on every listing)
    area_tags = soup.select('.c_25d')
    if len(area_tags) == 0:
        area = None
    else:
        area = area_tags[0].text.replace('-', '').split()
    # View count (fetched with a separate request, see get_view)
    view = get_view(url)
    item = {
        'type': category,
        'title': title,
        'ftime': ftime,
        'price': price,
        'purity': purity,
        'area': area,
        'view': view
    }
    print(item)
    data.append(item)
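def get_view(url):
    # Send the detail page URL as the Referer header
    headers = {
        'Referer': url
    }
    # The info ID is the last path segment, just before 'x.shtml'
    info_id = url.split('x.shtml')[0].split('/')[-1]
    view_url = 'http://jst1.58.com/counter?infoid={}'.format(info_id)
    msg = requests.get(view_url, headers=headers)
    # The view count is the text after the last '=' in the response
    return msg.text.split('=')[-1]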
get_url(url)
for url_link in url_links:
    time.sleep(2)  # pause between requests
    get_msginfo(url_link)
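The two string operations inside get_view (pulling the info ID out of the detail URL, then taking the text after the last `=` in the counter response) can be tried offline. The following is a minimal sketch using only the standard library. The helper names, the sample URL, and the sample response string `Counter58.total=123` are assumptions made up for illustration; the real response format may differ.

```python
import re


def extract_info_id(detail_url):
    """Return the digits just before 'x.shtml' in a detail URL, or None."""
    match = re.search(r'/(\d+)x\.shtml', detail_url)
    return match.group(1) if match else None


def parse_view_count(counter_text):
    """Return the integer after the last '=' in the counter response, or None."""
    tail = counter_text.split('=')[-1].strip()
    return int(tail) if tail.isdigit() else None


# Hypothetical inputs, not captured from the live site.
sample_url = 'http://bj.58.com/pingbandiannao/25853969546167x.shtml?psid=1'
sample_response = 'Counter58.total=123'

print(extract_info_id(sample_url))
print(parse_view_count(sample_response))
```

Returning None instead of raising keeps a single malformed listing from stopping the whole crawl.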
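Summary:
1. After a week of study, I am comfortable with the requests and bs4 libraries, have learned how to extract elements from web pages, and can filter data in several ways.
2. Through this assignment I learned how to filter listings, how to analyze pages that load data asynchronously with JavaScript, and some simple ways to get past anti-scraping measures. I also gained a better understanding of the HTTP protocol.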
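The anti-scraping technique used in this post is setting the `Referer` header on the counter request. As a minimal sketch of what that request looks like, the snippet below builds it with the standard library's `urllib.request.Request` (a swap for `requests`, so that nothing is sent and no third-party package is needed). The function name `build_counter_request` and the sample URL are hypothetical.

```python
from urllib.request import Request

# Counter endpoint taken from the code in this post.
COUNTER_URL = 'http://jst1.58.com/counter?infoid={}'


def build_counter_request(detail_url, info_id):
    """Build (but do not send) the counter request with the detail page as Referer."""
    return Request(COUNTER_URL.format(info_id), headers={'Referer': detail_url})


# Hypothetical detail URL and ID, for illustration only.
req = build_counter_request(
    'http://bj.58.com/pingbandiannao/25853969546167x.shtml',
    '25853969546167',
)
print(req.full_url)
print(req.get_header('Referer'))
```

Inspecting the request object before sending it is a quick way to confirm that the header and the query string are what the browser's network panel showed.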