Day 3 of the hands-on project plan: scraped 300 records.
The final result looks like this:
[Screenshot: the scraped results]
My code:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# env finds the python interpreter by name on the PATH, so its exact location isn't hardcoded.
# (A comment on the shebang line itself would be passed to env as part of the interpreter name.)
from bs4 import BeautifulSoup
import time
import requests

# Listing pages follow the pattern ...-p1-0/, ...-p2-0/, ...; this builds pages 1-9
urls = ['http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(str(i)) for i in range(1, 10, 1)]


def house_source(url):
    """Scrape one listing's detail page and print its fields as a dict."""
    wb_data = requests.get(url)
    time.sleep(1)  # pause between requests
    soup = BeautifulSoup(wb_data.text, 'lxml')
    titles = soup.select('body > div.wrap.clearfix.con_bg > div.con_l > div.pho_info > h4 > em')
    adresses = soup.select('body > div.wrap.clearfix.con_bg > div.con_l > div.pho_info > p > span.pr5')
    prices = soup.select('#pricePart > div.day_l > span')
    imgs = soup.select('img[id="curBigImage"]')
    names = soup.select('#floatRightBox > div.js_box.clearfix > div.w_240 > h6 > a')
    picts = soup.select('#floatRightBox > div.js_box.clearfix > div.member_pic > a > img')
    # Match both the member_ico (male) and member_ico1 (female) icons; the exact-match
    # selector div[class="member_ico1"] finds nothing for male hosts, so zip() yields no record
    males = soup.select('div[class^="member_ico"]')
    for title, adress, price, img, name, pict, male in zip(titles, adresses, prices, imgs, names, picts, males):
        data = {
            'title': title.get_text(),
            'adress': adress.get_text(),
            'price': price.get_text(),
            'img': img.get('src'),
            'name': name.get_text(),
            'pict': pict.get('src'),
            'male': get_lorder_male(male.get('class')),  # a helper function maps the class to a gender
        }
        print(data)


def get_links(url):
    """Collect the detail-page links on one listing page and scrape each of them."""
    wb_data = requests.get(url)
    time.sleep(2)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    links = soup.select('#page_list > ul > li > a')
    for link in links:
        href = link.get('href')
        house_source(href)


def get_lorder_male(class_name):
    # class_name is the list returned by tag.get('class')
    if class_name == ['member_ico']:  # conditional check: male icon
        return '男'
    else:
        return '女'


for single_url in urls:
    get_links(single_url)
```
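In house_source the seven selector results are combined with zip(), which stops at the shortest list: if any one selector matches nothing on a page (the gender icon, for example), that listing simply produces no record and no error. The sketch below shows that behaviour with plain lists and a hypothetical helper, pair_fields, that reports mismatched counts instead of truncating silently. The field values are placeholders, not scraped data.

```python
def pair_fields(**fields):
    """Hypothetical helper: turn equally long lists of scraped values into dicts.

    Raises ValueError when the lists differ in length, instead of letting
    zip() silently truncate to the shortest one.
    """
    lengths = {name: len(values) for name, values in fields.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"mismatched field counts: {lengths}")
    names = list(fields)
    # Each row from zip(*...) holds one value per field, in the same order as names
    return [dict(zip(names, row)) for row in zip(*fields.values())]


# zip() alone: one empty list means no records at all, and no error
titles = ['room A']
males = []
print(list(zip(titles, males)))  # []

print(pair_fields(title=['A'], price=['300']))  # [{'title': 'A', 'price': '300'}]
```

On Python 3.10+, zip(..., strict=True) offers a similar built-in check: it raises ValueError when the inputs differ in length.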
Summary
- Find the pattern in the page numbers (…-p1-0/, …-p2-0/, …) and generate the URLs with format(str(i)) for i in range(1,10,1).
- Building the dict: each value is read from a tag, e.g. male.get('class') returns the element's CSS class and link.get('href') returns the link target.
- Writing small helper functions, e.g. def get_lorder_male(class_name), which returns '男' (male) if class_name == ['member_ico'] and '女' (female) otherwise.
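The URL pattern and the gender helper from the summary can be tried without touching the network. This is a minimal sketch that assumes the URL pattern and class names used in the code above. page_urls and host_gender are hypothetical names. str.format accepts integers directly, so str(i) is not required. host_gender checks list membership instead of comparing the whole list, because BeautifulSoup returns class as a list and this way it still works if the icon ever carries an extra class.

```python
# URL pattern taken from the post; {} is replaced by the page number
BASE = 'http://bj.xiaozhu.com/search-duanzufang-p{}-0/'


def page_urls(first, last):
    """Hypothetical helper: listing-page URLs for pages first..last (inclusive)."""
    return [BASE.format(i) for i in range(first, last + 1)]


def host_gender(class_names):
    """Hypothetical helper: map an icon's class list (from tag.get('class')) to a label.

    As in the post: 'member_ico' means male ('男'), anything else female ('女').
    """
    return '男' if 'member_ico' in class_names else '女'


print(page_urls(1, 2))
print(host_gender(['member_ico']))   # 男
print(host_gender(['member_ico1']))  # 女
```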