Takeaways
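- Calling MongoDB from Python requires importing the pymongo library.
- MongoDB also has the concepts of a database (db) and a table (collection). In the mongo shell you can inspect them with use, show collections, and find().
- Insert data with insert_one(). Records can be written to the database at any point inside a loop, so there is no need for a dedicated list to hold them, and the stored data persists for repeated long-term use.
- Data in the database can be exported with mongoexport.exe (for example as JSON or CSV), and external data can be imported into the database with mongoimport.
- find() can filter results by condition.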
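The export step mentioned in the list above can be sketched as follows. This is a minimal sketch, assuming mongoexport is installed and MongoDB is running on the default localhost:27017. The helper name build_export_command and the output file name sheet_lines.csv are hypothetical, chosen only for illustration. The database, collection, and field names are taken from the script in this post. The block only builds and prints the command, and the helper that would actually run it is defined but not called.

```python
import subprocess


def build_export_command(db, collection, fields, out_path):
    """Build a mongoexport command line for a CSV export.

    mongoexport needs an explicit field list when the output type is CSV.
    """
    return [
        "mongoexport",
        "--db", db,
        "--collection", collection,
        "--type=csv",
        "--fields", ",".join(fields),
        "--out", out_path,
    ]


def run_export(command):
    """Run the export (hypothetical helper, not called in this sketch)."""
    return subprocess.run(command, check=True)


# Database, collection, and field names follow the scraper in this post;
# the output file name is an assumption.
command = build_export_command(
    "xiaozhu",
    "sheet_lines",
    ["title", "link", "unit", "price"],
    "sheet_lines.csv",
)
print(" ".join(command))
```

Passing the arguments as a list rather than one shell string avoids quoting problems if a field or file name ever contains spaces.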
My code
Find listings with a monthly rent of 500 yuan or more on the first three list pages of the Xiaozhu short-term rental site
from bs4 import BeautifulSoup
import requests
import time
import pymongo
client = pymongo.MongoClient('localhost',27017)
urls = ['http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(str(i)) for i in range(1,4,1)]
xiaozhu = client['xiaozhu']
sheet_lines = xiaozhu['sheet_lines']
def get_page_info(url):
    # Download one list page and parse it
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text,'lxml')
    titles = soup.select('div.result_btm_con.lodgeunitname > div > a > span')
    links = soup.select('div.result_btm_con.lodgeunitname')
    prices = soup.select('div.result_btm_con.lodgeunitname > span.result_price > i')
    types = soup.select('div.result_btm_con.lodgeunitname > div > em.hiddenTxt')
    for title, room_type, price, link in zip(titles, types, prices, links):
        data = {
            'title':title.get_text(),
            'link': link.get('detailurl'),
            'unit':room_type.get_text().split('\n')[1].replace(' ',''),
            #'comment':room_type.get_text().split('\n')[7].replace(' ',''),
            'price':int(price.get_text()) # convert to a number so it can be compared and queried by size
        }
        sheet_lines.insert_one(data) # insert into the database

for single_url in urls:
    get_page_info(single_url)
    time.sleep(2) # pause between page requests

# Print every stored listing priced at 500 or more
for item in sheet_lines.find({'price':{'$gte':500}}):
    print(item)
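To make the filter in the final loop concrete, here is a plain-Python stand-in for the query {'price': {'$gte': 500}}. It is only an illustration of what the condition selects, not how MongoDB evaluates queries. The sample documents and the helper name find_gte are hypothetical.

```python
# Hypothetical sample documents, shaped like the ones the scraper stores.
listings = [
    {"title": "Studio near subway", "price": 298},
    {"title": "Courtyard house", "price": 500},
    {"title": "Three-bedroom apartment", "price": 1280},
]


def find_gte(docs, field, threshold):
    """Keep the documents whose field exists and is >= threshold.

    Mirrors the intent of collection.find({field: {'$gte': threshold}}).
    """
    return [doc for doc in docs if field in doc and doc[field] >= threshold]


expensive = find_gte(listings, "price", 500)
for item in expensive:
    print(item["title"], item["price"])
```

The comparison is numeric only because the prices are stored as integers, which is why the script converts the scraped text with int() before inserting.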
Results
- Data in the database
1.jpg
- Filtered results
2.jpg