Day 6 of the hands-on plan: learning how to create and read a MongoDB database.
The final result looks like this:
[Screenshot of the final result]
My code:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# The shebang lets the shell find the Python interpreter through PATH (via env).
import pymongo
import requests
from bs4 import BeautifulSoup

# Connect to the local MongoDB server; the database and collection are created on first insert
client = pymongo.MongoClient('localhost', 27017)
xiaozhu = client['xiaozhu']
sheet_tab = xiaozhu['sheet_tab']

# CSS prefix shared by the fields inside one listing card
CARD = '#page_list > ul > li > div.result_btm_con.lodgeunitname'


def container(url):
    """Scrape one search-results page and insert each listing into MongoDB."""
    wb_data = requests.get(url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    prices = soup.select(CARD + ' > span.result_price > i')
    titles = soup.select(CARD + ' > div > a > span')
    # Type and address both come from the same <em> text, split on '-'
    ems = soup.select(CARD + ' > div > em')
    links = soup.select('#page_list > ul > li > a')
    for price, title, em, link in zip(prices, titles, ems, links):
        parts = em.get_text().split('-')
        data = {
            'price': price.get_text(),
            'title': title.get_text(),
            'types': parts[0].strip(),
            'adress': parts[2].strip(),
            'link': link.get('href'),
        }
        sheet_tab.insert_one(data)


def find_fangzi():
    """Print every stored listing priced at 500 or more."""
    for item in sheet_tab.find():
        # price was stored as a string, so convert it before comparing
        if int(item['price']) >= 500:
            print(item)


if __name__ == '__main__':
    urls = ['http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(i) for i in range(3)]
    for url in urls:
        container(url)
    find_fangzi()
```
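`find_fangzi()` pulls every document back into Python and converts `price` with `int()` before comparing. A common alternative is to let MongoDB do the comparison with the `$gte` query operator. The sketch below assumes `price` is stored as an integer at insert time rather than as the scraped string. The helper names `to_price`, `price_filter` and `find_expensive` are hypothetical.

```python
def to_price(price_text):
    """Hypothetical helper: turn scraped price text such as ' 588 ' into an int before insert_one."""
    return int(price_text.strip())


def price_filter(min_price):
    """MongoDB query document matching listings whose numeric price is >= min_price."""
    return {'price': {'$gte': min_price}}


def find_expensive(collection, min_price=500):
    """Hypothetical helper: `collection` is assumed to be a pymongo Collection such as sheet_tab.

    The filter is evaluated by the MongoDB server, so only matching documents are returned.
    """
    return list(collection.find(price_filter(min_price)))
```

With the scraper's collection, `find_expensive(sheet_tab)` would return the listings at 500 or above. This only works if `price` was saved as a number, because MongoDB compares strings and numbers differently.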
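Summary and questions
- `strip()` removes leading and trailing whitespace from scraped strings (spaces inside the string are kept)
- Object-oriented programming: make a habit of organizing code into functions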
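The `strip()` point above can be shown in a small example. The scraper splits the `<em>` text on `-` and then strips each piece, because `strip()` alone would leave the spaces around the dashes. The sample string and the helper name `split_fields` are made up for illustration. The real page text may be formatted differently.

```python
def split_fields(em_text):
    """Hypothetical helper: split 'A - B - C' on '-' and strip whitespace around each piece."""
    return [part.strip() for part in em_text.split('-')]


raw = '  Entire apartment - 5 reviews - Chaoyang, Beijing  '  # hypothetical sample text

print(repr(raw.strip()))   # 'Entire apartment - 5 reviews - Chaoyang, Beijing'
print(split_fields(raw))   # ['Entire apartment', '5 reviews', 'Chaoyang, Beijing']
```

In the scraper, index `[0]` of the split result becomes `types` and index `[2]` becomes `adress`.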