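Crawl housing prices with the requests library and parse the data with XPath. The general steps are:
- Get the target URL and inspect the page source;
- Spoof the User-Agent (UA), send the request, and get the response;
- Parse the data out of the target tags;
- Loop over the parsed results, extract the fields, and save them.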
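To see how the XPath expressions in step 3 map onto the page structure without hitting the network, here is a minimal offline sketch. The HTML fragment is hypothetical: it is reconstructed from the XPath expressions used in the script, not copied from 58.com, so the real markup may differ. `first_or_empty` and `parse_listings` are illustrative helper names. The sketch also shows one way to avoid the `IndexError` that `[0]` raises when a child `<div>` (for example an ad slot) has no matching node.

```python
from lxml import etree

# Hypothetical markup shaped to match the script's XPath expressions;
# it is NOT the real 58.com page.
SAMPLE_HTML = """
<div class="key-list imglazyload">
  <div class="item-mod">
    <div class="infos">
      <a class="lp-name"><span>Sample Garden</span></a>
      <a class="address"><span>[ Xiantao Downtown ] Example Rd (No. 1)</span></a>
    </div>
    <a class="favor-pos"><p><span>6500</span></p></a>
  </div>
  <div class="ad-slot"></div>
</div>
"""


def first_or_empty(node, path):
    """Return the first text match of an XPath ending in text(), or '' if none."""
    result = node.xpath(path)
    return result[0].strip() if result else ""


def parse_listings(html):
    """Extract (name, price, raw address) tuples from one listing page."""
    tree = etree.HTML(html)
    rows = []
    for div in tree.xpath('//div[@class="key-list imglazyload"]/div'):
        name = first_or_empty(div, './div/a[@class="lp-name"]/span/text()')
        if not name:
            continue  # child <div> is not a listing, e.g. an ad slot
        price = first_or_empty(div, './a[@class="favor-pos"]/p/span/text()')
        address = first_or_empty(div, './div/a[@class="address"]/span/text()')
        rows.append((name, price, address))
    return rows


print(parse_listings(SAMPLE_HTML))
```

Defaulting to an empty string keeps one malformed entry from aborting the whole crawl; whether to skip such entries or keep them with blank fields is a judgment call.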
```python
import os

import requests
from lxml import etree

# UA spoofing: identify as an ordinary desktop Chrome browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36'
}

file_path = './python_learning/xiantao58.txt'
# Make sure the output directory exists before opening the file
os.makedirs(os.path.dirname(file_path), exist_ok=True)

# Write as UTF-8 so the Chinese text is saved correctly on every platform
with open(file_path, "w", encoding="utf-8") as f:
    for page in range(1, 3):  # listing pages 1 and 2
        url = 'https://xiantao.58.com/xinfang/loupan/all/p{0:d}/'.format(page)
        response = requests.get(url=url, headers=headers, timeout=10)
        response.raise_for_status()
        response.encoding = "utf-8"
        page_text = response.text

        tree = etree.HTML(page_text)
        # Each child <div> of the listing container holds one housing project
        div_list = tree.xpath('//div[@class="key-list imglazyload"]/div')
        for div in div_list:
            names = div.xpath('./div/a[@class="lp-name"]/span/text()')
            if not names:
                continue  # not a listing (e.g. an ad block); names[0] would raise IndexError
            house_name = names[0]
            prices = div.xpath('./a[@class="favor-pos"]/p/span/text()')
            house_price = prices[0] if prices else ""
            addresses = div.xpath('./div/a[@class="address"]/span/text()')
            house_address = addresses[0] if addresses else ""
            # If the text comes out garbled, an alternative is re-decoding it:
            # house_address = house_address.encode('iso-8859-1').decode('gbk')

            # Remove square brackets and parentheses, drop the first
            # whitespace-separated token, and join the rest without spaces
            for ch in '[]()':
                house_address = house_address.replace(ch, '')
            address = "".join(house_address.split()[1:])

            f.write(house_name + "\t" + house_price + "\t" + address + "\n")
            print(house_name + " downloaded successfully!")
```
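The address cleanup removes `[`, `]`, `(` and `)` with one `replace()` call per character; `str.translate` can do the same in a single pass. Below is a stdlib-only sketch of an equivalent helper. `clean_address` is a hypothetical name, and the sample address is made up for illustration, not real scraped data.

```python
# Translation table that deletes ASCII square brackets and parentheses
_BRACKETS = str.maketrans("", "", "[]()")


def clean_address(raw):
    """Delete brackets/parentheses, drop the first whitespace-separated
    token, and join the remaining tokens without spaces (same result as
    the replace()/split()/join() chain in the script)."""
    without_brackets = raw.translate(_BRACKETS)
    return "".join(without_brackets.split()[1:])


print(clean_address("[ 仙桃 城区 ] 示例路(1号)"))  # → 城区示例路1号
```

Note that neither version touches full-width brackets such as `（）`; add them to the `maketrans` deletion string if the page uses them.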