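- Using BDP Personal Edition
Target URL: https://www.qiushibaike.com/text/
Data scraped: user location
Parsing method: lxml
Approach: first identify the main building blocks. get_url(url) collects the links to each user's profile page; get_address(url) scrapes the location from a user's profile page, for example Guangdong or Sichuan; get_geo(address) calls an API to look up the longitude and latitude of each province. Finally, the location, longitude, and latitude are saved to a CSV file.
Code: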
import csv
import json

import requests
from lxml import etree


def get_url(url):
    """Return the profile-page links of the users listed on one listing page."""
    r = requests.get(url, headers=headers)
    print(r.status_code)
    html = etree.HTML(r.text)
    url_infos = html.xpath('//div[@class="author clearfix"]')
    user_link_list = []
    for url_info in url_infos:
        user_part_link = url_info.xpath('a[1]/@href')
        # Skip entries that have no profile link
        if user_part_link:
            user_link = "https://www.qiushibaike.com" + user_part_link[0]
            user_link_list.append(user_link)
    return user_link_list


def get_address(url):
    """Scrape the location from a user's profile page and geocode it."""
    # Use the url argument (the original read the global user_link here)
    r = requests.get(url, headers=headers)
    print(r.status_code)
    html = etree.HTML(r.text)
    items = html.xpath('//div[2]/div[3]/div[2]/ul/li[4]/text()')
    if items:
        # Keep only the part before ' · ', e.g. the province
        address = items[0].split(' · ')[0]
        get_geo(address)


def get_geo(address):
    """Look up longitude and latitude for an address and write one CSV row."""
    par = {'address': address, 'key': 'cb649a25c1f81c1451adbeca73623251'}
    url = 'http://restapi.amap.com/v3/geocode/geo'
    r = requests.get(url, params=par)
    json_data = json.loads(r.text)
    try:
        geo = json_data['geocodes'][0]['location']
        longitude = geo.split(',')[0]  # longitude
        latitude = geo.split(',')[1]   # latitude
        writer.writerow((address, longitude, latitude))
        # print(address, longitude, latitude)
    except (IndexError, KeyError):
        # No geocoding result for this address; skip it
        pass


if __name__ == "__main__":
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3294.6 Safari/537.36'}
    url_list = ["https://www.qiushibaike.com/text/page/{}/".format(i) for i in range(1, 14)]
    # The with block closes the file even if a request fails midway
    with open('F://map.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        # Header row: address, longitude, latitude
        writer.writerow(('地址', '經度', '緯度'))
        for url in url_list:
            user_link_list = get_url(url)
            for user_link in user_link_list:
                get_address(user_link)
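The two string-handling steps in the script (taking the province from the profile text and splitting the geocoder's location value) can be checked without any network access. The sketch below isolates them. The function names province_from_profile and split_location are hypothetical and not part of the original script, and the sample inputs are assumptions based on how the script treats its data: profile text separated by ' · ' and a location string of the form longitude,latitude.

```python
def province_from_profile(text):
    """Return the part before ' · ', e.g. the province in '廣東 · 深圳'."""
    return text.split(' · ')[0].strip()


def split_location(location):
    """Split a 'longitude,latitude' string into a pair of floats."""
    lon, lat = location.split(',')
    return float(lon), float(lat)


# Illustrative inputs only; real values come from the profile page and the API
print(province_from_profile('廣東 · 深圳'))
print(split_location('113.26,23.13'))
```

Converting the coordinates to floats is a design choice of this sketch; the script itself writes them to the CSV as strings, which works equally well for a spreadsheet.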
Result:
image.png
Then use Excel's Insert → PivotTable feature to summarize the data, which finally becomes:
image.png
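For readers who would rather stay in Python, the same per-location tally that the pivot table produces can be sketched with collections.Counter. This is an alternative to the Excel step, not what the author did. The function name count_by_address is hypothetical, the sample rows are made up, and the sketch assumes the three-column layout of map.csv with the location in the first column and a header row on top.

```python
import csv
import io
from collections import Counter


def count_by_address(csv_file):
    """Count how many rows share each value in the first column."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    return Counter(row[0] for row in reader if row)


# Made-up sample rows in the same three-column layout as map.csv
sample = io.StringIO(
    "address,longitude,latitude\n"
    "Guangdong,113.26,23.13\n"
    "Sichuan,104.07,30.57\n"
    "Guangdong,113.26,23.13\n"
)
counts = count_by_address(sample)
for address, n in counts.most_common():
    print(address, n)
```

To run it on the scraped file, pass an open file object instead of the sample, for example `with open('F://map.csv', newline='') as f: counts = count_by_address(f)`.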
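Then, in BDP Personal Edition: create a new worksheet → upload the data → create a new chart (choose the map chart type) → pick the colors, sizes, and so on.
The final result looks like this (different colors represent different provinces; the larger the marker, the more users in that province):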
image.png
Of course, conventional charts are just as easy to produce:
image.png
For details, see the following link: https://me.bdp.cn/api/su/5YWKREVA