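Last week I shared a Sogou epidemic-data scraper written in R; this time I'm sharing the Python version.
Without further ado, here is the code. If you have any questions, feel free to discuss them in the comments.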
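from urllib import request
from lxml import etree
import re
import pandas as pd
import json

url = "http://sa.sogou.com/new-weball/page/sgs/epidemic?type_page=WEB"
response = request.urlopen(url)  # send the request
html = response.read()  # read the raw bytes
html = html.decode("utf-8")  # decode to text
xml = etree.HTML(html)
# The data sits in the first <script> element under <body>
datas = xml.xpath('//html/body/script[1]/text()')
# Strip the JavaScript assignment prefix so that only the JSON text remains
# (\s* tolerates whatever whitespace separates the two statements)
datas = re.sub(r'window\.type_page = "WEB"\s*window\.__INITIAL_STATE__ = ', "", datas[0])
json_data = json.loads(datas)  # parse the JSON text into a dict
area = json_data["data"]["area"]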
citytempdate = []
provincetempdate = []
for i in area:
    # Province-level totals
    provinceShortName = i["provinceShortName"]
    confirmedCount = i["confirmedCount"]
    curedCount = i["curedCount"]
    deadCount = i["deadCount"]
    provincetempdate.append([provinceShortName, confirmedCount, curedCount, deadCount])
    # City-level figures nested under each province
    for j in i["cities"]:
        cityName = j["cityName"]
        confirmedCount = j["confirmedCount"]
        curedCount = j["curedCount"]
        deadCount = j["deadCount"]
        citytempdate.append([provinceShortName, cityName, confirmedCount, curedCount, deadCount])
dt_city = pd.DataFrame(citytempdate, columns=["PROVINCESHORTNAME", "CITYNAME", "CONFIRMEDCOUNT", "CUREDCOUNT", "DEADCOUNT"])
dt_province = pd.DataFrame(provincetempdate, columns=["PROVINCESHORTNAME", "CONFIRMEDCOUNT", "CUREDCOUNT", "DEADCOUNT"])
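The prefix-stripping step above depends on the exact text that precedes the JSON. As a rough alternative sketch, the object assigned to `window.__INITIAL_STATE__` could be located by its marker and parsed with `json.JSONDecoder.raw_decode`, which also tolerates trailing text such as a semicolon. The helper name `extract_initial_state` and the sample string are hypothetical; the real page layout is not verified here.

```python
import json


def extract_initial_state(script_text, marker="window.__INITIAL_STATE__"):
    """Return the JSON object assigned to `marker` in a script's text.

    Hypothetical helper: assumes the marker is followed by a JSON object.
    """
    start = script_text.index(marker)      # raises ValueError if the marker is absent
    brace = script_text.index("{", start)  # first '{' after the marker
    # raw_decode parses one JSON value starting at `brace` and ignores what follows it
    obj, _end = json.JSONDecoder().raw_decode(script_text, brace)
    return obj


# Made-up sample text, shaped like the script the scraper expects
sample = 'window.type_page = "WEB"\n  window.__INITIAL_STATE__ = {"data": {"area": []}};'
state = extract_initial_state(sample)
print(state["data"]["area"])
```

With this approach the whitespace between the two statements no longer matters, at the cost of assuming the marker name stays the same.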
City-level data (partial):
| | PROVINCESHORTNAME | CITYNAME | CONFIRMEDCOUNT | CUREDCOUNT | DEADCOUNT |
---|---|---|---|---|---|
0 | 湖北 | 武漢 | 41152 | 3507 | 1309 |
1 | 湖北 | 孝感 | 3279 | 449 | 70 |
2 | 湖北 | 黃岡 | 2831 | 839 | 78 |
3 | 湖北 | 荊州 | 1501 | 305 | 37 |
4 | 湖北 | 鄂州 | 1274 | 244 | 35 |
5 | 湖北 | 隨州 | 1267 | 140 | 24 |
6 | 湖北 | 襄陽 | 1155 | 151 | 20 |
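A city-level table like the one above can also be built without the nested loop. The sketch below uses `pd.json_normalize` with `record_path` and `meta` to flatten the nested `cities` lists. The `area` list here is a made-up sample that only mimics the field names used in the code; it is not real data.

```python
import pandas as pd

# Hypothetical sample shaped like the `area` list from the scraper (not real figures)
area = [
    {"provinceShortName": "A", "confirmedCount": 10, "curedCount": 4, "deadCount": 1,
     "cities": [
         {"cityName": "A1", "confirmedCount": 7, "curedCount": 3, "deadCount": 1},
         {"cityName": "A2", "confirmedCount": 3, "curedCount": 1, "deadCount": 0},
     ]},
    {"provinceShortName": "B", "confirmedCount": 5, "curedCount": 2, "deadCount": 0,
     "cities": [
         {"cityName": "B1", "confirmedCount": 5, "curedCount": 2, "deadCount": 0},
     ]},
]

# One row per city; the province name is carried along as a meta column
dt_city = pd.json_normalize(area, record_path="cities", meta=["provinceShortName"])
# Reorder and rename the columns to match the table layout used in this post
dt_city = dt_city[["provinceShortName", "cityName", "confirmedCount", "curedCount", "deadCount"]]
dt_city.columns = [c.upper() for c in dt_city.columns]
print(dt_city)
```

Only `provinceShortName` is passed as `meta`, because the count fields exist at both levels and would otherwise clash with the city-level columns.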
Province-level totals (partial):
| | PROVINCESHORTNAME | CONFIRMEDCOUNT | CUREDCOUNT | DEADCOUNT |
---|---|---|---|---|
0 | 湖北 | 58182 | 6693 | 1696 |
1 | 廣東 | 1322 | 524 | 4 |
2 | 河南 | 1246 | 509 | 16 |
3 | 浙江 | 1171 | 507 | 0 |
4 | 湖南 | 1006 | 498 | 3 |
5 | 安徽 | 973 | 280 | 6 |
6 | 江西 | 930 | 275 | 1 |
7 | 江蘇 | 626 | 258 | 0 |
8 | 重慶 | 552 | 211 | 5 |
9 | 山東 | 541 | 191 | 2 |
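Once both tables exist, one possible sanity check is to compare each province's total with the sum of its cities. The sketch below does this with `groupby` and `merge` on small made-up frames that reuse the column names from this post. Whether the city figures are expected to add up to the province total is an assumption; the source may include cases not attributed to any city.

```python
import pandas as pd

# Made-up frames with the same column names as dt_city and dt_province
dt_city = pd.DataFrame(
    [["A", "A1", 7], ["A", "A2", 3], ["B", "B1", 5]],
    columns=["PROVINCESHORTNAME", "CITYNAME", "CONFIRMEDCOUNT"],
)
dt_province = pd.DataFrame(
    [["A", 10], ["B", 6]],
    columns=["PROVINCESHORTNAME", "CONFIRMEDCOUNT"],
)

# Sum confirmed cases over the cities of each province
city_sum = (
    dt_city.groupby("PROVINCESHORTNAME", as_index=False)["CONFIRMEDCOUNT"].sum()
    .rename(columns={"CONFIRMEDCOUNT": "CITY_SUM"})
)
# Attach the city sums to the province totals and compute the gap
check = dt_province.merge(city_sum, on="PROVINCESHORTNAME", how="left")
check["DIFF"] = check["CONFIRMEDCOUNT"] - check["CITY_SUM"]
print(check)
```

A non-zero `DIFF` is not necessarily an error; it only flags provinces worth a closer look.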
When reposting, please credit the source:
WeChat official account: 數據志
Jianshu: 數據志
Cnblogs: https://www.cnblogs.com/wheng/
CSDN: https://blog.csdn.net/wzgl__wh
GitHub (data and code): https://github.com/hellowangheng/datazhi/tree/master/2019-nCoV