Flask National Weather Data Collection and Visualization System (Web Crawler): Python Source Code Download
1. Project Prerequisites
① A Python environment in which the project's dependencies can be installed with pip
② PyCharm (the Professional edition is recommended)
③ A MySQL database
④ Navicat, a visual database management tool
⑤ Google Chrome or Firefox as the recommended browser
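Project Overview
The project crawls per-city data from China Weather (weather.com.cn), saves it to its own database, and then uses Python and ECharts to run a preliminary analysis and visualize the results. It uses a B/S (browser/server) architecture, so it can be accessed directly from a browser. In the completed version every visualization is linked to the backend data, and the crawler has also been optimized.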
Project Technologies
Python web crawling, the Flask backend framework, Python, a MySQL database, ECharts data visualization, and the layui front-end framework for the admin backend.
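Project Features
The system has five functional modules to cover a range of design scenarios: visualization, version management, user management, real-time weather data management, and crawler management. Administrators and regular users are also given different menu permissions.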
Crawler Overview
Automatic crawling: once the project is started, data is crawled on a timer, once every hour (the interval can be customized).
Manual start from within the project: click "Start crawler" in the admin console; the backend silently crawls real-time weather data and skips records that have already been fetched.
Manual start from a Python file: run the Python file by hand to fetch the latest weather data.
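The hourly automatic crawl described above can be illustrated with a small interval runner. The project lists flask_apscheduler for scheduling; the sketch below deliberately swaps in the standard library's threading module so that it runs without extra packages, and it only shows the "repeat every N seconds" idea. The names IntervalJob and crawl_once are hypothetical and do not come from the project.

```python
import threading


class IntervalJob:
    """Call `func` every `interval_seconds` on a background thread until stopped."""

    def __init__(self, interval_seconds, func):
        self.interval_seconds = interval_seconds
        self.func = func
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() returns False when the timeout expires (run the job)
        # and True once stop() has been called (leave the loop).
        while not self._stop_event.wait(self.interval_seconds):
            self.func()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()


def crawl_once():
    # Hypothetical placeholder for the real crawl entry point.
    print("crawling weather data...")


# Usage (assumed): one hour between runs, matching the default described above.
# job = IntervalJob(60 * 60, crawl_once)
# job.start()
```

Note that the first run happens after one full interval, not immediately at start-up; call the function once by hand before start() if an immediate crawl is wanted.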
Import the weathers database file with Navicat
Open the project in PyCharm and configure the Python environment (install the dependency packages listed in the need file)
Run app.py to start the project
Account: admin  Password: 123456
Site crawled (China Weather): http://www.weather.com.cn/
Python package dependencies:
flask
flask_apscheduler
pymysql
requests
xlwt
selenium
Core crawler code:
import datetime
import json
import os
import time

import requests
import xlwt

# dbUtil is the project's own database helper class. Its module is not shown
# in this excerpt, so import it from wherever it lives in the project.


class GetWeather:
    def __init__(self):
        self.baseUrl = r"http://d1.weather.com.cn/sk_2d/"
        self.headers = {
            'Accept': "*/*",
            'Accept-Encoding': 'gzip, deflate',
            # The original listing had 'keep-alive' under Accept-Language and an
            # empty Connection value; the two appear to have been shifted.
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Connection': 'keep-alive',
            # Header values must be latin-1 encodable, hence the re-encoding.
            'Cookie': 'f_city=北京|101010100|; Hm_lvt_080dabacb001ad3dc8b9b9049b36d43b=1637305568,1637734650,1639644011,1639710627; Hm_lpvt_080dabacb001ad3dc8b9b9049b36d43b=1639723697'.encode(
                "utf-8").decode("latin1"),
            'Host': 'd1.weather.com.cn',
            'Referer': 'http://www.weather.com.cn/',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36',
        }
        self.loadList = []
        # Flat list of dicts: the four municipalities first, then every city
        # of each remaining province.
        self.cityList = []
        self.cityDict = {}  # city name -> ID used in the request URL
        self.result = xlwt.Workbook(encoding='utf-8', style_compression=0)
        self.sheet = self.result.add_sheet('result', cell_overwrite_ok=True)
        self.cityRow = 0
        self.totalGet = 0
        current_path = os.path.dirname(__file__)
        with open(os.path.join(current_path, "CITY.txt"), 'r', encoding='UTF-8') as load_f:
            loadList = json.load(load_f)  # 34 province-level regions
        for i in range(0, 4):
            self.cityList.append(loadList[i])
        for i in range(4, 34):
            for j in loadList[i]['cityList']:
                self.cityList.append(j)
        for i in self.cityList:
            if 'districtList' in i.keys():
                self.cityDict.setdefault(i['cityName'], i['cityId'] + "01")  # city within a province
            else:
                self.cityDict.setdefault(i['provinceName'], i['id'] + "0100")  # municipality
        print(len(self.cityDict))

    def __getWeatherInfo__(self):
        db = dbUtil()
        count = 0
        for city, city_id in self.cityDict.items():
            try:
                self.totalGet = self.totalGet + 1
                self.sheet.write(self.cityRow, 0, city)  # write the current city name
                # A millisecond timestamp is appended to the URL.
                PageUrl = self.baseUrl + city_id + ".html?_" + str(int(time.time() * 1000))
                # timeout added so that one stalled request cannot hang the crawl
                response = requests.get(PageUrl, headers=self.headers, allow_redirects=False, timeout=10)
                response.encoding = "utf-8"
                self.htmlResult = response.text
                # The body is JavaScript ("var dataSK={...}"); strip the prefix to get JSON.
                data = json.loads(self.htmlResult.replace("var dataSK=", ""))
                nameen = data["nameen"]  # city name in pinyin
                cityname = data["cityname"]  # city name
                temp = data["temp"]  # current temperature
                WD = data["WD"]  # wind direction
                WS = data["WS"].replace("级", "")  # wind force (strip the "级" level suffix)
                wse = data["wse"].replace("km/h", "")  # wind speed
                sd = data["sd"].replace("%", "")  # humidity
                weather = data["weather"]  # weather description
                record_date = data["date"]  # date
                record_time = data["time"]  # hour and minute
                aqi = data["aqi"]  # air quality index
                # Note: these statements are built by string concatenation, as in the original.
                judge_sql = "select count(id) from `weather` where nameen = '" + nameen + "' and cityname='" + cityname + "' and record_date='" + record_date + "' and record_time='" + record_time + "'"
                sql = "INSERT INTO `weather` VALUES (null, '" + nameen + "', '" + cityname + "', '" + record_date + "', '" + record_time + "', " + str(
                    temp) + ", '" + WD + "', " + WS + ", " + wse + ", " + sd + ", '" + weather + "', " + aqi + ", '" + time.strftime(
                    "%Y-%m-%d %H:%M:%S", time.localtime()) + "',0);"
                i = db.query_noargs(judge_sql)[0][0]
                if int(i) > 0:
                    # This city/date/time is already stored, so skip it.
                    print("Skipped:", judge_sql)
                    continue
                # Mark the city's earlier rows as old before inserting the new one.
                update_sql = "update `weather` set is_old=1 where nameen = '" + nameen + "' and cityname='" + cityname + "'"
                print("Inserted:", sql)
                count += 1
                db.query_noargs(update_sql)
                db.query_noargs(sql)
            except Exception as e:
                print(e)
                continue
        # Record the run in the system log table.
        t = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        sql = "insert into slog VALUES (NULL, \"[Crawler run] National weather data crawl succeeded; records fetched: " + str(count) + "\",\"" + t + "\")"
        db.query_noargs(sql)
        db.close_commit()

    def __main__(self):
        print(datetime.datetime.now())
        self.__getWeatherInfo__()
        print(datetime.datetime.now())


# Called by the backend to run the crawler
def online():
    weather = GetWeather()
    weather.__main__()
    return 200
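The two central steps of the crawler, decoding the "var dataSK=" response and skipping a reading that is already stored, can be tried in isolation. The following is a minimal sketch under stated assumptions, not the project's implementation: it uses the standard library's sqlite3 with an in-memory database in place of MySQL so that it runs standalone, the table is reduced to a few columns, and the sample payload values are invented for illustration. The helper names parse_sk_payload and save_if_new are hypothetical. The statements use parameter placeholders instead of string concatenation; sqlite3 writes them as ?, while pymysql uses %s.

```python
import json
import sqlite3

PREFIX = "var dataSK="


def parse_sk_payload(text):
    """Strip the JavaScript assignment prefix and decode the JSON object."""
    body = text.strip()
    if body.startswith(PREFIX):
        body = body[len(PREFIX):]
    return json.loads(body.rstrip(";"))


def save_if_new(conn, record):
    """Insert the reading unless the same city/date/time is already stored.

    Returns True when a row was inserted and False when it was skipped.
    """
    key = (record["nameen"], record["cityname"], record["date"], record["time"])
    found = conn.execute(
        "SELECT COUNT(*) FROM weather "
        "WHERE nameen = ? AND cityname = ? AND record_date = ? AND record_time = ?",
        key,
    ).fetchone()[0]
    if found:
        return False
    # Placeholders let the driver handle quoting of the values.
    conn.execute(
        "INSERT INTO weather (nameen, cityname, record_date, record_time, temp) "
        "VALUES (?, ?, ?, ?, ?)",
        key + (float(record["temp"]),),
    )
    conn.commit()
    return True


# Invented sample payload using the field names the crawler code reads.
SAMPLE = (
    'var dataSK={"nameen": "beijing", "cityname": "北京", '
    '"temp": "3", "date": "2021-12-17", "time": "14:35"}'
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather (id INTEGER PRIMARY KEY, nameen TEXT, cityname TEXT, "
    "record_date TEXT, record_time TEXT, temp REAL)"
)
record = parse_sk_payload(SAMPLE)
first = save_if_new(conn, record)   # inserted
second = save_if_new(conn, record)  # skipped: same city, date and time
```

Passing values as parameters also avoids the quoting problems that arise when a field contains a quote character or is empty.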