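Before I knew it, I had been a rookie uploader (UP主) on Bilibili (B站) for a while. (A rookie, yes, but one who still dreams of becoming a big-name creator.) As the old saying goes, a craftsman who wants to do good work must first sharpen his tools. Here I take the "tool" to be data analysis: letting data drive the product.
Bilibili's Creator Center only shows trend data for the last seven days. So I got the idea of writing a dashboard: store both the Bilibili and the YouTube data in MySQL, display it on the dashboard, and pull historical data back out of MySQL to analyze it along different dimensions.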
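The GitHub link to the full source code is at the bottom of this post.
The first version of the demo looks like this:
Writing the dashboard
The charts are generated with the pyecharts library. It officially provides a render() method that generates an HTML file and a Page class for multiple charts, but both only support fixed layouts. Even with a custom template passed in, multiple charts can only be emitted in a for loop, so each chart cannot be pinned to its own position on the page.
So I went into the pyecharts source and added a render_html_content() method, which returns a string containing the chart's HTML. That string can then be combined with Python 3's str.format() method to generate index.html.
Note: the web front end uses the open-source keen/dashboards.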
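The format() trick:
For example, put a {all} placeholder in html_temp.html and read the file's content into a string, obj.
Then call obj.format(all=all), where the right-hand all is the HTML content. The call returns a copy of the template with {all} replaced by that HTML.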
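A minimal sketch of this trick, using an in-memory string instead of html_temp.html so that it runs on its own. The template text and the chart_html value are made up for illustration. One caveat that is easy to trip over: str.format() treats every { and } as placeholder syntax, so literal braces in inline CSS or JavaScript have to be doubled ({{ and }}).

```python
# Stand-in for the contents of html_temp.html (hypothetical template text).
# The CSS braces are doubled so that str.format() leaves them alone.
template = (
    "<style>.chart {{ margin: 0; }}</style>\n"
    "<div class=\"chart\">{all}</div>"
)

# Stand-in for the HTML fragment a chart would produce.
chart_html = "<canvas id=\"demo\"></canvas>"

# Replace the {all} placeholder; the doubled braces come out as single braces.
page = template.format(all=chart_html)
print(page)
```

If the template contains a lot of CSS or JavaScript, doubling every brace gets tedious; in that case a different placeholder style (for example string.Template with $all) may be easier to maintain.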
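Modifying the pyecharts source:
(How do you find where a third-party library installed with pip3 lives?)
Start python3 and enter:
import pyecharts
pyecharts
The interpreter then prints the path of the module.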
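The same lookup can be done from a script. The sketch below is one possible way, not the only one; package_dir is a hypothetical helper name, and the standard-library json package is used in the demo call only so that the snippet runs without pyecharts installed.

```python
import importlib
import os


def package_dir(name: str) -> str:
    """Return the directory that holds the top-level package or module `name`."""
    module = importlib.import_module(name)
    return os.path.dirname(os.path.abspath(module.__file__))


# "json" is used only so the snippet runs without extra installs;
# pass "pyecharts" instead to locate the files edited in this post.
print(package_dir("json"))
```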
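Edit render/engine.py
and add a render_html_content() method to the render class:
def render_html_content(self, template_name: str, chart: Any, path: str, **kwargs):
    # Build the chart's HTML and return it as a string.
    # `path` is accepted but not used here, since nothing is written to disk.
    tpl = self.env.get_template(template_name)
    html = utils.replace_placeholder(
        tpl.render(chart=self.generate_js_link(chart), **kwargs)
    )
    return html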
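Edit charts/base.py
and add a render_html_content() method to the base class:
def render_html_content(
    self,
    path: str = "render.html",
    template_name: str = "simple_chart.html",
    env: Optional[Environment] = None,
    **kwargs,
) -> str:
    self._prepare_render()
    # This call assumes engine.py also exposes a module-level
    # render_html_content(chart, path, template_name, env, **kwargs)
    # that forwards to the method added to the render class.
    return engine.render_html_content(self, path, template_name, env, **kwargs)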
Add a temp.html file under render/templates:
{% import 'macro' as macro %}
{{ macro.render_chart_content(chart) }}
This temp.html leaves out <html></html> and the other page-level tags, so it outputs only the chart's own HTML.
Now it only takes two calls,
<chart object>.render_html_content(template_name="temp.html")
and
html_obj.format(all=all)
to replace {all} in the template HTML with the chart's HTML.
Fetching the Bilibili data:
Charts are no use without data.
The following public APIs return the view count, follower count, likes and similar numbers:
https://api.bilibili.com/x/relation/stat?vmid=<Bilibili user ID>
https://api.bilibili.com/x/space/upstat?mid=<Bilibili user ID>
http://api.bilibili.com/x/space/navnum?mid=<Bilibili user ID>
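A small sketch of how the three responses can be reduced to the four numbers this post stores. It works on already-decoded JSON, so it runs without network access. The field paths are the ones the fetch script in this post reads; extract_stats is a hypothetical helper name, and the sample payloads are made up and contain only those fields, not the full API responses.

```python
from typing import Any, Dict


def extract_stats(relation: Dict[str, Any],
                  upstat: Dict[str, Any],
                  navnum: Dict[str, Any]) -> Dict[str, int]:
    """Collect the four numbers stored in the bilibili table."""
    return {
        "view": int(upstat["data"]["archive"]["view"]),
        "follower": int(relation["data"]["follower"]),
        "likes": int(upstat["data"]["likes"]),
        "video_count": int(navnum["data"]["video"]),
    }


# Made-up payloads that mimic only the fields used above.
sample_relation = {"data": {"follower": 120}}
sample_upstat = {"data": {"archive": {"view": 34567}, "likes": 890}}
sample_navnum = {"data": {"video": 12}}

stats = extract_stats(sample_relation, sample_upstat, sample_navnum)
print(stats)
```

Keeping the parsing separate from the HTTP calls makes it possible to test the field paths against saved responses when the API changes.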
We can first create a bilibili table and then insert the data into it.
The table structure is:
CREATE TABLE bilibili (
  id int(8) unsigned NOT NULL AUTO_INCREMENT,
  view int(9) NOT NULL COMMENT 'total views',
  follower int(9) NOT NULL COMMENT 'followers',
  likes int(9) NOT NULL COMMENT 'likes',
  video_count int(9) NOT NULL COMMENT 'number of videos',
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The Python script that fetches the data (the dashboard script imports it as get_data, so it is presumably saved as get_data.py):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pymysql
import requests


class spider(object):
    """Fetch Bilibili account statistics and store them in MySQL."""

    def __init__(self):
        # Create the connection object
        self.conn = pymysql.connect(host='192.168.28.140', port=3306, user='test', passwd='test123', db='test', charset='utf8')
        self.cursor = self.conn.cursor()
        self.headers = {
            "user-agent": "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)",
            "referer": "https://space.bilibili.com/164106011/video",
        }
        self.vmid = "164106011"

    def __del__(self):
        # Close the connection object
        self.cursor.close()
        self.conn.close()

    def insert_testdata(self):
        # Pad the table with placeholder rows until it holds five
        sql = """select count(*) from bilibili;"""
        self.cursor.execute(sql)
        countNum = self.cursor.fetchall()[0][0]
        if countNum < 5:
            for i in range(5 - countNum):
                self.insert_to_database(1000*i, 10*i, 10*i, 1*i)
            self.conn.commit()
            print("Test data inserted")

    def insert_to_database(self, view, follower, likes, video_count):
        # Pass the values as query parameters instead of formatting them into the SQL string
        sql = """INSERT INTO bilibili (view,follower,likes,video_count) VALUES (%s, %s, %s, %s)"""
        data = (view, follower, likes, video_count)
        self.cursor.execute(sql, data)
        print("Today's data inserted")

    def select_data(self):
        # Latest six rows, newest first
        sql = """select * from bilibili order by id DESC limit 6;"""
        self.cursor.execute(sql)
        return self.cursor.fetchall()

    def spider_get_data(self):
        # Query the three public endpoints and store one new row
        follower = requests.get("https://api.bilibili.com/x/relation/stat?vmid=" + self.vmid, headers=self.headers, timeout=10).json()["data"]["follower"]
        upstat = requests.get("https://api.bilibili.com/x/space/upstat?mid=" + self.vmid, headers=self.headers, timeout=10).json()["data"]
        view = upstat["archive"]["view"]
        likes = upstat["likes"]
        video_count = requests.get("http://api.bilibili.com/x/space/navnum?mid=" + self.vmid, headers=self.headers, timeout=10).json()["data"]["video"]
        self.insert_to_database(view, follower, likes, video_count)
        self.conn.commit()


def main():
    bilibili = spider()
    bilibili.spider_get_data()  # fetch today's numbers and store them


if __name__ == '__main__':
    main()
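When values are written to the table, passing them as query parameters rather than formatting them into the SQL string lets the database driver take care of quoting. The sketch below illustrates the idea with the standard-library sqlite3 module as a stand-in for MySQL, so that it runs without a server. Two things here are assumptions made for illustration: the simplified table definition, and the ? placeholder style, which is what sqlite3 uses (pymysql uses %s).

```python
import sqlite3

# In-memory database as a stand-in for the MySQL server used in this post.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE bilibili (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        "view" INTEGER NOT NULL,
        follower INTEGER NOT NULL,
        likes INTEGER NOT NULL,
        video_count INTEGER NOT NULL
    )"""
)


def insert_row(view, follower, likes, video_count):
    """Insert one row, passing the values as query parameters."""
    conn.execute(
        'INSERT INTO bilibili ("view", follower, likes, video_count) VALUES (?, ?, ?, ?)',
        (view, follower, likes, video_count),
    )
    conn.commit()


# Same placeholder rows as the test-data helper in the script.
for i in range(5):
    insert_row(1000 * i, 10 * i, 10 * i, i)

# Latest six rows, newest first, mirroring the select in the script.
rows = conn.execute("SELECT * FROM bilibili ORDER BY id DESC LIMIT 6").fetchall()
print(rows[::-1])  # oldest first, as the dashboard expects
```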
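Getting a list of the last five days' dates in Python:
import datetime

def get_date():
    # Today first, then one day further back per step
    date = list()
    for i in range(5):
        date.append((datetime.date.today() + datetime.timedelta(days=-i)).strftime("%m-%d"))
    return date

date = get_date()[::-1]  # the last five days, oldest first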
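A variant of the same idea that takes the reference day as a parameter, which makes the result reproducible in tests. The name recent_dates and the example date are hypothetical and chosen only for illustration.

```python
import datetime
from typing import List, Optional


def recent_dates(days: int = 5, today: Optional[datetime.date] = None) -> List[str]:
    """Return `days` date labels ending at `today`, oldest first."""
    today = today or datetime.date.today()
    return [
        (today - datetime.timedelta(days=offset)).strftime("%m-%d")
        for offset in range(days - 1, -1, -1)
    ]


# A fixed reference day (arbitrary) so the output does not depend on the clock.
labels = recent_dates(5, today=datetime.date(2020, 3, 2))
print(labels)
```

Building the list oldest first avoids the separate [::-1] reversal step.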
Finally, here is the script that generates the dashboard:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from pyecharts import options as opts
from pyecharts.charts import Line
from pyecharts.globals import ThemeType
import get_data
import datetime

# Read the page template; the first line is read and discarded, as in the original script
with open("index_temp.html", "r", encoding="utf-8") as f:
    f.readline()
    index_content = f.read()
def line_center(width, height, title, date, view):
    # Center chart: total views as an area line (title is currently unused)
    c = (
        Line(init_opts=opts.InitOpts(theme=ThemeType.CHALK, width=width, height=height))
        .add_xaxis(date)
        .add_yaxis("Bilibili", view)
        # .add_yaxis("YouTube", [3,2,55,4,5])
        .set_series_opts(
            areastyle_opts=opts.AreaStyleOpts(opacity=0.5),
            label_opts=opts.LabelOpts(is_show=True),
        )
        .set_global_opts(
            xaxis_opts=opts.AxisOpts(
                axistick_opts=opts.AxisTickOpts(is_align_with_label=True),
                is_scale=False,
                boundary_gap=False,
            ),
        )
    )
    return c


def line_left(width, height, title, date, data):
    # Left chart: values are shown in thousands
    c = (
        Line(init_opts=opts.InitOpts(theme=ThemeType.CHALK, width=width, height=height))
        .add_xaxis(date)
        .add_yaxis(title, data)
        .set_series_opts(
            areastyle_opts=opts.AreaStyleOpts(opacity=0.5),
            label_opts=opts.LabelOpts(is_show=True),
        )
        .set_global_opts(
            yaxis_opts=opts.AxisOpts(
                name="Unit: thousands",
                axislabel_opts=opts.LabelOpts(formatter="{value} K"),
            ),
        )
    )
    return c


def line_right(width, height, title, date, data):
    # Right chart: plain area line
    c = (
        Line(init_opts=opts.InitOpts(theme=ThemeType.CHALK, width=width, height=height))
        .add_xaxis(date)
        .add_yaxis(title, data)
        .set_series_opts(
            areastyle_opts=opts.AreaStyleOpts(opacity=0.5),
            label_opts=opts.LabelOpts(is_show=True),
        )
    )
    return c
def bottom_all(width, height, title, date, view, follower, likes, video_count):
    # Bottom chart: the four series stacked on one axis
    c = (
        Line(init_opts=opts.InitOpts(theme=ThemeType.CHALK, width=width, height=height))
        .add_xaxis(date)
        .add_yaxis(
            series_name="Followers",
            stack="total",
            y_axis=follower,
            areastyle_opts=opts.AreaStyleOpts(opacity=0.5),
            label_opts=opts.LabelOpts(is_show=False),
        )
        .add_yaxis(
            series_name="Likes",
            stack="total",
            y_axis=likes,
            areastyle_opts=opts.AreaStyleOpts(opacity=0.5),
            label_opts=opts.LabelOpts(is_show=False),
        )
        .add_yaxis(
            series_name="Total videos",
            stack="total",
            y_axis=video_count,
            areastyle_opts=opts.AreaStyleOpts(opacity=0.5),
            label_opts=opts.LabelOpts(is_show=False),
        )
        .add_yaxis(
            series_name="Total views",
            stack="total",
            y_axis=view,
            areastyle_opts=opts.AreaStyleOpts(opacity=0.5),
            label_opts=opts.LabelOpts(is_show=False),
        )
        .set_global_opts(
            tooltip_opts=opts.TooltipOpts(trigger="axis", axis_pointer_type="cross"),
            yaxis_opts=opts.AxisOpts(
                type_="value",
                axistick_opts=opts.AxisTickOpts(is_show=True),
                splitline_opts=opts.SplitLineOpts(is_show=True),
            ),
            xaxis_opts=opts.AxisOpts(type_="category", boundary_gap=False),
        )
    )
    return c
def get_date():
    # Today first, then one day further back per step
    date = list()
    for i in range(5):
        date.append((datetime.date.today() + datetime.timedelta(days=-i)).strftime("%m-%d"))
    return date


def write_html_to_file(format_content):
    # The with block closes the file, so no explicit close() is needed
    with open("index.html", "w", encoding="utf-8") as f:
        f.write(format_content)
def main():
    db = get_data.spider()
    db.insert_testdata()  # if the data does not exist yet, insert five days of test data
    date = get_date()[::-1]  # the last five days, oldest first
    # db.spider_get_data()
    data = db.select_data()[::-1]  # latest six rows from MySQL, oldest first
    view = [x[1] for x in data[1:]]  # total views
    follower = [x[2] for x in data[1:]]  # followers
    likes = [x[3] for x in data[1:]]  # likes
    video_count = [x[4] for x in data[1:]]  # number of videos
    # Day-over-day increases, computed from six consecutive rows
    view_six_day = [x[1] for x in data]
    view_sub = [(view_six_day[x+1] - view_six_day[x]) / 1000 for x in range(len(view_six_day) - 1)]
    follower_six_day = [x[2] for x in data]
    follower_sub = [follower_six_day[x+1] - follower_six_day[x] for x in range(len(follower_six_day) - 1)]
    # Draw the charts and generate the HTML
    # "256px","325px"
    all_html = line_center("533px", "325px", "Total exposure", date, view).render_html_content(template_name="temp.html")
    line_left_bilibili = line_left("310px", "325px", "New views", date, view_sub).render_html_content(template_name="temp.html")
    line_right_bilibili = line_right("310px", "325px", "New followers", date, follower_sub).render_html_content(template_name="temp.html")
    bottom = bottom_all("1226px", "600px", "New counts", date, view, follower, likes, video_count).render_html_content(template_name="temp.html")
    format_content = index_content.format(all=all_html, line_left_bilibili=line_left_bilibili, line_right_bilibili=line_right_bilibili, bottom_all=bottom)
    print(all_html)  # debug output: the center chart's HTML fragment
    write_html_to_file(format_content)
    print("index.html generated successfully")


if __name__ == '__main__':
    main()
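The "new views" and "new followers" charts are built from differences between consecutive cumulative totals, which is why six rows are read to obtain five daily values. A minimal standalone sketch of that step follows; daily_deltas is a hypothetical helper name and the sample numbers are made up.

```python
from typing import List, Sequence


def daily_deltas(totals: Sequence[float], scale: float = 1.0) -> List[float]:
    """Turn N cumulative totals into N-1 day-over-day increases, divided by `scale`."""
    return [(later - earlier) / scale for earlier, later in zip(totals, totals[1:])]


# Made-up cumulative view counts for six consecutive days.
view_six_day = [1000, 1500, 1500, 2750, 4000, 4250]

# Five daily increases, expressed in thousands like the left-hand chart.
print(daily_deltas(view_six_day, scale=1000))
```

Pairing each value with its successor through zip avoids the index arithmetic and cannot run past the end of the list.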
GitHub link to the full source code:
https://github.com/guyuxiu/project
References:
[pyecharts documentation] https://pyecharts.org/
[dashboards source code] https://github.com/keen/dashboards
[Bilibili API] https://github.com/SocialSisterYi/bilibili-API-collect/