Likes and retweets per video category during the National Day holiday
Description
User-video interaction table tb_user_video_log
id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
---|---|---|---|---|---|---|---|---|
1 | 101 | 2001 | 2021-09-24 10:00:00 | 2021-09-24 10:00:20 | 1 | 1 | 0 | NULL |
2 | 105 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 0 | 0 | 1 | NULL |
3 | 102 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 1 | 1 | 1 | NULL |
4 | 101 | 2002 | 2021-09-26 11:00:00 | 2021-09-26 11:00:30 | 1 | 0 | 1 | NULL |
5 | 101 | 2002 | 2021-09-27 11:00:00 | 2021-09-27 11:00:30 | 1 | 1 | 0 | NULL |
6 | 102 | 2002 | 2021-09-28 11:00:00 | 2021-09-28 11:00:30 | 1 | 0 | 1 | NULL |
7 | 103 | 2002 | 2021-09-29 11:00:00 | 2021-09-29 11:00:30 | 1 | 0 | 1 | NULL |
8 | 102 | 2002 | 2021-09-30 11:00:00 | 2021-09-30 11:00:30 | 1 | 1 | 1 | NULL |
9 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 1 | 1 | 0 | NULL |
10 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0 | 0 | 1 | NULL |
11 | 103 | 2001 | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 1 | 1 | 0 | 1732526 |
12 | 106 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 2 | 0 | 1 | NULL |
13 | 107 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 0 | 1 | NULL |
14 | 108 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 1 | 1 | NULL |
15 | 109 | 2002 | 2021-10-03 10:59:05 | 2021-10-03 11:00:05 | 0 | 1 | 0 | NULL |
(uid: user ID, video_id: video ID, start_time: watch start time, end_time: watch end time, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user retweeted, comment_id: comment ID)
Short video info table tb_video_info
id | video_id | author | tag | duration | release_time |
---|---|---|---|---|---|
1 | 2001 | 901 | 旅游 | 30 | 2020-01-01 07:00:00 |
2 | 2002 | 901 | 旅游 | 60 | 2021-01-01 07:00:00 |
3 | 2003 | 902 | 影視 | 90 | 2020-01-01 07:00:00 |
4 | 2004 | 902 | 美女 | 90 | 2020-01-01 08:00:00 |
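(video_id: video ID, author: creator ID, tag: category tag, duration: video length in seconds, release_time: release time; tag values: 旅游 = travel, 影視 = film & TV, 美女 = beauty)
Problem: For each of the first 3 days of the 2021 National Day holiday (2021-10-01 to 2021-10-03), compute two metrics for every video category and every day. The first is the total likes over the past week, meaning that day and the 6 days before it. The second is the maximum single-day retweet count within that week. Sort the result by category descending, then date ascending. Assume the database holds enough data that every category has playback records on each of those 3 days and on every day of the preceding week.
Output example:
The output for the sample data is: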
tag | dt | sum_like_cnt_7d | max_retweet_cnt_7d |
---|---|---|---|
旅游 | 2021-10-01 | 5 | 2 |
旅游 | 2021-10-02 | 5 | 3 |
旅游 | 2021-10-03 | 6 | 3 |
Explanation:
In tb_user_video_log, only travel (旅游) videos were watched. Their daily like and retweet counts from 2021-09-25 to 2021-10-03 are:
tag | dt | like_cnt | retweet_cnt |
---|---|---|---|
旅游 | 2021-09-25 | 1 | 2 |
旅游 | 2021-09-26 | 0 | 1 |
旅游 | 2021-09-27 | 1 | 0 |
旅游 | 2021-09-28 | 0 | 1 |
旅游 | 2021-09-29 | 0 | 1 |
旅游 | 2021-09-30 | 1 | 1 |
旅游 | 2021-10-01 | 2 | 1 |
旅游 | 2021-10-02 | 1 | 3 |
旅游 | 2021-10-03 | 1 | 0 |
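So, for the first 3 days of the holiday (10.01 to 10.03), take 10.01 first. Its past 7 days (9.25 to 10.01) have 5 likes in total, and the maximum single-day retweet count is 2 (reached on 9.25). The two metrics for 10.02 and 10.03 are computed the same way.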
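The sketch below is a quick arithmetic check of this explanation. It recomputes the three output rows in plain Python from the daily table above. It is only an illustration: the dict `daily` and the helper `seven_day_stats` are hypothetical names, and any day missing from the dict is counted as zero.

```python
from datetime import date, timedelta

# Daily (likes, retweets) for the 旅游 (travel) category, copied from the explanation table.
daily = {
    "2021-09-25": (1, 2), "2021-09-26": (0, 1), "2021-09-27": (1, 0),
    "2021-09-28": (0, 1), "2021-09-29": (0, 1), "2021-09-30": (1, 1),
    "2021-10-01": (2, 1), "2021-10-02": (1, 3), "2021-10-03": (1, 0),
}


def seven_day_stats(day):
    """Return (total likes, max single-day retweets) over day-6 .. day.

    Days absent from `daily` count as (0, 0).
    """
    week = [daily.get((day - timedelta(days=k)).isoformat(), (0, 0)) for k in range(7)]
    return sum(likes for likes, _ in week), max(retweets for _, retweets in week)


for d in (date(2021, 10, 1), date(2021, 10, 2), date(2021, 10, 3)):
    print(d, *seven_day_stats(d))
```

The loop prints 5 2, 5 3 and 6 3 for the three days, which matches the expected output.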
1. Data preparation
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT 'auto-increment ID',
    uid INT NOT NULL COMMENT 'user ID',
    video_id INT NOT NULL COMMENT 'video ID',
    start_time datetime COMMENT 'watch start time',
    end_time datetime COMMENT 'watch end time',
    if_follow TINYINT COMMENT 'whether the user followed',
    if_like TINYINT COMMENT 'whether the user liked',
    if_retweet TINYINT COMMENT 'whether the user retweeted',
    comment_id INT COMMENT 'comment ID'
) CHARACTER SET utf8 COLLATE utf8_bin;
CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT 'auto-increment ID',
    video_id INT UNIQUE NOT NULL COMMENT 'video ID',
    author INT NOT NULL COMMENT 'creator ID',
    tag VARCHAR(16) NOT NULL COMMENT 'category tag',
    duration INT NOT NULL COMMENT 'video duration (seconds)',
    release_time datetime NOT NULL COMMENT 'release time'
) CHARACTER SET utf8 COLLATE utf8_bin;
INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
(101, 2001, '2021-09-24 10:00:00', '2021-09-24 10:00:20', 1, 1, 0, null)
,(105, 2002, '2021-09-25 11:00:00', '2021-09-25 11:00:30', 0, 0, 1, null)
,(102, 2002, '2021-09-25 11:00:00', '2021-09-25 11:00:30', 1, 1, 1, null)
,(101, 2002, '2021-09-26 11:00:00', '2021-09-26 11:00:30', 1, 0, 1, null)
,(101, 2002, '2021-09-27 11:00:00', '2021-09-27 11:00:30', 1, 1, 0, null)
,(102, 2002, '2021-09-28 11:00:00', '2021-09-28 11:00:30', 1, 0, 1, null)
,(103, 2002, '2021-09-29 11:00:00', '2021-09-29 11:00:30', 1, 0, 1, null)
,(102, 2002, '2021-09-30 11:00:00', '2021-09-30 11:00:30', 1, 1, 1, null)
,(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:20', 1, 1, 0, null)
,(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:15', 0, 0, 1, null)
,(103, 2001, '2021-10-01 11:00:50', '2021-10-01 11:01:15', 1, 1, 0, 1732526)
,(106, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 2, 0, 1, null)
,(107, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 1, 0, 1, null)
,(108, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 1, 1, 1, null)
,(109, 2002, '2021-10-03 10:59:05', '2021-10-03 11:00:05', 0, 1, 0, null);
INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
(2001, 901, '旅游', 30, '2020-01-01 7:00:00')
,(2002, 901, '旅游', 60, '2021-01-01 7:00:00')
,(2003, 902, '影視', 90, '2020-01-01 7:00:00')
,(2004, 902, '美女', 90, '2020-01-01 8:00:00');
2. Query the tables
SELECT * FROM tb_user_video_log;
SELECT * FROM tb_video_info;
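3. Problem
For each of the first 3 days of the 2021 National Day holiday, compute each video category's total likes over the past week and its maximum single-day retweet count within that week.
Sort the result by category descending, then date ascending. Assume the database holds enough data that every category has playback records on each of those 3 days and on every day of the week before.
Difficulties:
- How do you express "the past week" in SQL?
- How do you get the maximum single-day retweet count? if_retweet is a 0/1 flag on each row. First sum the rows with if_retweet = 1 for each day, then take the maximum over those daily counts.
Approach:
- Compute each category's daily likes and daily retweets for 2021-09-25 to 2021-10-03.
- Use a window function to sum the daily likes over each date dt and the 6 days before it (the one-week span the problem asks for),
- and to take the maximum of the daily retweet counts over the same span.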
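The two-level aggregation behind the second difficulty can be sketched in plain Python. First sum the 0/1 flags per day, then take the maximum over the week. This is a hypothetical illustration that uses the sample rows for the travel category from 2021-09-25 on. The names `rows`, `daily_like` and `daily_retweet` are made up for the sketch. The last value shows why a plain MAX over the raw rows would be wrong.

```python
from collections import defaultdict
from datetime import date, timedelta

# (watch date, if_like, if_retweet) for each travel-category row, 2021-09-25 .. 2021-10-03.
rows = [
    ("2021-09-25", 0, 1), ("2021-09-25", 1, 1), ("2021-09-26", 0, 1),
    ("2021-09-27", 1, 0), ("2021-09-28", 0, 1), ("2021-09-29", 0, 1),
    ("2021-09-30", 1, 1), ("2021-10-01", 1, 0), ("2021-10-01", 0, 1),
    ("2021-10-01", 1, 0), ("2021-10-02", 0, 1), ("2021-10-02", 0, 1),
    ("2021-10-02", 1, 1), ("2021-10-03", 1, 0),
]

# Level 1: one total per day (what GROUP BY tag, dt does).
daily_like = defaultdict(int)
daily_retweet = defaultdict(int)
for dt, like, retweet in rows:
    daily_like[dt] += like
    daily_retweet[dt] += retweet

# Level 2: aggregate the daily totals over the week ending 2021-10-02 (09-26 .. 10-02).
end = date(2021, 10, 2)
week = [(end - timedelta(days=k)).isoformat() for k in range(7)]
sum_like_7d = sum(daily_like[d] for d in week)
max_retweet_7d = max(daily_retweet[d] for d in week)

# Wrong shortcut: MAX over raw rows only ever sees the 0/1 flag.
naive_max = max(rt for dt, _, rt in rows if dt in week)

print(sum_like_7d, max_retweet_7d, naive_max)  # 5 3 1
```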
4. Solution
- First, compute each category's daily likes and daily retweets for 2021-09-25 to 2021-10-03:
-- Daily likes and retweets per category, 2021-09-25 to 2021-10-03
SELECT
    y.tag,
    DATE(x.start_time) AS dt,
    SUM(x.if_like) AS daily_like_cnt,
    SUM(x.if_retweet) AS daily_retweet_cnt
FROM tb_user_video_log x
JOIN tb_video_info y ON x.video_id = y.video_id
WHERE DATE(x.start_time) BETWEEN '2021-09-25' AND '2021-10-03'
GROUP BY y.tag, dt
ORDER BY y.tag, dt;
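You can try this step without a MySQL server. The minimal sketch below runs the same aggregation through Python's built-in `sqlite3` module. It rests on a few assumptions. The tables are trimmed to the columns the query reads, datetimes are stored as TEXT, and SQLite's DATE() stands in for MySQL's. This is a port for experimenting, not the author's setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed, hypothetical SQLite copy of the sample data: only the columns this step reads.
conn.execute("CREATE TABLE tb_video_info (video_id INTEGER PRIMARY KEY, tag TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE tb_user_video_log "
    "(video_id INTEGER, start_time TEXT, if_like INTEGER, if_retweet INTEGER)"
)
conn.executemany(
    "INSERT INTO tb_video_info VALUES (?, ?)",
    [(2001, "旅游"), (2002, "旅游"), (2003, "影視"), (2004, "美女")],
)
conn.executemany("INSERT INTO tb_user_video_log VALUES (?, ?, ?, ?)", [
    (2001, "2021-09-24 10:00:00", 1, 0), (2002, "2021-09-25 11:00:00", 0, 1),
    (2002, "2021-09-25 11:00:00", 1, 1), (2002, "2021-09-26 11:00:00", 0, 1),
    (2002, "2021-09-27 11:00:00", 1, 0), (2002, "2021-09-28 11:00:00", 0, 1),
    (2002, "2021-09-29 11:00:00", 0, 1), (2002, "2021-09-30 11:00:00", 1, 1),
    (2001, "2021-10-01 10:00:00", 1, 0), (2001, "2021-10-01 10:00:00", 0, 1),
    (2001, "2021-10-01 11:00:50", 1, 0), (2002, "2021-10-02 10:59:05", 0, 1),
    (2002, "2021-10-02 10:59:05", 0, 1), (2002, "2021-10-02 10:59:05", 1, 1),
    (2002, "2021-10-03 10:59:05", 1, 0),
])

# Same logic as the MySQL query: one row per (tag, day) inside the date range.
daily = conn.execute("""
    SELECT y.tag,
           DATE(x.start_time) AS dt,
           SUM(x.if_like)     AS daily_like_cnt,
           SUM(x.if_retweet)  AS daily_retweet_cnt
    FROM tb_user_video_log x
    JOIN tb_video_info y ON x.video_id = y.video_id
    WHERE DATE(x.start_time) BETWEEN '2021-09-25' AND '2021-10-03'
    GROUP BY y.tag, DATE(x.start_time)
    ORDER BY y.tag, dt
""").fetchall()

for row in daily:
    print(row)
```

The nine rows it prints should match the daily table in the explanation. The 2021-09-24 row is dropped by the WHERE clause.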
- Next, for every date, compute the past-week like total and the maximum single-day retweet count:
WITH t AS (
    -- one row per (tag, day)
    SELECT
        y.tag,
        DATE(x.start_time) AS dt,
        SUM(x.if_like) AS daily_like_cnt,
        SUM(x.if_retweet) AS daily_retweet_cnt
    FROM tb_user_video_log x
    JOIN tb_video_info y ON x.video_id = y.video_id
    WHERE DATE(x.start_time) BETWEEN '2021-09-25' AND '2021-10-03'
    GROUP BY y.tag, dt
)
SELECT
    tag,
    dt,
    daily_like_cnt,
    daily_retweet_cnt,
    -- current row plus the 6 preceding rows, i.e. 7 days when every day has a row
    SUM(daily_like_cnt) OVER (PARTITION BY tag ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS sum_like_cnt_7d,
    MAX(daily_retweet_cnt) OVER (PARTITION BY tag ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS max_retweet_cnt_7d
FROM t;
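The window step can be checked the same way. This sketch assumes Python's `sqlite3` is backed by SQLite 3.25 or newer, which added window-function support. It loads the nine daily rows from the explanation table straight into a hypothetical table `t`, so it exercises only the OVER clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table t: one row per (tag, day), as produced by the first step.
conn.execute(
    "CREATE TABLE t (tag TEXT, dt TEXT, daily_like_cnt INTEGER, daily_retweet_cnt INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES ('旅游', ?, ?, ?)",
    [
        ("2021-09-25", 1, 2), ("2021-09-26", 0, 1), ("2021-09-27", 1, 0),
        ("2021-09-28", 0, 1), ("2021-09-29", 0, 1), ("2021-09-30", 1, 1),
        ("2021-10-01", 2, 1), ("2021-10-02", 1, 3), ("2021-10-03", 1, 0),
    ],
)

# Rolling 7-row frame per tag: running SUM of likes and running MAX of daily retweets.
result = conn.execute(
    """
    SELECT tag, dt, daily_like_cnt, daily_retweet_cnt,
           SUM(daily_like_cnt) OVER (PARTITION BY tag ORDER BY dt
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS sum_like_cnt_7d,
           MAX(daily_retweet_cnt) OVER (PARTITION BY tag ORDER BY dt
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS max_retweet_cnt_7d
    FROM t
    ORDER BY tag, dt
    """
).fetchall()

for row in result:
    print(row)
```

The rows for 09-25 to 09-30 have fewer than seven days of history in this table, so their rolling values are partial. Only the rows for October 1 to 3 cover a full week.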
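Note:
SUM(daily_like_cnt) OVER (PARTITION BY tag ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
-- This partitions by tag, sorts by dt ascending, and sums daily_like_cnt over the current row and the 6 rows before it. Every category is guaranteed to have records on every day, so those 7 rows are exactly 7 consecutive days and the result is the week's total likes. If a day had no records, the frame would reach back further than one week.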
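The note above relies on the one-row-per-day guarantee. The Python sketch below uses made-up numbers in which 2021-09-28 has no rows, to show what happens when the guarantee fails. A ROWS frame counts rows, not days, so here it reaches back to 2021-09-24. Both functions are hypothetical helpers. If gaps were possible, MySQL 8.0 also accepts a RANGE frame with an INTERVAL bound on a date column, for example `RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW`. Check the frame-specification section of the MySQL manual before relying on it.

```python
from datetime import date, timedelta

# Hypothetical daily like counts; note that 2021-09-28 is missing entirely.
daily_likes = {
    date(2021, 9, 24): 4,
    date(2021, 9, 25): 1, date(2021, 9, 26): 0, date(2021, 9, 27): 1,
    date(2021, 9, 29): 0, date(2021, 9, 30): 1, date(2021, 10, 1): 2,
}
days = sorted(daily_likes)


def rows_frame_sum(day):
    """Mimic ROWS BETWEEN 6 PRECEDING AND CURRENT ROW: this row and up to 6 earlier rows."""
    i = days.index(day)
    return sum(daily_likes[d] for d in days[max(0, i - 6): i + 1])


def calendar_week_sum(day):
    """Sum over the 7 calendar days day-6 .. day, whether or not each day has a row."""
    start = day - timedelta(days=6)
    return sum(v for d, v in daily_likes.items() if start <= d <= day)


target = date(2021, 10, 1)
print(rows_frame_sum(target), calendar_week_sum(target))  # 9 5
```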
Note: two ways to write a window function
- Method 1: inline window specification
aggregate/non-aggregate function OVER (window_spec)
window_spec: [window_name] [partition_clause] [order_clause] [frame_clause]
The query above already uses method 1: each window function carries its own full window_spec.
- Method 2: named window
aggregate/non-aggregate function OVER window_name
WINDOW window_name AS (window_spec)
Rewritten with method 2, the query above becomes:
WITH t AS (
    SELECT
        y.tag,
        DATE(x.start_time) AS dt,
        SUM(x.if_like) AS daily_like_cnt,
        SUM(x.if_retweet) AS daily_retweet_cnt
    FROM tb_user_video_log x
    JOIN tb_video_info y ON x.video_id = y.video_id
    WHERE DATE(x.start_time) BETWEEN '2021-09-25' AND '2021-10-03'
    GROUP BY y.tag, dt
)
SELECT
    tag,
    dt,
    daily_like_cnt,
    daily_retweet_cnt,
    SUM(daily_like_cnt) OVER w AS sum_like_cnt_7d,
    MAX(daily_retweet_cnt) OVER w AS max_retweet_cnt_7d
FROM t
-- named window, defined once and shared by both functions
WINDOW w AS (PARTITION BY tag ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW);
Comparing the two: with OVER (window_spec), every window function in the SELECT list repeats the same window_spec. The named-window form writes the spec only once, in the WINDOW clause, which removes the duplication.
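The sketch below checks that the two forms are interchangeable. It runs the same SUM/MAX pair once with inline OVER (...) clauses and once with a named WINDOW w, then compares the results. It uses Python's `sqlite3` module, and named windows assume SQLite 3.28 or newer. The four-row table with two categories is hypothetical data, chosen so that PARTITION BY tag has something to separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (tag TEXT, dt TEXT, daily_like_cnt INTEGER, daily_retweet_cnt INTEGER)"
)
# Hypothetical daily aggregates for two categories.
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("旅游", "2021-10-01", 2, 1), ("旅游", "2021-10-02", 1, 3),
    ("影視", "2021-10-01", 4, 0), ("影視", "2021-10-02", 0, 5),
])

# Method 1: each function repeats the full window specification.
inline_sql = """
    SELECT tag, dt,
           SUM(daily_like_cnt) OVER (PARTITION BY tag ORDER BY dt
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW),
           MAX(daily_retweet_cnt) OVER (PARTITION BY tag ORDER BY dt
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
    FROM t
    ORDER BY tag, dt
"""
# Method 2: the specification is written once as a named window.
named_sql = """
    SELECT tag, dt,
           SUM(daily_like_cnt) OVER w,
           MAX(daily_retweet_cnt) OVER w
    FROM t
    WINDOW w AS (PARTITION BY tag ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
    ORDER BY tag, dt
"""

inline_rows = conn.execute(inline_sql).fetchall()
named_rows = conn.execute(named_sql).fetchall()
print(inline_rows == named_rows)  # True
```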
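- At this point we have the past-week like total and the maximum single-day retweet count for every date. All that remains is a WHERE clause that keeps the target dates. The filter must go in an outer query. If it were placed inside the window query, September 25 to 30 would be removed before the window functions run, and the 7-day frames would be incomplete.
SELECT tag, dt, sum_like_cnt_7d, max_retweet_cnt_7d
FROM (
    -- the WITH ... SELECT window query above
) tt
WHERE dt BETWEEN '2021-10-01' AND '2021-10-03'
ORDER BY tag DESC, dt ASC;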
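To see why the date filter belongs outside the window query, the sketch below runs both placements on the nine daily rows from the explanation table. It uses Python's `sqlite3` module and assumes SQLite 3.25 or newer. The table `t` and the `frame` string are illustration-only names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (tag TEXT, dt TEXT, daily_like_cnt INTEGER, daily_retweet_cnt INTEGER)"
)
conn.executemany("INSERT INTO t VALUES ('旅游', ?, ?, ?)", [
    ("2021-09-25", 1, 2), ("2021-09-26", 0, 1), ("2021-09-27", 1, 0),
    ("2021-09-28", 0, 1), ("2021-09-29", 0, 1), ("2021-09-30", 1, 1),
    ("2021-10-01", 2, 1), ("2021-10-02", 1, 3), ("2021-10-03", 1, 0),
])

frame = "OVER (PARTITION BY tag ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)"

# Correct: window functions see all nine days; the date filter runs afterwards.
after = conn.execute(f"""
    SELECT tag, dt, sum_like_cnt_7d, max_retweet_cnt_7d FROM (
        SELECT tag, dt,
               SUM(daily_like_cnt) {frame} AS sum_like_cnt_7d,
               MAX(daily_retweet_cnt) {frame} AS max_retweet_cnt_7d
        FROM t
    ) AS tt
    WHERE dt BETWEEN '2021-10-01' AND '2021-10-03'
    ORDER BY tag DESC, dt ASC
""").fetchall()

# Wrong: WHERE is applied before the window functions, so Sept 25-30 never enter the frame.
before = conn.execute(f"""
    SELECT tag, dt,
           SUM(daily_like_cnt) {frame},
           MAX(daily_retweet_cnt) {frame}
    FROM t
    WHERE dt BETWEEN '2021-10-01' AND '2021-10-03'
    ORDER BY tag DESC, dt ASC
""").fetchall()

print(after)
print(before)
```

With the filter outside, the rows match the expected output. With the filter inside, only October 1 to 3 remain when the window functions run, so 10-01 reports 2 likes instead of 5.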
Full code:
SELECT tag, dt, sum_like_cnt_7d, max_retweet_cnt_7d
FROM (
    WITH t AS (
        -- daily likes and retweets per category, 2021-09-25 to 2021-10-03
        SELECT
            y.tag,
            DATE(x.start_time) AS dt,
            SUM(x.if_like) AS daily_like_cnt,
            SUM(x.if_retweet) AS daily_retweet_cnt
        FROM tb_user_video_log x
        JOIN tb_video_info y ON x.video_id = y.video_id
        WHERE DATE(x.start_time) BETWEEN '2021-09-25' AND '2021-10-03'
        GROUP BY y.tag, dt
    )
    SELECT
        tag,
        dt,
        -- 7-day rolling like total and max single-day retweets
        SUM(daily_like_cnt) OVER (PARTITION BY tag ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS sum_like_cnt_7d,
        MAX(daily_retweet_cnt) OVER (PARTITION BY tag ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS max_retweet_cnt_7d
    FROM t
) tt
-- filter only after the window functions have seen the whole week
WHERE dt BETWEEN '2021-10-01' AND '2021-10-03'
ORDER BY tag DESC, dt ASC;
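Finally, here is an end-to-end sketch of the whole query on the sample data. It again uses Python's `sqlite3` with trimmed, hypothetical tables that hold only the columns the query reads. The logic is the same, but it is restructured as two CTEs: daily aggregation first, then the rolling window. The date filter is applied last. SQLite 3.28 or newer is assumed for the WINDOW clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed, hypothetical copy of the sample data (only the columns the query reads).
conn.execute("CREATE TABLE tb_video_info (video_id INTEGER PRIMARY KEY, tag TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE tb_user_video_log "
    "(video_id INTEGER, start_time TEXT, if_like INTEGER, if_retweet INTEGER)"
)
conn.executemany(
    "INSERT INTO tb_video_info VALUES (?, ?)",
    [(2001, "旅游"), (2002, "旅游"), (2003, "影視"), (2004, "美女")],
)
conn.executemany("INSERT INTO tb_user_video_log VALUES (?, ?, ?, ?)", [
    (2001, "2021-09-24 10:00:00", 1, 0), (2002, "2021-09-25 11:00:00", 0, 1),
    (2002, "2021-09-25 11:00:00", 1, 1), (2002, "2021-09-26 11:00:00", 0, 1),
    (2002, "2021-09-27 11:00:00", 1, 0), (2002, "2021-09-28 11:00:00", 0, 1),
    (2002, "2021-09-29 11:00:00", 0, 1), (2002, "2021-09-30 11:00:00", 1, 1),
    (2001, "2021-10-01 10:00:00", 1, 0), (2001, "2021-10-01 10:00:00", 0, 1),
    (2001, "2021-10-01 11:00:50", 1, 0), (2002, "2021-10-02 10:59:05", 0, 1),
    (2002, "2021-10-02 10:59:05", 0, 1), (2002, "2021-10-02 10:59:05", 1, 1),
    (2002, "2021-10-03 10:59:05", 1, 0),
])

QUERY = """
WITH daily AS (          -- step 1: one row per (tag, day)
    SELECT y.tag, DATE(x.start_time) AS dt,
           SUM(x.if_like) AS daily_like_cnt,
           SUM(x.if_retweet) AS daily_retweet_cnt
    FROM tb_user_video_log x
    JOIN tb_video_info y ON x.video_id = y.video_id
    WHERE DATE(x.start_time) BETWEEN '2021-09-25' AND '2021-10-03'
    GROUP BY y.tag, DATE(x.start_time)
),
rolling AS (             -- step 2: 7-row rolling window per tag
    SELECT tag, dt,
           SUM(daily_like_cnt) OVER w AS sum_like_cnt_7d,
           MAX(daily_retweet_cnt) OVER w AS max_retweet_cnt_7d
    FROM daily
    WINDOW w AS (PARTITION BY tag ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
)
SELECT tag, dt, sum_like_cnt_7d, max_retweet_cnt_7d   -- step 3: keep Oct 1-3 only
FROM rolling
WHERE dt BETWEEN '2021-10-01' AND '2021-10-03'
ORDER BY tag DESC, dt ASC
"""

for row in conn.execute(QUERY):
    print(row)
```

The three printed rows should be ('旅游', '2021-10-01', 5, 2), ('旅游', '2021-10-02', 5, 3) and ('旅游', '2021-10-03', 6, 3), which is the sample output.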