Hive系列文章
- Hive表的基本操作
- Hive中的集合數(shù)據(jù)類型
- Hive動(dòng)態(tài)分區(qū)詳解
- hive中orc格式表的數(shù)據(jù)導(dǎo)入
- Java通過jdbc連接hive
- 通過HiveServer2訪問Hive
- SpringBoot連接Hive實(shí)現(xiàn)自助取數(shù)
- hive關(guān)聯(lián)hbase表
- Hive udf 使用方法
- Hive基于UDF進(jìn)行文本分詞
- Hive窗口函數(shù)row number的用法
- 數(shù)據(jù)倉庫之拉鏈表
同比環(huán)比的計(jì)算
測(cè)試數(shù)據(jù)
1,2020-04-20,420
2,2020-04-04,800
3,2020-03-28,500
4,2020-03-13,100
5,2020-02-27,300
6,2020-01-07,450
7,2019-04-07,800
8,2019-03-15,1200
9,2019-02-17,200
10,2019-02-07,600
11,2019-01-13,300
CREATE TABLE ods_saleorder (
order_id int ,
order_time date ,
order_num int
)ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
;
LOAD DATA LOCAL INPATH '/Users/liuwenqiang/workspace/hive/saleorder.txt' OVERWRITE INTO TABLE ods.ods_saleorder;
銷售量的月年占比
關(guān)聯(lián)實(shí)現(xiàn)
select
a.m_num,a.cmonth,b.y_num,b.cyear,round( m_num / y_num, 2 ) AS ratio
from(
select
sum(order_num) as m_num,
DATE_FORMAT(order_time,'yyyy-MM') as cmonth
from
ods_saleorder
group by
DATE_FORMAT(order_time,'yyyy-MM')
) a
inner join
(
select
sum(order_num) as y_num,
DATE_FORMAT(order_time,'yyyy') as cyear
from
ods_saleorder
group by
DATE_FORMAT(order_time,'yyyy')
) b
on
substring(a.cmonth,1,4)=b.cyear
;
窗口實(shí)現(xiàn)
SELECT
order_month,
num,
total,
round( num / total, 2 ) AS ratio
FROM
(
select
substr(order_time, 1, 7) as order_month,
sum(order_num) over (partition by substr(order_time, 1, 7)) as num,
sum(order_num) over (partition by substr( order_time, 1, 4 ) ) total,
row_number() over (partition by substr(order_time, 1, 7)) as rk
from ods_saleorder
) temp
where rk = 1;
同比環(huán)比
與上年度數(shù)據(jù)對(duì)比稱"同比"宝泵,與上月數(shù)據(jù)對(duì)比稱"環(huán)比"瘦真。
相關(guān)公式如下:
同比增長率計(jì)算公式
(當(dāng)年值-上年值)/上年值x100%
環(huán)比增長率計(jì)算公式
(當(dāng)月值-上月值)/上月值x100%
lead lag 的實(shí)現(xiàn)
這里我們就用環(huán)比做個(gè)例子猾封,同比類似
select
now_month,
now_num,
last_num,
round( (now_num-last_num) / last_num, 2 ) as ratio
FROM(
select
now_month,
now_num,
lag( t1.now_num, 1) over (order by t1.now_month ) as last_num
from
(
select
substr(order_time, 1, 7) as now_month,
sum(order_num) as now_num
from ods_saleorder
group by
substr(order_time, 1, 7)
) t1
) t2;
我們看到有null 值凯旋,這里我們可以使用他嫡,lag的默認(rèn)值做一次優(yōu)化
select
now_month,
now_num,
last_num,
-- 分母是0的話返回值是null
nvl(round( (now_num-last_num) / last_num, 2 ),0)as ratio
FROM(
select
now_month,
now_num,
lag( t1.now_num, 1,0) over (order by t1.now_month ) as last_num
from
(
select
substr(order_time, 1, 7) as now_month,
sum(order_num) as now_num
from ods_saleorder
group by
substr(order_time, 1, 7)
) t1
) t2;
其實(shí)到這里我們就處理完了颖医,但是這樣真的對(duì)嗎枉长,我們看到'2020-01' 的last_num 是800 也就是'2019-04',其實(shí)到這里我們就明白了冀续,我們的數(shù)據(jù)是不連續(xù)的,所以我們這樣計(jì)算是不行的必峰,如果每個(gè)月都齊全洪唐,都有數(shù)據(jù)lag(num,12)就可以。
那就只能做自關(guān)聯(lián)了,這樣的話我們可以對(duì)時(shí)間做精準(zhǔn)的限制
自關(guān)聯(lián)的實(shí)現(xiàn)
with a as (
select
now_month,
now_num,
substr(date(concat(now_month,'-','01')) - INTERVAL '1' month, 1, 7) as last_month
from(
select
substr(order_time, 1, 7) as now_month,
sum(order_num) as now_num
from ods_saleorder
group by
substr(order_time, 1, 7)
) tmp
)
select
a1.now_month,a1.now_num,a1.last_month,a2.now_num,
nvl(round( (a1.now_num-a2.now_num) / a2.now_num, 2 ),0) as ratio
from
a a1
inner join
a a2
on
a1.last_month=a2.now_month
;
這里的時(shí)間計(jì)算INTERVAL 你也可以換成其他函數(shù)
with a as (
select
now_month,
now_num,
substr(add_months(concat(now_month,'-','01'),-1), 1, 7) as last_month
from(
select
substr(order_time, 1, 7) as now_month,
sum(order_num) as now_num
from ods_saleorder
group by
substr(order_time, 1, 7)
) tmp
)
select
a1.now_month,a1.now_num,a1.last_month,nvl(a2.now_num,0),
nvl(round( (a1.now_num-a2.now_num) / a2.now_num, 2 ),0) as ratio
from
a a1
left join
a a2
on
a1.last_month=a2.now_month
;
猜你喜歡
Hadoop3數(shù)據(jù)容錯(cuò)技術(shù)(糾刪碼)
Hadoop 數(shù)據(jù)遷移用法詳解
Flink實(shí)時(shí)計(jì)算topN熱榜
數(shù)倉建模分層理論
數(shù)倉建模方法論