Hive分析函數(shù)
image.png
Example:
Ntile(分片)
使用場(chǎng)景:計(jì)算百分之幾的用戶的結(jié)果
- 給了用戶和每個(gè)用戶對(duì)應(yīng)的消費(fèi)信息表既琴, 計(jì)算花費(fèi)前50%的用戶的平均消費(fèi)含思;
-- 把用戶和消費(fèi)表,按消費(fèi)下降順序平均分成2份
drop table if exists test_by_payment_ntile;
create table test_by_payment_ntile as
select
nick,
payment ,
NTILE(2) OVER(ORDER BY payment desc) AS rn
from test_nick_payment;
-- 分別對(duì)每一份計(jì)算平均值贡必,就可以得到消費(fèi)靠前50%和后50%的平均消費(fèi)
select
'avg_payment' as inf,
t1.avg_payment_up_50 as avg_payment_up_50,
t2.avg_payment_down_50 as avg_payment_down_50
from
(select
avg(payment) as avg_payment_up_50
from test_by_payment_ntile
where rn=1
)t1
join
(select
avg(payment) as avg_payment_down_50
from test_by_payment_ntile
where rn=2
)t2
on (t1.dp_id=t2.dp_id);
Rank,Dense_Rank,Row_Number
使用場(chǎng)景:Top N
- Rank : 相同的排名會(huì)留下空缺,1,2栈源,2,4
- Dense_Rank: 相同的排名不會(huì)留下空缺竖般,1甚垦,2,2涣雕,3
- Row_Number:不會(huì)重復(fù)
Lag,Lead
使用場(chǎng)景:計(jì)算用戶頁(yè)面的停留時(shí)間
統(tǒng)計(jì)窗口內(nèi)往上(往下)第n行值,當(dāng)前行不算
-- 組內(nèi)排序后艰亮,向后或向前偏移
-- 如果省略掉第三個(gè)參數(shù),默認(rèn)為NULL挣郭,否則補(bǔ)上迄埃。
select
dp_id,
mt,
payment,
LAG(mt,2) over(partition by dp_id order by mt) mt_new
from test2;
image.png
-- 組內(nèi)排序后,向后或向前偏移
-- 如果省略掉第三個(gè)參數(shù)兑障,默認(rèn)為NULL侄非,否則補(bǔ)上。
select
dp_id,
mt,
payment,
LEAD(mt,2,'1111-11') over(partition by dp_id order by mt) mt_new
from test2;
image.png
FIRST_VALUE, LAST_VALUE
使用場(chǎng)景:計(jì)算每個(gè)部門(mén)的最高工資與最低工資
-- FIRST_VALUE 獲得組內(nèi)當(dāng)前行往前的首個(gè)值
-- LAST_VALUE 獲得組內(nèi)當(dāng)前行往前的最后一個(gè)值
-- FIRST_VALUE(DESC) 獲得組內(nèi)全局的最后一個(gè)值
select
dp_id,
mt,
payment,
FIRST_VALUE(payment) over(partition by dp_id order by mt) payment_g_first,
LAST_VALUE(payment) over(partition by dp_id order by mt) payment_g_last,
FIRST_VALUE(payment) over(partition by dp_id order by mt desc) payment_g_last_global
from test2
ORDER BY dp_id,mt;
image.png
多維度
使用場(chǎng)景:計(jì)算每一類(lèi)圈子的觀看量流译,和每一類(lèi)圈子下每一個(gè)標(biāo)簽視頻的觀看量
-- grouping sets
select
order_id,
departure_date,
count(*) as cnt
from ord_test
where order_id=410341346
group by order_id,
departure_date
grouping sets (order_id,(order_id,departure_date))
;
---- 等價(jià)于以下
group by order_id
union all
group by order_id,departure_date
-- cube
select
order_id,
departure_date,
count(*) as cnt
from ord_test
where order_id=410341346
group by order_id,
departure_date
with cube
;
---- 等價(jià)于以下
select count(*) as cnt from ord_test where order_id=410341346
union all
group by order_id
union all
group by departure_date
union all
group by order_id,departure_date
-- rollup
select
order_id,
departure_date,
count(*) as cnt
from ord_test
where order_id=410341346
group by order_id,
departure_date
with rollup
;
---- 等價(jià)于以下
select count(*) as cnt from ord_test where order_id=410341346
union all
group by order_id
union all
group by order_id,departure_date
計(jì)算比當(dāng)前小的百分比
使用場(chǎng)景:計(jì)算當(dāng)前組內(nèi)比你小的人數(shù)比例
cume_list: 小于等于當(dāng)前值的行數(shù)/分組內(nèi)總行數(shù)
percent_rank:分組內(nèi)當(dāng)前行的RANK值-1/分組內(nèi)總行數(shù)-1
根窗口聚合函數(shù)使用相同