1. posexplode
select date, sum(amount) as total_amount
from (
select date_add(start_date,id) as date, amount from tmp
lateral view posexplode(split(space(datediff(end_date,start_date)),' ')) t as id,t_date
) t
group by date ;
reference: http://www.reibang.com/p/d333e70c21d1
2. space
select split(space(10), ' ') from dual;
3. lag/lead
lag(col, n, default) 前n行數(shù)據(jù)
LAG(column_expression, offset, default_value) OVER (PARTITION BY partition_expression ORDER BY sort_expression)
lead(col, n, default) 后n行數(shù)據(jù)
LEAD(column_expression, offset, default_value) OVER (PARTITION BY partition_expression ORDER BY sort_expression)
https://zhuanlan.zhihu.com/p/397789258?utm_id=0
4. ntile
ntile(n) over(order by col) as bucket_num
n是指定的分箱數(shù)量剧董。如果不能平均分配缭召,則優(yōu)先分配較小編號的箱涵亏,并且各個箱中能放的行數(shù)最多相差1朋魔。
備注:NULL值的處理获黔,可以設(shè)置單獨為一組柳刮,或者默認(rèn)為最小值
5. first_value/last_value
first_value取分區(qū)內(nèi)排序后莺戒,截止到當(dāng)前行毡们,第一個值
last_value分組內(nèi)排序后,截止到當(dāng)前行春宣,最后一個值酵颁,如果要取分組內(nèi)的最后一行,需要使用 rows between current row and unbounded following
select *,
first_value(logday) over(partition by userid order by logday) as co1,
last_value(logday) over(partition by userid order by logday) as co2,
last_value(logday) over(partition by userid order by logday rows between current row and unbounded following) as co3
from test.test_window;
6. rank/dense_rank
rank 會留下空位
dense_rank 不會留下空位
7. percent_rank()
當(dāng)前行-1 / 當(dāng)前組總行數(shù)-1
select dept, userid, sal,
percent_rank() over(order by sal) as pr1
from cookie3;
結(jié)果分析:
d1 user1 1000 0.0 (1-1)/(5-1)=0.0
d1 user2 2000 0.25 (2-1)/(5-1)=0.25
d1 user3 3000 0.5 (3-1)/(5-1)=0.5
d2 user4 4000 0.75 (4-1)/(5-1)=0.75
d2 user5 5000 1.0 (5-1)/(5-1)=1.0
select dept, userid, sal,
percent_rank() over(partition by dept order by sal) as pr2
from cookie3;
結(jié)果分析:
d1 user1 1000 0.0 (1-1)/(3-1)=0.0
d1 user2 2000 0.5 (2-1)/(3-1)=0.5
d1 user3 3000 1.0 (3-1)/(3-1)=1.0
d2 user4 4000 0.0 (1-1)/(2-1)=0.0
d2 user5 5000 1.0 (2-1)/(2-1)=1.0
8. cume_dist
小于(或大于)等于當(dāng)前值的行數(shù)/分組內(nèi)總行數(shù)
select dept, userid, sal,
cume_dist() over(order by sal) as cume1
from cookie3;
結(jié)果:
d1 user1 1000 0.2
d1 user2 2000 0.4
d1 user3 3000 0.6
d2 user4 4000 0.8
d2 user5 5000 1.0
9. hive 窗口函數(shù)中的rows和range的區(qū)分
sum(id) over(order by id range between 1 preceding and 2 following)
窗口大小設(shè)置為該分區(qū)內(nèi)小于本記錄id-1和id+2天的窗口
order by id月帝,所以要看id的值躏惋,并對key(id進(jìn)行range操作),即[id-1, id+2]嚷辅,注意是閉區(qū)間
https://blog.csdn.net/DolphinF/article/details/131695958
窗口函數(shù)使用:https://www.baispace.cn/article/hive-knowledge-window-function.html
10. 時間日期函數(shù)
https://www.cnblogs.com/lubians/p/17480959.html
11. array
https://blog.csdn.net/dupizi/article/details/131779370
12. limit offset
跳過offset行概而,取接下來的limit行溶推。