Hive分析函數(shù)的使用
測試數(shù)據(jù)集:
SQL語句## COUNT、SUM管钳、MIN钦铁、MAX、AVG
## COUNT才漆、SUM牛曹、MIN、MAX醇滥、AVG
select user_id
? ? ? ,user_type
? ? ? ,sales
? ? ? ,sum(sales) OVER(partition by user_type order by sales asc) as sales_1 --默認(rèn)為從起點(diǎn)到當(dāng)前行
? ? ? ,sum(sales) OVER(partition by user_type order by sales asc ROWS between UNBOUNDED PRECEDING and current row) as sales_2 --從起點(diǎn)到當(dāng)前行黎比,結(jié)果與sales_1不同。
? ? ? ,sum(sales) OVER(partition by user_type order by sales asc ROWS between 3 PRECEDING and current row) as sales_3? --當(dāng)前行+往前3行
? ? ? ,sum(sales) OVER(partition by user_type order by sales asc ROWS between 3 PRECEDING and 1 FOLLOWING) as sales_4 --當(dāng)前行+往前3行+往后1行
? ? ? ,sum(sales) OVER(partition by user_type order by sales asc ROWS between current row and UNBOUNDED FOLLOWING) as sales_5 --當(dāng)前行+往后所有行?
? ? ? ,sum(sales) OVER(partition by user_type) as sales_6 --分組內(nèi)所有行
? from order_detail
order by user_type
? ? ? ? ,sales
? ? ? ? ,user_id;
注意:
結(jié)果和ORDER BY相關(guān),默認(rèn)為升序
如果不指定ROWS BETWEEN,默認(rèn)為從起點(diǎn)到當(dāng)前行;
如果不指定ORDER BY鸳玩,則將分組內(nèi)所有值累加;
關(guān)鍵是理解ROWS BETWEEN含義,也叫做WINDOW子句:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:當(dāng)前行
UNBOUNDED:無界限(起點(diǎn)或終點(diǎn))
UNBOUNDED PRECEDING:表示從前面的起點(diǎn)
UNBOUNDED FOLLOWING:表示到后面的終點(diǎn)
其他COUNT阅虫、AVG,MIN不跟,MAX颓帝,和SUM用法一樣。