前言:
以sql為基礎(chǔ)拥娄,利用題目進行hive的語句練習(xí)泻轰,逐步體會sql與hive的不同之處疙渣。
本次題目用到row_number()撇吞、collect_set()俗冻、concat_ws()。
其中collect_set()牍颈、concat_ws講解可參考
http://www.reibang.com/p/6ec508d1591a
文末講解row_number迄薄。
題目:
存在一份數(shù)據(jù)
userid,changjing,inttime
1,1001,1400
2,1002,1401
1,1002,1402
1,1001,1402
2,1003,1403
2,1004,1404
3,1003,1400
4,1004,1402
4,1003,1403
4,1001,1403
4,1002,1404
5,1001,1402
5,1001,1403
image.png
要求輸出,用戶號對應(yīng)前兩個不同場景
1-1001-1002
2-1002-1003
3-1003
4-1004-1003
5-1001
解題步驟:
一煮岁、創(chuàng)建表mianshi1
hive> create table mianshi1(userid string,changjing string,inttime int)
row format delimited fields terminated by ",";
二讥蔽、導(dǎo)入數(shù)據(jù)
hive> load data local inpath"/home/jiafeng/xxx.csv" into table mianshi1;
三、驗證數(shù)據(jù)是否準(zhǔn)確
hive> select * from mianshi1;
image.png
四画机、按inttime時間進行排序
hive> create table tmp_mianshi1 as
> select *,row_number() over(partition by userid order by inttime) as num
from mianshi1 order by userid,inttime;
image.png
五冶伞、篩選出前兩個不同的場景
hive> create table tmp1_mianshi1 as
> select userid,collect_set(changjing) as changjing2 from
tmp_mianshi1 where num<=2 group by userid;
image.png
六、獲得答案結(jié)果
方法一:
hive> select concat(userid,"-",concat_ws("-",changjing2)) as result
from tmp1_mianshi1;
image.png
方法二:
hive> select
(case when changjing2[1] is NULL then concat(userid,"-",changjing2[0])
else concat(userid,"-",changjing2[0],"-",changjing2[1]) end) as result
from tmp1_mianshi1;
注:方法二適用場景更多步氏,concat_ws函數(shù)只適用string類型响禽,做題過程中表類型設(shè)置為int,最終獲得了array<int>荚醒,無法利用類型轉(zhuǎn)換為array<string>(會報錯)芋类。
講解
row_number()
- 遇相同數(shù)據(jù)依舊會排序,
例如:小明80分界阁,小紅80分侯繁,小張75分,排序為:小明 1 小紅 2 小張3 - 通常結(jié)合partition by xxx order by xxx (asc/desc)使用
亦或者distribute by xxx sort by xxx(asc/desc)
partition by 即分組泡躯,order by 為排序 - 實際案例:
select * from tmp_test;
c1 c2
----- ------
1 str1
2 str2
3 str3
3 str31
3 str33
4 str41
4 str42
select t.*,row_number() over(distribute by c1 sort by c2 desc) rn
from tmp_test t;
c1 c2 rn
_________
1 str1 1
2 str2 1
3 str33 1
3 str31 2
3 str3 3
4 str42 1
4 str41 2