Hive分析窗口函數(shù)(四) LAG,LEAD,FIRST_VALUE,LAST_VALUE
注意: 這幾個(gè)函數(shù)不支持WINDOW子句
- 準(zhǔn)備數(shù)據(jù)
cookie1,2018-04-10 10:00:02,url2
cookie1,2018-04-10 10:00:00,url1
cookie1,2018-04-10 10:03:04,1url3
cookie1,2018-04-10 10:50:05,url6
cookie1,2018-04-10 11:00:00,url7
cookie1,2018-04-10 10:10:00,url4
cookie1,2018-04-10 10:50:01,url5
cookie2,2018-04-10 10:00:02,url22
cookie2,2018-04-10 10:00:00,url11
cookie2,2018-04-10 10:03:04,1url33
cookie2,2018-04-10 10:50:05,url66
cookie2,2018-04-10 11:00:00,url77
cookie2,2018-04-10 10:10:00,url44
cookie2,2018-04-10 10:50:01,url55
CREATE TABLE itzkx_t4 (
cookieid string,
createtime string, --頁面訪問時(shí)間
url STRING --被訪問頁面
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile;
加載數(shù)據(jù):
load data local inpath '/root/hivedata/itzkx_t4.dat' into table itzkx_t4;
- LAG
LAG(col,n,DEFAULT) 用于統(tǒng)計(jì)窗口內(nèi)往上第n行值
第一個(gè)參數(shù)為列名之景,第二個(gè)參數(shù)為往上第n行(可選斤富,默認(rèn)為1),第三個(gè)參數(shù)為默認(rèn)值(當(dāng)往上第n行為NULL時(shí)候锻狗,取默認(rèn)值满力,如不指定,則為NULL)
SELECT cookieid,
createtime,
url,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
LAG(createtime,1,'1970-01-01 00:00:00') OVER(PARTITION BY cookieid ORDER BY createtime) AS last_1_time,
LAG(createtime,2) OVER(PARTITION BY cookieid ORDER BY createtime) AS last_2_time
FROM itzkx_t4;
last_1_time: 指定了往上第1行的值轻纪,default為'1970-01-01 00:00:00'
cookie1第一行油额,往上1行為NULL,因此取默認(rèn)值 1970-01-01 00:00:00
cookie1第三行,往上1行值為第二行值桐磁,2015-04-10 10:00:02
cookie1第六行悔耘,往上1行值為第五行值,2015-04-10 10:50:01
last_2_time: 指定了往上第2行的值我擂,為指定默認(rèn)值
cookie1第一行衬以,往上2行為NULL
cookie1第二行,往上2行為NULL
cookie1第四行校摩,往上2行為第二行值看峻,2015-04-10 10:00:02
cookie1第七行,往上2行為第五行值衙吩,2015-04-10 10:50:01
- LEAD
與LAG相反
LEAD(col,n,DEFAULT) 用于統(tǒng)計(jì)窗口內(nèi)往下第n行值
第一個(gè)參數(shù)為列名互妓,第二個(gè)參數(shù)為往下第n行(可選,默認(rèn)為1)坤塞,第三個(gè)參數(shù)為默認(rèn)值(當(dāng)往下第n行為NULL時(shí)候冯勉,取默認(rèn)值,如不指定摹芙,則為NULL)
SELECT cookieid,
createtime,
url,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
LEAD(createtime,1,'1970-01-01 00:00:00') OVER(PARTITION BY cookieid ORDER BY createtime) AS next_1_time,
LEAD(createtime,2) OVER(PARTITION BY cookieid ORDER BY createtime) AS next_2_time
FROM itzkx_t4;
- FIRST_VALUE
取分組內(nèi)排序后灼狰,截止到當(dāng)前行,第一個(gè)值
SELECT cookieid,
createtime,
url,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
FIRST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS first1
FROM itzkx_t4;
- LAST_VALUE
取分組內(nèi)排序后浮禾,截止到當(dāng)前行交胚,最后一個(gè)值
SELECT cookieid,
createtime,
url,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
LAST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS last1
FROM itzkx_t4;
如果想要取分組內(nèi)排序后最后一個(gè)值,則需要變通一下:
SELECT cookieid,
createtime,
url,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
LAST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS last1,
FIRST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime DESC) AS last2
FROM itzkx_t4
ORDER BY cookieid,createtime;
特別注意order by
如果不指定ORDER BY盈电,則進(jìn)行排序混亂蝴簇,會(huì)出現(xiàn)錯(cuò)誤的結(jié)果
SELECT cookieid,
createtime,
url,
FIRST_VALUE(url) OVER(PARTITION BY cookieid) AS first2
FROM itzkx_t4;