Hive日常數(shù)據(jù)需求

hive 日常數(shù)據(jù)需求(盡可能展示窗口函數(shù)的使用)

題目

題目一:每個(gè)用戶截止到每月為止的最大交易金額和累計(jì)到該月的總交易金額,結(jié)果數(shù)據(jù)格式如下

題目一
#先將每個(gè)人每個(gè)月的消費(fèi)進(jìn)行聚合
SELECT customer_key,DATE_FORMAT(create_date,"yyyy-MM") AS umonth,
COUNT(1) AS ucount,SUM(unit_price) AS sum_price
FROM ods_sales_orders
GROUP BY customer_key,DATE_FORMAT(create_date,"yyyy-MM")
ORDER BY umonth

#隨后通過窗口函數(shù)屁使,求出當(dāng)前月為止最大消費(fèi)以及累計(jì)消費(fèi)
SELECT customer_key,
umonth,
ucount,
MAX(sum_price) over(PARTITION BY customer_key ORDER BY umonth ROWS BETWEEN unbounded preceding AND current ROW) AS current_max,
SUM(sum_price) over(PARTITION BY customer_key ORDER BY umonth ROWS BETWEEN unbounded preceding AND current ROW) AS current_sum
FROM(  #每個(gè)人每月消費(fèi)
SELECT customer_key,DATE_FORMAT(create_date,"yyyy-MM") AS umonth,
COUNT(1) AS ucount,SUM(unit_price) AS sum_price
FROM ods_sales_orders
GROUP BY customer_key,DATE_FORMAT(create_date,"yyyy-MM")
ORDER BY umonth) a
LIMIT 10;

筆記:
1.窗口函數(shù)邊界:

窗口函數(shù)的邊界

2.hive中流济,取年月的兩種方式
取年月

題目二:計(jì)算用戶的回購(gòu)率和復(fù)購(gòu)率

復(fù)購(gòu)率: 當(dāng)前月份購(gòu)買2次及以上的客戶占所有客戶比例
回購(gòu)率:當(dāng)前月份購(gòu)買且上個(gè)月份也購(gòu)買的客戶占當(dāng)月所有月份客戶比例

思路:
復(fù)購(gòu)率
1、對(duì)當(dāng)月(2月份)的客戶分組技即,計(jì)數(shù)購(gòu)買次數(shù)
2著洼、篩選購(gòu)買次數(shù)為2以上的,認(rèn)為是復(fù)購(gòu)群體

SELECT 
umonth,
COUNT(1) AS `總客戶數(shù)`,
SUM(IF(buys >=2,1,0)) AS `復(fù)購(gòu)客戶`,
SUM(IF(buys >=2,1,0))/COUNT(1) AS `復(fù)購(gòu)率`
FROM (   #按月份統(tǒng)計(jì)客戶消費(fèi)次數(shù)情況
SELECT customer_key,
DATE_FORMAT(create_date,"yyyy-MM") umonth,
COUNT(1) buys
FROM ods_sales_orders
GROUP BY customer_key,DATE_FORMAT(create_date,"yyyy-MM")
) a
GROUP BY umonth
LIMIT 10;

回購(gòu)率
1、篩選當(dāng)月及上月部分
2身笤、利用客戶id進(jìn)行當(dāng)月連上月豹悬,推薦左連
3、對(duì)同一條客戶id均有購(gòu)買記錄的液荸,認(rèn)為是回購(gòu)群體

SELECT 
a.umonth,
SUM(IF(a.cons >=2,1,0))/COUNT(1) AS `復(fù)購(gòu)率`,
COUNT(b.customer_key)/COUNT(a.customer_key) AS `回購(gòu)率`
FROM
(SELECT customer_key,
DATE_FORMAT(create_date,'yyyy-MM') umonth,
COUNT(1) cons
FROM ods_sales_orders
GROUP BY customer_key,DATE_FORMAT(create_date,'yyyy-MM')
)a
LEFT JOIN
(SELECT customer_key,
DATE_FORMAT(create_date,'yyyy-MM') umonth,
COUNT(1) cons
FROM ods_sales_orders
GROUP BY customer_key,DATE_FORMAT(create_date,'yyyy-MM')
) b
ON a.customer_key = b.customer_key 
AND CONCAT(a.umonth,'-01') =add_months(CONCAT(b.umonth,'-01'),1)
GROUP BY a.umonth
LIMIT 10;

注意:
1.連接條件中瞻佛,條件1為客戶key字段,表示同一個(gè)客戶娇钱;條件2為兩個(gè)月的信息連接(本月日期 = 上月日期 +1個(gè)月)
2.ADD_MONTHS:別忘了有個(gè)'s'伤柄,add_months只能對(duì)完整日期格式進(jìn)行月份加減(add_months(‘yyyy-MM-dd hh:mm:ss’,N)文搂,add_months(‘yyyy-MM-dd’适刀,N)),所以這里先用concat給'年月'增加一個(gè)'日'字段细疚,其中N為正則增加蔗彤,N為負(fù)則減少

回購(gòu)率結(jié)果

題目三:求用戶最先購(gòu)買的兩種子類別產(chǎn)品信息,要求拼接成 “用戶號(hào)-產(chǎn)品1-產(chǎn)品2” 的形式

第一步:根據(jù)用戶分組疯兼,求出用戶購(gòu)買的產(chǎn)品及順序
cpzl_zw1是cpzl_zw的上一個(gè)購(gòu)買的產(chǎn)品

SELECT customer_key,cpzl_zw,
row_number() over(PARTITION BY customer_key  ORDER BY create_date ASC) AS order_num,
lag(cpzl_zw,1,0) over(PARTITION BY customer_key ORDER BY create_date ASC) AS cpzl_zw1
FROM ods_sales_orders
  • lag() over():
    lag(參數(shù)1然遏,參數(shù)2,參數(shù)3)
    參數(shù)1:表中列名 參數(shù)2:往下偏移多少位 參數(shù)3:超出行數(shù)時(shí)默認(rèn)設(shè)置值
    lag 往往和over結(jié)合使用
    如:lag(參數(shù)1吧彪,參數(shù)2待侵,參數(shù)3)over(order by 列)
    第二步:選擇相鄰兩個(gè)產(chǎn)品不同的記錄
SELECT 
  customer_key, 
  cpzl_zw, 
  order_num, 
  cpzl_zw1
FROM (
  SELECT 
    customer_key, 
    cpzl_zw, 
    row_number() OVER (PARTITION BY customer_key ORDER BY create_date ASC) AS order_num,
    lag(cpzl_zw, 1, 0) OVER (PARTITION BY customer_key ORDER BY create_date ASC) AS cpzl_zw1
  FROM ods_sales_orders
) a
WHERE cpzl_zw != cpzl_zw1

第三步:為相鄰的產(chǎn)品添加排序(即上面lag函數(shù)添加的產(chǎn)品)
若相鄰產(chǎn)品的排序?yàn)?,則表示第三件產(chǎn)品姨裸,題目要求前兩件產(chǎn)品秧倾,則可以根據(jù)小于3進(jìn)行篩選

WITH tmp1 AS  (
SELECT 
  customer_key, 
  cpzl_zw, 
  order_num, 
  cpzl_zw1
FROM (
  SELECT 
    customer_key, 
    cpzl_zw, 
    row_number() OVER (PARTITION BY customer_key ORDER BY create_date ASC) AS order_num,
    lag(cpzl_zw, 1, 0) OVER (PARTITION BY customer_key ORDER BY create_date ASC) AS cpzl_zw1
  FROM ods_sales_orders
) a
WHERE cpzl_zw != cpzl_zw1)
SELECT customer_key,cpzl_zw,order_num,cpzl_zw1,
row_number() over(PARTITION BY customer_key ORDER BY order_num ASC) AS order_num1
FROM tmp1
第二步結(jié)果
  • with as():類似于mysql中的視圖功能暫時(shí)引用版(as 括號(hào)后為暫時(shí)引用的表),必須在結(jié)束后使用select進(jìn)行查詢否則報(bào)錯(cuò)傀缩,且多個(gè)with as用逗號(hào)隔開那先。
    第四步:數(shù)據(jù)拼接
WITH tmp1 AS  (
SELECT 
  customer_key, 
  cpzl_zw, 
  order_num, 
  cpzl_zw1
FROM (
  SELECT 
    customer_key, 
    cpzl_zw, 
    row_number() OVER (PARTITION BY customer_key ORDER BY create_date ASC) AS order_num,
    lag(cpzl_zw, 1, 0) OVER (PARTITION BY customer_key ORDER BY create_date ASC) AS cpzl_zw1
  FROM ods_sales_orders
) a
WHERE cpzl_zw != cpzl_zw1)
,tmp2 AS (
SELECT customer_key,cpzl_zw,order_num,cpzl_zw1,
row_number() over(PARTITION BY customer_key ORDER BY order_num ASC) AS order_num1
FROM tmp1)
SELECT 
CONCAT(customer_key,'-',CONCAT_WS('-',collect_set(cpzl_zw))) AS z1
FROM tmp2
WHERE order_num1 < 3
GROUP BY customer_key;
  • concat_ws:如何指定參數(shù)之間的分隔符
    使用函數(shù)CONCAT_WS()。使用語法為:CONCAT_WS(separator,str1,str2,…)
    CONCAT_WS() 代表 CONCAT With Separator 赡艰,是CONCAT()的特殊形式售淡。第一個(gè)參數(shù)是其它參數(shù)的分隔符。分隔符的位置放在要連接的兩個(gè)字符串之間慷垮。分隔符可以是一個(gè)字符串揖闸,也可以是其它參數(shù)。如果分隔符為 NULL料身,則結(jié)果為 NULL汤纸。函數(shù)會(huì)忽略任何分隔符參數(shù)后的 NULL 值。但是CONCAT_WS()不會(huì)忽略任何空字符串芹血。 (然而會(huì)忽略所有的 NULL)贮泞。
  • collect_set():
    collect_set

    常用于聚合后的字段處理
    個(gè)人認(rèn)為第二步可以省略楞慈,collect_set自帶去重功能,有點(diǎn)多余了
WITH tmp AS (
SELECT customer_key,cpzl_zw,
row_number() over(PARTITION BY customer_key  ORDER BY create_date ASC) AS order_num,
lag(cpzl_zw,1,0) over(PARTITION BY customer_key ORDER BY create_date ASC) AS cpzl_zw1
FROM ods_sales_orders
),tmp2 AS(
SELECT customer_key,cpzl_zw,order_num,cpzl_zw1,
row_number() over(PARTITION BY customer_key ORDER BY order_num ASC) order_num1
FROM tmp)
SELECT 
CONCAT(customer_key,'-',CONCAT_WS('-',collect_set(cpzl_zw))) AS z1
FROM tmp2
WHERE order_num1 < 3
GROUP BY customer_key;

題目四:統(tǒng)計(jì)各個(gè)省份所屬城市下最受歡迎的Top 3產(chǎn)品和其銷量(不能出現(xiàn)有null)

#多次利用with as
#方法一:
SELECT chinese_city,product_key,cons,c.`rank`
FROM (
WITH tmp AS(
SELECT sales_order_key,product_key,chinese_city
FROM ods_sales_orders a
RIGHT JOIN ods_customer b
ON a.customer_key = b.customer_key)
,tmp2 AS (
SELECT chinese_city,product_key,COUNT(sales_order_key) AS cons
FROM tmp
GROUP BY chinese_city,product_key
)
SELECT chinese_city,product_key,cons,
row_number() over(PARTITION BY chinese_city ORDER BY cons DESC) AS `rank`
FROM tmp2
) c
WHERE c.`rank` <= 3
LIMIT 100;

#方法二:
SELECT chinese_city,product_key,cons,`rank`
FROM(
SELECT chinese_city,product_key,cons,
row_number() over(PARTITION BY chinese_city ORDER BY cons DESC) AS `rank`
FROM (
SELECT b.chinese_city,a.product_key,COUNT(sales_order_key) AS cons
FROM ods_sales_orders a
RIGHT JOIN ods_customer b
ON a.customer_key = b.customer_key
GROUP BY b.chinese_city,a.product_key
) c
) d
WHERE `rank` <= 3
AND  d.chinese_city != 'null'
LIMIT 100;

步驟:
第一步:將兩表進(jìn)行連接(customer為主表)
第二步:按城市啃擦、產(chǎn)品進(jìn)行聚合分組抖部,求出銷量
第三步:窗口函數(shù)對(duì)銷量進(jìn)行排序
第四步:where進(jìn)行篩選(where不能直接對(duì)窗口函數(shù)結(jié)果進(jìn)行篩選,所以要再次使用表子連接)

題目五:商品的銷售數(shù)量top10议惰,排名需考慮并列排名的情況

提示:只用到訂單表 ods_sales_orders慎颗,并列排序 dense_rank 窗口函數(shù)

SELECT product_key,cons,`rank`
FROM(
SELECT product_key,cons,dense_rank() over(ORDER BY cons DESC) AS `rank`
FROM (
SELECT product_key,COUNT(1) AS cons
FROM ods_sales_orders
GROUP BY product_key) a)b
WHERE b.`rank` <= 10;

題目六:計(jì)算累計(jì)和(統(tǒng)計(jì)2019年1-12月的累積銷量,即1月為1月份的值言询,2月為1俯萎、2月份值的和,3月為1运杭、2夫啊、3月份的和,12月為1-12月份值的和)

SELECT yearmonth,ROUND(sum_price,2),
ROUND(SUM(sum_price) over(ORDER BY yearmonth ASC),2) AS cumsum
FROM(
SELECT
DATE_FORMAT(create_date,"yyyy-MM") AS yearmonth,SUM(unit_price) AS sum_price
FROM ods_sales_orders 
WHERE YEAR(create_date) = '2019'
GROUP BY DATE_FORMAT(create_date,"yyyy-MM")
) a;
image.png

題目七:計(jì)算客戶平均購(gòu)買一次商品的間隔時(shí)間

SELECT customer_key,AVG(diff_date) AS avg_buy_period
FROM(
SELECT 
customer_key,
create_date,
lead(create_date,1) over(PARTITION BY customer_key ORDER BY create_date ASC) next_date,
DATEDIFF(lead(create_date,1) over(PARTITION BY customer_key ORDER BY create_date ASC),create_date) AS diff_date
FROM ods_sales_orders) a
GROUP BY customer_key
HAVING avg_buy_period IS NOT NULL;
  • lead() over():作用于lag相同辆憔,方向相反撇眯;Lag和Lead分析函數(shù)可以在同一次查詢中取出同一字段的前N行的數(shù)據(jù)(Lag)和后N行的數(shù)據(jù)(Lead)作為獨(dú)立的列。

題目八:查詢最近前20%時(shí)間的訂單信息

SELECT *
FROM(
SELECT *, ntile(5) OVER(ORDER BY create_date DESC) `grouping`
FROM ods_sales_orders ) a
WHERE a.`grouping` = 1
LIMIT 10;
  • ntile(n):用于將數(shù)據(jù)分成n組
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末虱咧,一起剝皮案震驚了整個(gè)濱河市熊榛,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌腕巡,老刑警劉巖玄坦,帶你破解...
    沈念sama閱讀 218,682評(píng)論 6 507
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場(chǎng)離奇詭異绘沉,居然都是意外死亡煎楣,警方通過查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,277評(píng)論 3 395
  • 文/潘曉璐 我一進(jìn)店門车伞,熙熙樓的掌柜王于貴愁眉苦臉地迎上來择懂,“玉大人,你說我怎么就攤上這事另玖±铮” “怎么了?”我有些...
    開封第一講書人閱讀 165,083評(píng)論 0 355
  • 文/不壞的土叔 我叫張陵日矫,是天一觀的道長(zhǎng)赂弓。 經(jīng)常有香客問我绑榴,道長(zhǎng)哪轿,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 58,763評(píng)論 1 295
  • 正文 為了忘掉前任翔怎,我火速辦了婚禮窃诉,結(jié)果婚禮上杨耙,老公的妹妹穿的比我還像新娘。我一直安慰自己飘痛,他們只是感情好珊膜,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,785評(píng)論 6 392
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著宣脉,像睡著了一般车柠。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發(fā)上塑猖,一...
    開封第一講書人閱讀 51,624評(píng)論 1 305
  • 那天竹祷,我揣著相機(jī)與錄音,去河邊找鬼羊苟。 笑死塑陵,一個(gè)胖子當(dāng)著我的面吹牛,可吹牛的內(nèi)容都是我干的蜡励。 我是一名探鬼主播令花,決...
    沈念sama閱讀 40,358評(píng)論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼,長(zhǎng)吁一口氣:“原來是場(chǎng)噩夢(mèng)啊……” “哼凉倚!你這毒婦竟也來了兼都?” 一聲冷哼從身側(cè)響起,我...
    開封第一講書人閱讀 39,261評(píng)論 0 276
  • 序言:老撾萬榮一對(duì)情侶失蹤稽寒,失蹤者是張志新(化名)和其女友劉穎俯抖,沒想到半個(gè)月后,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體瓦胎,經(jīng)...
    沈念sama閱讀 45,722評(píng)論 1 315
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡芬萍,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,900評(píng)論 3 336
  • 正文 我和宋清朗相戀三年,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了搔啊。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片柬祠。...
    茶點(diǎn)故事閱讀 40,030評(píng)論 1 350
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡,死狀恐怖负芋,靈堂內(nèi)的尸體忽然破棺而出漫蛔,到底是詐尸還是另有隱情,我是刑警寧澤旧蛾,帶...
    沈念sama閱讀 35,737評(píng)論 5 346
  • 正文 年R本政府宣布莽龟,位于F島的核電站,受9級(jí)特大地震影響锨天,放射性物質(zhì)發(fā)生泄漏毯盈。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,360評(píng)論 3 330
  • 文/蒙蒙 一病袄、第九天 我趴在偏房一處隱蔽的房頂上張望搂赋。 院中可真熱鬧赘阀,春花似錦、人聲如沸脑奠。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,941評(píng)論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽宋欺。三九已至轰豆,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間齿诞,已是汗流浹背秒咨。 一陣腳步聲響...
    開封第一講書人閱讀 33,057評(píng)論 1 270
  • 我被黑心中介騙來泰國(guó)打工, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留掌挚,地道東北人雨席。 一個(gè)月前我還...
    沈念sama閱讀 48,237評(píng)論 3 371
  • 正文 我出身青樓,卻偏偏與公主長(zhǎng)得像吠式,于是被迫代替她去往敵國(guó)和親陡厘。 傳聞我的和親對(duì)象是個(gè)殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 44,976評(píng)論 2 355

推薦閱讀更多精彩內(nèi)容