Hive實戰(zhàn)練習(xí)

背景

日常工作中有許多數(shù)據(jù)處理需求需要解決,在此之間作箍,獲得需求,用hive實現(xiàn)需求前硫,最終完成任務(wù)胞得。

題目

數(shù)據(jù)源在:hive中的adventure_ods庫的ods_sales_orders表

表名 表注釋 字段 字段注釋
ods_sales_orders 訂單明細(xì)表 sales_order_key 訂單主鍵 一個訂單表示銷售一個產(chǎn)品
ods_sales_orders 訂單明細(xì)表 create_date 訂單日期
ods_sales_orders 訂單明細(xì)表 customer_key 客戶編號
ods_sales_orders 訂單明細(xì)表 product_key 產(chǎn)品編號
ods_sales_orders 訂單明細(xì)表 english_product_name 產(chǎn)品名
ods_sales_orders 訂單明細(xì)表 cpzl_zw 產(chǎn)品子類
ods_sales_orders 訂單明細(xì)表 cplb_zw 產(chǎn)品類別
ods_sales_orders 訂單明細(xì)表 unit_price 產(chǎn)品單價

找出前前幾行看看:

1592889021912

題目一:每個用戶截止到每月為止的最大交易金額和累計到該月的總交易金額,結(jié)果數(shù)據(jù)格式如下

方法1屹电,先對日期匯總

先針對日期做窗口匯總阶剑,然后group by+max 干掉每個月里面的不是第一天的那種'小數(shù)據(jù)'.

select 
customer_key, 
substr(create_date, 1, 7) as umonth,
count() over(partition by customer_key, substr(create_date, 1, 7)) as month_count,
count() over(partition by customer_key) as total_count,
last_value(create_date) over(partition by customer_key) as last_id,
max(unit_price) over(partition by customer_key order by create_date) max_until_this_time,
sum(unit_price) over(partition by customer_key order by create_date) cumsum_until_this_time
from ods_sales_orders;
1592896441615

這時候只是多了冗余數(shù)據(jù),去掉即可.怎么去掉呢危号?我們先把這個結(jié)果存起來牧愁。CTAS。

create table temp_table as
(select 
customer_key, 
substr(create_date, 1, 7) as umonth,
count() over(partition by customer_key, substr(create_date, 1, 7)) as month_count,
count() over(partition by customer_key) as total_count,
last_value(create_date) over(partition by customer_key) as last_id,
max(unit_price) over(partition by customer_key order by create_date) max_until_this_time,
sum(unit_price) over(partition by customer_key order by create_date) cumsum_until_this_time
from ods_sales_orders
);
1592897995262

這時候我們就可以去掉冗余了外莲,使用max函數(shù)干掉小的那種猪半。

select customer_key, umonth, month_count,
max(max_until_this_time) as current_max,
max(cumsum_until_this_time) as current_sum
from temp_table
group by customer_key, umonth, month_count;
1592898843979

方法二,先對月份做匯總

select 
customer_key, substr(create_date, 1, 7) as umonth,
count() over(partition by customer_key, substr(create_date, 1, 7)) as ucount,
max(unit_price) over(partition by customer_key, substr(create_date, 1, 7)) as max_this_month,
sum(unit_price) over(partition by customer_key, substr(create_date, 1, 7)) as sum_this_month
from ods_sales_orders;
1592899355077

這時候有重復(fù)偷线,用group by 去重磨确。


select customer_key, umonth, ucount, max_this_month, sum_this_month
from (
select 
customer_key, substr(create_date, 1, 7) as umonth,
count() over(partition by customer_key, substr(create_date, 1, 7)) as ucount,
max(unit_price) over(partition by customer_key, substr(create_date, 1, 7)) as max_this_month,
sum(unit_price) over(partition by customer_key, substr(create_date, 1, 7)) as sum_this_month
from ods_sales_orders)mubiao
group by customer_key, umonth, ucount, max_this_month, sum_this_month;
-- 保存為臨時表
create table temp_tt as (
select customer_key, umonth, ucount, max_this_month, sum_this_month
from (
select 
customer_key, substr(create_date, 1, 7) as umonth,
count() over(partition by customer_key, substr(create_date, 1, 7)) as ucount,
max(unit_price) over(partition by customer_key, substr(create_date, 1, 7)) as max_this_month,
sum(unit_price) over(partition by customer_key, substr(create_date, 1, 7)) as sum_this_month
from ods_sales_orders)mubiao
group by customer_key, umonth, ucount, max_this_month, sum_this_month);

1592899570032
1592899901432

這時候我們就可以計算截止當(dāng)前的最大和累計了。

select customer_key, umonth, ucount, 
cast(max(max_this_month) over(partition by customer_key order by umonth) as decimal(10,2)) as current_max,
cast(sum(sum_this_month) over(partition by customer_key order by umonth) as decimal(10,2)) as current_sum
from temp_tt;

1592900568935
customer_key umonth(當(dāng)月) ucount(月訂單量) current_max(最大交易金額) current_sum(累計該月總交易金額)
11009 2018-12 1 53.99 53.99
1358999 2019-2 1 28.99 28.99
1358999 2019-4 1 69.99 98.98
1359000 2019-1 1 2294.99 2294.99
1359002 2019-11 1 8.99 8.99
1359003 2020-1 1 1120.49 1120.49
1359005 2019-2 1 782.99 782.99
1359009 2019-1 1 2384.07 2384.07
1359014 2019-1 1 69.99 69.99
1359014 2019-2 1 69.99 94.98

思路:

  • 1.對數(shù)據(jù)按照客戶及其年-月分組

  • 2.分組后就每月銷售金額之和

  • 3.使用窗口函數(shù)淋昭,對每個客戶不同月份分組求最大值(max)和累計值(sum)

題目二:計算用戶的回購率和復(fù)購率

復(fù)購率

復(fù)購率: 當(dāng)前月份購買2次及以上的客戶占所有客戶比例,以2月為例進(jìn)行計算

  1. 先增加購買一個本月購買次數(shù)的列
   
select customer_key, count(customer_key) as count_this_month from 
   
   ods_sales_orders
   
   where year(create_date)=2019 and month(create_date)=2
   
   group by customer_key;
1592902362138

下面就好算了俐填。

select concat("", cast(sum(if(count_this_month>1,1,0))*100/count(*) as decimal(14,4)), "%")
from (
select customer_key, count(customer_key) as count_this_month from 
   
   ods_sales_orders
   
   where year(create_date)=2019 and month(create_date)=2
   
   group by customer_key
) mubiao ;
1592903111227

回購率

回購率:當(dāng)前月份購買且上個月份也購買的客戶占當(dāng)月所有月份客戶比例

1.先得到當(dāng)前月份有哪些用戶

select customer_key,month(create_date) as month_2_count

from ods_sales_orders

where month(create_date)=2

group by customer_key, month(create_date);  --  2月份的用戶

select customer_key,month(create_date) as month_1_count

from ods_sales_orders

where month(create_date)=1

group by customer_key, month(create_date);  --  1月份的用戶

上面可以分別得到2,1月份每個customer的購買情況翔忽,left_join 篩選非NULL行就可以得到上個月是否買了東西英融。

select sum(if (month_1_count is not null, 1, 0)) / count(*)

from (select customer_key,month(create_date) as month_2_count

from ods_sales_orders

where month(create_date)=2

group by customer_key, month(create_date)) a

left join (select customer_key,month(create_date) as month_1_count

from ods_sales_orders

where month(create_date)=1

group by customer_key, month(create_date)) b on a.customer_key=b.customer_key;
1592923229512

思路

復(fù)購率

  • 1、對當(dāng)月(2月份)的客戶分組歇式,計數(shù)購買次數(shù)
  • 2驶悟、篩選購買次數(shù)為2以上的,認(rèn)為是復(fù)購群體

回購率

  • 1材失、篩選當(dāng)月及上月部分
  • 2痕鳍、利用客戶id進(jìn)行當(dāng)月連上月,推薦左連
  • 3、對同一條客戶id均有購買記錄的笼呆,認(rèn)為是回購群體

題目三:求用戶號對應(yīng)不同的產(chǎn)品

用戶號 產(chǎn)品 購買時間
1 A 2019-12-23
1 B 2019-12-23
2 C 2019-12-23
2 A 2019-12-24
2 B 2019-12-23

解答

先找?guī)讞l看看數(shù)據(jù)的樣子熊响。

select customer_key, create_date, product_key from ods_sales_orders limit 100;
1592953179472

其實就是怕連續(xù)買了兩個產(chǎn)品1,所以需要去重诗赌,對于同一個產(chǎn)品只保留最前面的那一次購買汗茄,分組然后min時間即可。

select customer_key, product_key, min(create_date)  as create_date

from ods_sales_orders

group by customer_key, product_key;
1592953922587
with temp as (

select customer_key, product_key, min(create_date)  as create_date

from ods_sales_orders

group by customer_key, product_key

) 

select 

*,

row_number() over(partition by customer_key order by create_date) as product_index_by_time

from temp;
1592954161731

這時候就可以篩選了铭若,選出其中序號<3的那些訂單們洪碳。

with temp as (

select customer_key, product_key, min(create_date)  as create_date

from ods_sales_orders

group by customer_key, product_key

) , -- end of temp table1

temp_with_index as(

select 

*,

row_number() over(partition by customer_key order by create_date) as product_index_by_time

from temp) --end of temp table2

select concat(customer_key, '-', concat_ws('-', collect_set(product_key)))

from temp_with_index 

where product_index_by_time < 3

group by customer_key;
1592954954300

當(dāng)然也可以改為算子類:

with temp as (

select customer_key, cpzl_zw, min(create_date)  as create_date

from ods_sales_orders

group by customer_key, cpzl_zw

) , -- end of temp table1

temp_with_index as(

select 

*,

row_number() over(partition by customer_key order by create_date) as product_index_by_time

from temp) --end of temp table2

select concat(customer_key, '-', concat_ws('-', collect_set(cpzl_zw)))

from temp_with_index 

where product_index_by_time < 3

group by customer_key;
1592955065121
1592955316071

要求輸出例子**:用戶號-產(chǎn)品1-產(chǎn)品2

例如:1-A-B (按先后時間順序,同時不限定)

參考:http://www.reibang.com/p/90d0657c0218

思路:

  • 1.利用窗口函數(shù)叼屠,對用戶號分組瞳腌,按時間對產(chǎn)品進(jìn)行排序
  • 2.利用左連或其他方法拼接,篩選排序順序為1镜雨、2的
  • 3.用concat或者其他函數(shù)拼接獲得結(jié)果

題目四:統(tǒng)計各個省份所屬城市下最受歡迎的Top 3產(chǎn)品和其銷量(不能出現(xiàn)有null)

  1. 先看看城市和顧客的從屬關(guān)系

    1592970242408
  2. 關(guān)聯(lián)一下嫂侍,給每個訂單確定一個城市。

select

a.*,

b.chinese_city

from ods_sales_orders a inner join ods_customer  b

on a.customer_key = b.customer_key 
where product_key = 214;
1592970499187
with temp as
(select
product_key,
b.chinese_city,
count(*)
from ods_sales_orders a inner join ods_customer  b
on (a.customer_key = b.customer_key)
and (chinese_city is not NULL) and (product_key is not NULL)
group by chinese_city, product_key
)
select * from temp;
1592970871254

截圖里面有個null冷离,好奇嘛吵冒?原來他用的是字符串null

1592972100699

用正規(guī)的NULL關(guān)鍵字反而搞不出來纯命。所以我們應(yīng)該入鄉(xiāng)隨俗西剥,用!=null 來表示。
1592972153906
-- 匯總每個城市里面的產(chǎn)品降序
with temp as
(select
b.chinese_city,
product_key,
count(*) as pro_count
from ods_sales_orders a inner join ods_customer  b
on (a.customer_key = b.customer_key)
and (chinese_city != 'null') and (product_key != 'null')
group by chinese_city, product_key
)
select *,
row_number() over(partition by chinese_city order by pro_count desc) as rn
from temp;
1592972545377

下面考慮只要top3.

with temp as
(select
b.chinese_city,
product_key,
count(*) as pro_count
from ods_sales_orders a inner join ods_customer  b
on (a.customer_key = b.customer_key)
and (chinese_city != 'null') and (product_key != 'null')
group by chinese_city, product_key
),
temp_rn as(
select *,
row_number() over(partition by chinese_city order by pro_count desc) as rn
from temp)
select * from temp_rn
where rn < 4;
1592972683061

題目五:商品的銷售數(shù)量top10亿汞,排名需考慮并列排名的情況

select product_key,
count(*) as num
from ods_sales_orders
group by product_key;
1592972806796
with temp as(
select product_key,
count(*) as num
from ods_sales_orders
group by product_key
),
temp_with_rn as
(
select 
*,
rank() over(order by num desc) as rn
from temp
)
select * from temp_with_rn
where rn < 11;
1592972999469

題目六:計算累計和(統(tǒng)計2019年1-12月的累積銷量瞭空,即1月為1月份的值,2月為1疗我、2月份值的和咆畏,3月為1、2吴裤、3月份的和旧找,12月為1-12月份值的和)

先統(tǒng)計每個月的和

select substr(create_date, 1, 7) as umonth,
sum(unit_price) as umonth_total
from ods_sales_orders
where year(create_date)=2019
group by substr(create_date, 1, 7);
1592973185048

累加和的計算

with temp as(
select substr(create_date, 1, 7) as umonth,
sum(unit_price) as umonth_total
from ods_sales_orders
where year(create_date)=2019
group by substr(create_date, 1, 7)
) 
select *,
cast(sum(umonth_total) over(order by umonth) as decimal(12, 2)) as cumsum
from temp;
1592973366784

題目七:計算客戶平均購買一次商品的間隔時間

先記錄一下上一次的購買時間

select customer_key,
lag(create_date) over(partition by customer_key order by create_date) as prev_date
from ods_sales_orders;
1592973864868

記錄兩次之間的時間間隔:

with temp as
(select customer_key, create_date,
lag(create_date) over(partition by customer_key order by create_date) as prev_date
from ods_sales_orders)
select customer_key, create_date, prev_date,
datediff(create_date, prev_date) as gap_date
from temp
where prev_date is not NULL;
1592974122319
with temp as
(select customer_key, create_date,
lag(create_date) over(partition by customer_key order by create_date) as prev_date
from ods_sales_orders),
temp_with_diff as
(
select customer_key, create_date, prev_date,
datediff(create_date, prev_date) as gap_date
from temp
where prev_date is not NULL
)
select customer_key, avg(gap_date) as avg_gap
from temp_with_diff
group by customer_key;
1592974262952

題目八:查詢最近前20%時間的訂單信息

提示:使用ntile 函數(shù)記錄條目在哪個分段。

select *,

ntile(5) over(order by create_date) as tile_index

from ods_sales_orders;
with temp as 
(
select *,

ntile(5) over(order by create_date) as tile_index

from ods_sales_orders
) 
select *
from temp
where tile_index = 2; -- 20%-40%
1592976373523
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末麦牺,一起剝皮案震驚了整個濱河市钮蛛,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌剖膳,老刑警劉巖魏颓,帶你破解...
    沈念sama閱讀 218,284評論 6 506
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場離奇詭異吱晒,居然都是意外死亡甸饱,警方通過查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,115評論 3 395
  • 文/潘曉璐 我一進(jìn)店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來叹话,“玉大人偷遗,你說我怎么就攤上這事⊥蘸” “怎么了鹦肿?”我有些...
    開封第一講書人閱讀 164,614評論 0 354
  • 文/不壞的土叔 我叫張陵,是天一觀的道長辅柴。 經(jīng)常有香客問我箩溃,道長,這世上最難降的妖魔是什么碌嘀? 我笑而不...
    開封第一講書人閱讀 58,671評論 1 293
  • 正文 為了忘掉前任涣旨,我火速辦了婚禮,結(jié)果婚禮上股冗,老公的妹妹穿的比我還像新娘霹陡。我一直安慰自己,他們只是感情好止状,可當(dāng)我...
    茶點故事閱讀 67,699評論 6 392
  • 文/花漫 我一把揭開白布烹棉。 她就那樣靜靜地躺著,像睡著了一般怯疤。 火紅的嫁衣襯著肌膚如雪浆洗。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 51,562評論 1 305
  • 那天集峦,我揣著相機(jī)與錄音伏社,去河邊找鬼。 笑死塔淤,一個胖子當(dāng)著我的面吹牛摘昌,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播高蜂,決...
    沈念sama閱讀 40,309評論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼聪黎,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了备恤?” 一聲冷哼從身側(cè)響起稿饰,我...
    開封第一講書人閱讀 39,223評論 0 276
  • 序言:老撾萬榮一對情侶失蹤,失蹤者是張志新(化名)和其女友劉穎烘跺,沒想到半個月后湘纵,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 45,668評論 1 314
  • 正文 獨居荒郊野嶺守林人離奇死亡滤淳,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 37,859評論 3 336
  • 正文 我和宋清朗相戀三年梧喷,在試婚紗的時候發(fā)現(xiàn)自己被綠了。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點故事閱讀 39,981評論 1 348
  • 序言:一個原本活蹦亂跳的男人離奇死亡铺敌,死狀恐怖汇歹,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情偿凭,我是刑警寧澤产弹,帶...
    沈念sama閱讀 35,705評論 5 347
  • 正文 年R本政府宣布,位于F島的核電站弯囊,受9級特大地震影響痰哨,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜匾嘱,卻給世界環(huán)境...
    茶點故事閱讀 41,310評論 3 330
  • 文/蒙蒙 一斤斧、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧霎烙,春花似錦撬讽、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,904評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽。三九已至尝蠕,卻和暖如春烘豌,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背趟佃。 一陣腳步聲響...
    開封第一講書人閱讀 33,023評論 1 270
  • 我被黑心中介騙來泰國打工扇谣, 沒想到剛下飛機(jī)就差點兒被人妖公主榨干…… 1. 我叫王不留昧捷,地道東北人闲昭。 一個月前我還...
    沈念sama閱讀 48,146評論 3 370
  • 正文 我出身青樓,卻偏偏與公主長得像靡挥,于是被迫代替她去往敵國和親序矩。 傳聞我的和親對象是個殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點故事閱讀 44,933評論 2 355

推薦閱讀更多精彩內(nèi)容

  • 一跋破、項目概況 1. 成果預(yù)覽 分析概述:本文是對Adventure項目的總結(jié)簸淀,記錄項目分析與實現(xiàn)過程,主要任務(wù)是對...
    kh辰辰辰閱讀 219評論 0 0
  • 本文是對Adventure Works Cycles案例的一個總結(jié)毒返,通過現(xiàn)有數(shù)據(jù)監(jiān)控商品的線上銷售情況租幕,獲取最新的...
    會飛的骰子閱讀 704評論 1 0
  • 1、業(yè)務(wù)背景 Adventure Works Cycle是國內(nèi)一家制造公司拧簸,該公司生產(chǎn)和銷售金屬和復(fù)合材料自行車在...
    努力修煉的小小菜鳥閱讀 421評論 0 0
  • 本文是對Adventure項目案例的分析總結(jié)劲绪,主要使用jupyter進(jìn)行數(shù)據(jù)處理,將處理好的數(shù)據(jù)存儲到數(shù)據(jù)庫中,連...
    我就是那個無敵大長腿閱讀 708評論 0 0
  • 一贾富、明確分析要求 1歉眷、分析報告的背景: 2019年12月業(yè)務(wù)組組長需要向領(lǐng)導(dǎo)匯報2019年11月自行車銷售情況,為...
    人間桑閱讀 320評論 0 0