hive作業(yè)筆記

#能把列名刪除(就是第一行)
sed '1d' orders.csv 

# 查看有哪些字段(維度)
$ desc orders.csv
輸出:
order_id                string              
user_id                 string              
eval_set                string              
order_number            string              
order_dow               string              
order_hour_of_day       string              
days_since_prior_order  string   

create table orders 
(
order_id string,
user_id string,
dow string
) row format delimited fields terminated by ',' 
lines terminated by '\n';

2.每個(gè)用戶有多少個(gè)訂單[orders] pv瀏覽量 > uv用戶量

select user_id, count(order_id) as order_cnt
from orders
group by user_id
order by order_cnt
limit 100; 

輸出:
202272  100
133983  100
172806  100
69995   100
162455  100
67798   100
82420   100
35190   100
67736   100
117890  100

3.每個(gè)用戶【2.平均【1.每個(gè)訂單是多少商品】】 avg

分析:
$ desc orders 
order_id                string              
user_id                 string              
eval_set                string              
order_number            string              
order_dow               string              
order_hour_of_day       string              
days_since_prior_order  string  

$desc desc order_product_prior;
order_id                string              
product_id              string        訂單里面的商品數(shù)      
add_to_cart             string              
reordered               string    

$ select * from order_product_prior limit 10;
order_id        product_id      add_to_cart_order       reordered
5       18569   19      1
5       41176   20      1
5       48366   21      1
5       47209   22      0
5       46522   23      0
5       38693   24      0
5       48825   25      0
5       8479    26      0

Hive SQL :
select user_id, avg(prod_cnt) as avg_prod_cnt
from (
select orders.user_id, prod.prod_cnt from orders
join (
select order_id, count(product_id) as prod_cnt
from order_product_prior
group by order_id) prod
on orders.order_id=prod.order_id
) t
group by user_id
order by avg_prod_cnt desc
limit 100;

kill 子任務(wù):
① kill job: hadoop job -kill job_1553939296911_0006
②kill application:
yarn application -kill application_1537547243219_0056
  1. 每個(gè)用戶在一周中的購(gòu)買訂單的分布 --列轉(zhuǎn)行
    if 語(yǔ)句
    case when 語(yǔ)句
  1. 每個(gè)用戶平均每個(gè)購(gòu)買天中,購(gòu)買的商品數(shù)量
    days_since_prior_order 這個(gè)字段當(dāng)做:訂單里面的商品數(shù)

select
if(days_since_prior_order='','0',days_since_prior_order)
as days_since_prior_order

select user_id, count(distinct days_since_prior_order) as day_cnt
from orders
group by user_id

每個(gè)訂單是有多少商品
select order_id, count(product_id) as prod_cnt
from order_products_prior
group by order_id

  1. 每個(gè)用戶最喜歡的3個(gè)商品是什么
    ①表join關(guān)聯(lián)
    select user_id, product_id from orders join order_product_prior pri on orders.order_id=pri.order_id limit 100;
select user_id, concat_ws(',', collect_list(order_id)) as order_ids from orders group by user_id limit 100;
1       2539329,2398795,473747,2254736,431534,3367565,550135,3108588,2295261,2550362,1187899
10      1224907,68288,2115522,83395,1353310,1822501
100     680467,3159209,2443738,2337051,2875733,3302990
1000    3045856,1456368,2700013,2003258,568709,18216,2885089,903956
10000   1492646,1444722,3272690,2152976,859342,103998,3146859,2306339,1765454,377814,1090401,3204347,2077534,1731354,1191621,428409,936532,2300676,236049,715590,2196356,192826,2652468,2600064,12156,1550736,793788,1824260,1044054,2505014,1104652,2646743,2166278,247668,545893,1902338,2327642,2610522,3038094,2856130,675149,1762088,2662079,775362,956424,3007165,858276,957745,664258,2492781,686954,3128885,1946074,611183,3079937,193643,1788987,2864915,842149,1404909,2477429,2999447,2511241,3060036,3268994,584896,343340,3306425,1125006,951827,1990815,2989046,1925217
100000  507791,284691,1016379,1666305,2380440,2159051,1844720,2433517,2104152,576286



select user_id, collect_list(order_id) as order_ids from orders group by user_id limit 100;
1       ["2539329","2398795","473747","2254736","431534","3367565","550135","3108588","2295261","2550362","1187899"]
10      ["1224907","68288","2115522","83395","1353310","1822501"]
100     ["680467","3159209","2443738","2337051","2875733","3302990"]
1000    ["3045856","1456368","2700013","2003258","568709","18216","2885089","903956"]
10000   ["1492646","1444722","3272690","2152976","859342","103998","3146859","2306339","1765454","377814","1090401","3204347","2077534","1731354","1191621","428409","936532","2300676","236049","715590","2196356","192826","2652468","2600064","12156","1550736","793788","1824260","1044054","2505014","1104652","2646743","2166278","247668","545893","1902338","2327642","2610522","3038094","2856130","675149","1762088","2662079","775362","956424","3007165","858276","957745","664258","2492781","686954","3128885","1946074","611183","3079937","193643","1788987","2864915","842149","1404909","2477429","2999447","2511241","3060036","3268994","584896","343340","3306425","1125006","951827","1990815","2989046","1925217"]
100000  ["507791","284691","1016379","1666305","2380440","2159051","1844720","2433517","2104152","576286"]



我的思考:

  1. hive中的數(shù)據(jù)如何導(dǎo)入到redis中税课?

回答:

  1. hive中的數(shù)據(jù)如何導(dǎo)入到redis中闲延?

待補(bǔ)充豹缀。
https://www.fuwuqizhijia.com/redis/201704/67005.html
https://blog.csdn.net/DSLZTX/article/details/50775500

use test;
drop table if exists student2;
create table student2(
id int,
name string,
age int,
course array<string>,
body map<string,float>,
address struct<street:string,number:int>
)
partitioned by (state string,city string)
row format delimited
fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile;

--1. 將orders和order_products_prior建表入hive
--sed '1d' orders.csv 能把列名刪除
create table orders
(
order_id string,
user_id string,
eval_set string,
order_number string,
order_dow string,
order_hour_of_day string,
days_since_prior_order string
)
row format delimited fields terminated by ',' --'\t'
lines terminated by '\n';
--location '/data/orders' --(orders是文件夾)

--導(dǎo)入數(shù)據(jù)
load local data inpath '/home/badou/Documents/data/order_data/orders.csv'
overwrite into table orders;

order_id string
product_id string
add_to_cart_order string
reordered string

--2.每個(gè)用戶有多少個(gè)訂單[orders] pv瀏覽量 > uv用戶量
--order_id,user_id
select user_id,count(order_id) as order_cnt
from orders
group by user_id
order by order_cnt desc
limit 10

--3.每個(gè)用戶【2.平均【1.每個(gè)訂單是多少商品】】 avg
orders[用戶,訂單] order_products_prior【訂單慨代,商品】order_id product_id
--3.1.每個(gè)訂單是多少商品
select order_id,count(product_id) as prod_cnt
from order_products_prior
group by order_id

--3.2
set hive.cli.print.header=true;
select orders.user_id,prod.prod_cnt
from orders
join(
select order_id,count(product_id) as prod_cnt
from order_products_prior
group by order_id)prod
on orders.order_id=prod.order_id
limit 10;

三張小票

  1. 10個(gè)prod
  2. 20個(gè)prod
  3. 30個(gè)prod

(10+20+30)/3 avg

user_id avg([10,20,30])=20

1user 100order sum 1*100prod avg 1
2user 2order sum 10+20prod avg 15

select user_id, avg(prod_cnt) as avg_prod_cnt
from
(
select orders.user_id,prod.prod_cnt
from orders
join(
select order_id,count(product_id) as prod_cnt
from order_products_prior
group by order_id)prod
on orders.order_id=prod.order_id
)t
group by user_id
order by avg_prod_cnt desc
limit 100

都是kill子任務(wù):
-- kill job:
hadoop job -kill job_1537547243219_0058
-- kill application:
yarn application -kill application_1537547243219_0056

select orders.user_id,avg(prod.prod_cnt) as avg_prod_cnt
from orders
join(
select order_id,count(product_id) as prod_cnt
from order_products_prior
group by order_id)prod
on orders.order_id=prod.order_id
group by orders.user_id
limit 10;

--4. 每個(gè)用戶在一周中的購(gòu)買訂單的分布 --列轉(zhuǎn)行
set hive.cli.print.header=true;
select
user_id,
sum(case order_dow when '0' then 1 else 0 end) as dow_0,
sum(case order_dow when '1' then 1 else 0 end) as dow_1,
sum(case order_dow when '2' then 1 else 0 end) as dow_2,
sum(case order_dow when '3' then 1 else 0 end) as dow_3,
sum(case order_dow when '4' then 1 else 0 end) as dow_4,
sum(case order_dow when '5' then 1 else 0 end) as dow_5,
sum(case order_dow when '6' then 1 else 0 end) as dow_6
from orders
group by user_id
limit 20;

--5. 每個(gè)用戶平均每個(gè)購(gòu)買天中邢笙,購(gòu)買的商品數(shù)量
--【把days_since_prior_order當(dāng)做一個(gè)日期20181014】
--1.orders user_id days_since_prior_order 天數(shù)
select if(days_since_prior_order='','0',days_since_prior_order) as days_since_prior_order

select user_id,count(distinct days_since_prior_order) as day_cnt
from orders
group by user_id

--2.每個(gè)訂單是多少商品
select order_id,count(product_id) as prod_cnt
from order_products_prior
group by order_id

1.user_id prod_cnt day_cnt join orders order_products_prior
2.user_id prod_cnt dt(days_since_prior_order) join

--1.
select orders.user_id,sum(prod.prod_cnt)/count(distinct days_since_prior_order) as day_cnt
from orders
join
(select order_id,count(product_id) as prod_cnt
from order_products_prior
group by order_id)prod
on orders.order_id=prod.order_id
group by orders.user_id
limit 100

--2.
select user_id,avg(prod_cnt) as avg_day_prod_cnt
from(
select orders.user_id,orders.days_since_prior_order,sum(prod.prod_cnt) as prod_cnt
from orders
join
(select order_id,count(product_id) as prod_cnt
from order_products_prior
group by order_id)prod
on orders.order_id=prod.order_id
group by orders.user_id,orders.days_since_prior_order
)t
group by user_id
limit 100

最終兩個(gè)結(jié)果一樣,執(zhí)行邏輯一樣

  1. 每個(gè)用戶最喜愛購(gòu)買的三個(gè)product是什么侍匙,最終表結(jié)構(gòu)可以是3個(gè)列氮惯,或者一個(gè)字符串
    user_id,product_id,prod_buy_cnt 一個(gè)用戶對(duì)同一個(gè)商品購(gòu)買多少次

select user_id,collect_list(concat_ws('_',product_id,cast(row_num as string))) as top_3_prods
from(
select user_id,product_id,
row_number() over(partition by user_id order by prod_buy_cnt desc) as row_num,
prod_buy_cnt
from(
select orders.user_id,pri.product_id,count(1) as prod_buy_cnt
from orders
join order_products_prior pri
on orders.order_id=pri.order_id
group by orders.user_id,pri.product_id
)t
)tt
where row_num<=3
group by user_id
limit 10;

--concat_ws
select user_id,collect_list(order_id) as order_ids
from orders
group by user_id
limit 10;

orders
app, m, pc
orders_app
orders_m
orders_pc
group by == word count
1.combiner
2.set hive.groupby.skewindata=true;
一個(gè)map reduce拆成兩個(gè)MR

--多個(gè)mapreduce
a.order_id = b.ord_id
c.product_id = b.prod_id

user_id product_desc

--一個(gè)mapreduce
a.order_id = b.ord_id
a.order_id = c.ord_id
partition

a:orders訂單表 user_id order_id
b: 訂單對(duì)應(yīng)的商品表 order_id 產(chǎn)品
c: 訂單類型表 order_id 類型,在哪些廣告渠道過(guò)來(lái)的下的訂單
loading page

--一個(gè) MR job
select * from orders
join order_products_prior pri on orders.order_id=pri.order_id
join tmp_order tord on orders.order_id=tord.order_id

--

select * from orders
join order_products_prior pri on orders.order_id=pri.order_id
join tmp_order tord on pri.order_id=tord.order_id

dict ={"order_id":"product_id"}

order_id ,user_id
a: user_id, product_id
for line in sys.stdin:
s = line.split()
user_id=s[0]
order_id = s[1]
product_id = dict.get(order_id)
print('%s\t%s\t%s'%(user_id,order_id,product_id))

a b
1,
2,
3
hue cdh

  1. [4,5[7,8,9[11,21,23]],6]

val s = "The figure of Irene, never, as the reader may possibly have observed, present, except through the senses of other characters, is a concretion of disturbing Beauty impinging on a possessive world."
s.split(" ")
s.split(" ").length

import scala.io.Source
val lines = Source.fromFile("./The_man_of_property.txt").getLines().toList
lines: List[String]

lines.map(x=>x.split(" "))
res0: List[Array[String]]

lines.map(x=>x.split(" ")).flatten
res2: List[String]
<=>
lines.flatMap(x=>x.split(" "))
res2: List[String]

MR的map處理:

lines.flatMap(.split(" ")).map(x=>(x,1))
lines.flatMap(
.split(" ")).map((_,1))

lines.flatMap(.split(" ")).map(x=>(x,1)).groupBy(x=>x.1)
lines.flatMap(
.split(" ")).map(x=>(x,1)).groupBy(
._1)
scala.collection.immutable.Map[String,List[(String, Int)]]
key->value
forgotten -> List((forgotten,1), (forgotten,1), (forgotten,1), (forgotten,1), (forgotten,1), (forgotten,1)

MR word count:
lines.flatMap(.split(" ")).map(x=>(x,1)).groupBy(._1).map(x=>(x._1,x._2.length))

lines.flatMap(.split(" "))
.map(x=>(x,1))
.groupBy(
._1)
.map(x=>(x._1,x.2.map(._2).sum))
--x.2 List((String,Int))
--List[(Int, Int, Int)] = List((1,2,3), (4,5,6), (7,8,9))
--b.map(
._2).sum

--reduce
lines.flatMap(.split(" ")).map(x=>(x,1)).groupBy(._1).map(x=>(x.1,x.2.map(.2).reduce(+)))

sum = reduce(+)
reduce()
list(1,1,1) ((1+1)+1)
sum += x
topN

--排序 按照詞頻選擇top10
lines.flatMap(.split(" ")).map(x=>(x,1)).groupBy(._1)
.map(x=>(x._1,x.2.size))
.toList
.sortBy(
._2)
.reverse
.slice(0,10)

lines.flatMap(.split(" "))
.map(x=>(x,1)).groupBy(
._1)
.map(x=>(x._1,x.2.size))
.toList.sortWith(
.2>._2)

lines.flatMap(.split(" ")).map((,1))
.groupBy(.1)
.mapValues(
.size).toArray
.sortWith(
.2>._2)
.slice(0,10)

rdd DataFrame

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末想暗,一起剝皮案震驚了整個(gè)濱河市妇汗,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌说莫,老刑警劉巖杨箭,帶你破解...
    沈念sama閱讀 218,036評(píng)論 6 506
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場(chǎng)離奇詭異储狭,居然都是意外死亡互婿,警方通過(guò)查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,046評(píng)論 3 395
  • 文/潘曉璐 我一進(jìn)店門辽狈,熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái)慈参,“玉大人,你說(shuō)我怎么就攤上這事刮萌⊥耘洌” “怎么了?”我有些...
    開封第一講書人閱讀 164,411評(píng)論 0 354
  • 文/不壞的土叔 我叫張陵着茸,是天一觀的道長(zhǎng)壮锻。 經(jīng)常有香客問(wèn)我,道長(zhǎng)涮阔,這世上最難降的妖魔是什么猜绣? 我笑而不...
    開封第一講書人閱讀 58,622評(píng)論 1 293
  • 正文 為了忘掉前任,我火速辦了婚禮澎语,結(jié)果婚禮上途事,老公的妹妹穿的比我還像新娘。我一直安慰自己擅羞,他們只是感情好尸变,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,661評(píng)論 6 392
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著减俏,像睡著了一般召烂。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發(fā)上娃承,一...
    開封第一講書人閱讀 51,521評(píng)論 1 304
  • 那天奏夫,我揣著相機(jī)與錄音怕篷,去河邊找鬼。 笑死酗昼,一個(gè)胖子當(dāng)著我的面吹牛廊谓,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播麻削,決...
    沈念sama閱讀 40,288評(píng)論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼蒸痹,長(zhǎng)吁一口氣:“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼!你這毒婦竟也來(lái)了呛哟?” 一聲冷哼從身側(cè)響起叠荠,我...
    開封第一講書人閱讀 39,200評(píng)論 0 276
  • 序言:老撾萬(wàn)榮一對(duì)情侶失蹤,失蹤者是張志新(化名)和其女友劉穎扫责,沒(méi)想到半個(gè)月后榛鼎,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 45,644評(píng)論 1 314
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡鳖孤,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,837評(píng)論 3 336
  • 正文 我和宋清朗相戀三年者娱,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片淌铐。...
    茶點(diǎn)故事閱讀 39,953評(píng)論 1 348
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡肺然,死狀恐怖,靈堂內(nèi)的尸體忽然破棺而出腿准,到底是詐尸還是另有隱情,我是刑警寧澤拾碌,帶...
    沈念sama閱讀 35,673評(píng)論 5 346
  • 正文 年R本政府宣布吐葱,位于F島的核電站,受9級(jí)特大地震影響校翔,放射性物質(zhì)發(fā)生泄漏弟跑。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,281評(píng)論 3 329
  • 文/蒙蒙 一防症、第九天 我趴在偏房一處隱蔽的房頂上張望孟辑。 院中可真熱鬧,春花似錦蔫敲、人聲如沸饲嗽。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,889評(píng)論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽(yáng)貌虾。三九已至,卻和暖如春裙犹,著一層夾襖步出監(jiān)牢的瞬間尽狠,已是汗流浹背衔憨。 一陣腳步聲響...
    開封第一講書人閱讀 33,011評(píng)論 1 269
  • 我被黑心中介騙來(lái)泰國(guó)打工, 沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留袄膏,地道東北人践图。 一個(gè)月前我還...
    沈念sama閱讀 48,119評(píng)論 3 370
  • 正文 我出身青樓,卻偏偏與公主長(zhǎng)得像沉馆,于是被迫代替她去往敵國(guó)和親平项。 傳聞我的和親對(duì)象是個(gè)殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 44,901評(píng)論 2 355

推薦閱讀更多精彩內(nèi)容