數(shù)倉實(shí)戰(zhàn)05:數(shù)倉搭建-DWS層

1.業(yè)務(wù)術(shù)語

1)用戶
用戶以設(shè)備為判斷標(biāo)準(zhǔn),在移動(dòng)統(tǒng)計(jì)中摔桦,每個(gè)獨(dú)立設(shè)備認(rèn)為是一個(gè)獨(dú)立用戶社付。Android 系統(tǒng)根據(jù) IMEI 號,IOS 系統(tǒng)根據(jù) OpenUDID 來標(biāo)識一個(gè)獨(dú)立用戶邻耕,每部手機(jī)一個(gè)用戶鸥咖。
2)新增用戶
首次聯(lián)網(wǎng)使用應(yīng)用的用戶。如果一個(gè)用戶首次打開某 APP兄世,那這個(gè)用戶定義為新增用 戶啼辣;卸載再安裝的設(shè)備,不會(huì)被算作一次新增碘饼。新增用戶包括日新增用戶熙兔、周新增用戶、月新增用戶艾恼。
3)活躍用戶
打開應(yīng)用的用戶即為活躍用戶住涉,不考慮用戶的使用情況。每天一臺設(shè)備打開多次會(huì)被計(jì)為一個(gè)活躍用戶钠绍。
4)周(月)活躍用戶
某個(gè)自然周(月)內(nèi)啟動(dòng)過應(yīng)用的用戶舆声,該周(月)內(nèi)的多次啟動(dòng)只記一個(gè)活躍用戶。
5)月活躍率
月活躍用戶與截止到該月累計(jì)的用戶總和之間的比例柳爽。
6)沉默用戶
用戶僅在安裝當(dāng)天(次日)啟動(dòng)一次媳握,后續(xù)時(shí)間無再啟動(dòng)行為。該指標(biāo)可以反映新增用 戶質(zhì)量和用戶與 APP 的匹配程度磷脯。
7)版本分布
不同版本的周內(nèi)各天新增用戶數(shù)蛾找,活躍用戶數(shù)和啟動(dòng)次數(shù)。利于判斷 APP 各個(gè)版本之 間的優(yōu)劣和用戶行為習(xí)慣赵誓。
8)本周回流用戶
上周未啟動(dòng)過應(yīng)用打毛,本周啟動(dòng)了應(yīng)用的用戶。
9)連續(xù) n 周活躍用戶
連續(xù) n 周俩功,每周至少啟動(dòng)一次幻枉。
10)忠誠用戶
連續(xù)活躍 5 周以上的用戶
11)連續(xù)活躍用戶
連續(xù) 2 周及以上活躍的用戶
12)近期流失用戶
連續(xù) n(2<= n <= 4)周沒有啟動(dòng)應(yīng)用的用戶。(第 n+1 周沒有啟動(dòng)過)
13)留存用戶
某段時(shí)間內(nèi)的新增用戶诡蜓,經(jīng)過一段時(shí)間后熬甫,仍然使用應(yīng)用的被認(rèn)作是留存用戶;這部分 用戶占當(dāng)時(shí)新增用戶的比例即是留存率蔓罚。
例如椿肩,5 月份新增用戶 200瞻颂,這 200 人在 6 月份啟動(dòng)過應(yīng)用的有 100 人,7 月份啟動(dòng)過 應(yīng)用的有 80 人郑象,8 月份啟動(dòng)過應(yīng)用的有 50 人蘸朋;則 5 月份新增用戶一個(gè)月后的留存率是 50%, 二個(gè)月后的留存率是 40%扣唱,三個(gè)月后的留存率是 25%。
14)用戶新鮮度
每天啟動(dòng)應(yīng)用的新老用戶比例团南,即新增用戶數(shù)占活躍用戶數(shù)的比例噪沙。
15)單次使用時(shí)長
每次啟動(dòng)使用的時(shí)間長度。
16)日使用時(shí)長
累計(jì)一天內(nèi)的使用時(shí)間長度吐根。
17)啟動(dòng)次數(shù)計(jì)算標(biāo)準(zhǔn)
IOS 平臺應(yīng)用退到后臺就算一次獨(dú)立的啟動(dòng)正歼;Android 平臺我們規(guī)定,兩次啟動(dòng)之間的 間隔小于 30 秒拷橘,被計(jì)算一次啟動(dòng)局义。用戶在使用過程中,若因收發(fā)短信或接電話等退出應(yīng)用 30 秒又再次返回應(yīng)用中冗疮,那這兩次行為應(yīng)該是延續(xù)而非獨(dú)立的萄唇,所以可以被算作一次使用 行為术幔,即一次啟動(dòng)另萤。業(yè)內(nèi)大多使用 30 秒這個(gè)標(biāo)準(zhǔn)诅挑,但用戶還是可以自定義此時(shí)間間隔四敞。

2.系統(tǒng)函數(shù)

2.1 collect_set 函數(shù)
1)創(chuàng)建原數(shù)據(jù)表

hive (gmall)> drop table if exists stud; 
create table stud (name string, area string, course string, score int);

2)向原數(shù)據(jù)表中插入數(shù)據(jù)

hive (gmall) > INSERT INTO TABLE stud
VALUES
    ('zhang3', 'bj', 'math', 88);

INSERT INTO TABLE stud
VALUES
    ('li4', 'bj', 'math', 99);

INSERT INTO TABLE stud
VALUES
    ('wang5', 'sh', 'chinese', 92);

INSERT INTO TABLE stud
VALUES
    ('zhao6', 'sh', 'chinese', 54);

INSERT INTO TABLE stud
VALUES
    ('tian7', 'bj', 'chinese', 91);

3)查詢表中數(shù)據(jù)

hive (gmall)> select * from stud; 

輸出:

stud.name stud.area stud.course stud.score
zhang3 bj math 88 
li4 bj math 99 
wang5 sh chinese 92 
zhao6 sh chinese 54 
tian7 bj chinese 91

4)把同一分組的不同行的數(shù)據(jù)聚合成一個(gè)集合

hive (gmall) > SELECT
    course,
    collect_set (area),
    avg(score)
FROM
    stud
GROUP BY
    course;

輸出:

chinese ["sh","bj"] 79.0 
math ["bj"] 93.5

5) 用下標(biāo)可以取某一個(gè)

hive (gmall)> select course, collect_set(area)[0], 
avg(score) from stud group by course;
chinese sh 79.0 math bj 93.5

2.2 nvl函數(shù)
1)基本語法
NVL(表達(dá)式 1,表達(dá)式 2)
如果表達(dá)式 1 為空值拔妥,NVL 返回值為表達(dá)式 2 的值忿危,否則返回表達(dá)式 1 的值。 該函 數(shù)的目的是把一個(gè)空值(null)轉(zhuǎn)換成一個(gè)實(shí)際的值没龙。其表達(dá)式的值可以是數(shù)字型铺厨、字符型 和日期型。但是表達(dá)式 1 和表達(dá)式 2 的數(shù)據(jù)類型必須為同一個(gè)類型兜畸。

2.3 日期處理函數(shù)
1)date_format 函數(shù)(根據(jù)格式整理日期)

hive (gmall)> select date_format('2020-03-10','yyyy-MM');

2020-03

2)date_add 函數(shù)(加減日期)

hive (gmall)> select date_add('2020-03-10',-1); 
2020-03-09 
hive (gmall)> select date_add('2020-03-10',1); 
2020-03-11

3)next_day 函數(shù)
(1)取當(dāng)前天的下一個(gè)周一

hive (gmall)> select next_day('2020-03-12','MO'); 
2020-03-16

說明:星期一到星期日的英文(Monday努释,Tuesday、Wednesday咬摇、Thursday伐蒂、Friday、Saturday肛鹏、Sunday)
(2)取當(dāng)前周的周一

hive (gmall)> select date_add(next_day('2020-03-12','MO'),-7); 
2020-03-11

4)last_day 函數(shù)(求當(dāng)月最后一天日期)

hive (gmall)> select last_day('2020-03-10'); 
2020-03-31
3.DWS 層(用戶行為)

3.1 每日設(shè)備行為
每日設(shè)備行為逸邦,主要按照設(shè)備 id 統(tǒng)計(jì)恩沛。


1)建表語句

hive (gmall) > DROP TABLE
IF EXISTS dws_uv_detail_daycount;

CREATE external TABLE dws_uv_detail_daycount (
    `mid_id` string COMMENT '設(shè)備唯一標(biāo)識',
    `user_id` string COMMENT '用戶標(biāo)識',
    `version_code` string COMMENT '程序版本號',
    `version_name` string COMMENT '程序版本名',
    `lang` string COMMENT '系統(tǒng)語言',
    `source` string COMMENT '渠道號',
    `os` string COMMENT '安卓系統(tǒng)版本',
    `area` string COMMENT '區(qū)域',
    `model` string COMMENT '手機(jī)型號',
    `brand` string COMMENT '手機(jī)品牌',
    `sdk_version` string COMMENT 'sdkVersion',
    `gmail` string COMMENT 'gmail',
    `height_width` string COMMENT '屏幕寬高',
    `app_time` string COMMENT '客戶端日志產(chǎn)生時(shí)的時(shí)間',
    `network` string COMMENT '網(wǎng)絡(luò)模式',
    `lng` string COMMENT '經(jīng)度',
    `lat` string COMMENT '緯度',
    `login_count` BIGINT COMMENT '活躍次數(shù)'
) partitioned BY (dt string) stored AS parquet location '/warehouse/gmall/dws/dws_uv_detail_daycount';

2)數(shù)據(jù)裝載

hive (gmall) > 
INSERT overwrite TABLE dws_uv_detail_daycount PARTITION (dt = '2020-03-10') SELECT
    mid_id,
    concat_ws('|', collect_set(user_id)) user_id,
    concat_ws(
        '|',
        collect_set (version_code)
    ) version_code,
    concat_ws(
        '|',
        collect_set (version_name)
    ) version_name,
    concat_ws('|', collect_set(lang)) lang,
    concat_ws('|', collect_set(source)) source,
    concat_ws('|', collect_set(os)) os,
    concat_ws('|', collect_set(area)) area,
    concat_ws('|', collect_set(model)) model,
    concat_ws('|', collect_set(brand)) brand,
    concat_ws(
        '|',
        collect_set (sdk_version)
    ) sdk_version,
    concat_ws('|', collect_set(gmail)) gmail,
    concat_ws(
        '|',
        collect_set (height_width)
    ) height_width,
    concat_ws('|', collect_set(app_time)) app_time,
    concat_ws('|', collect_set(network)) network,
    concat_ws('|', collect_set(lng)) lng,
    concat_ws('|', collect_set(lat)) lat,
    count(*) login_count
FROM
    dwd_start_log
WHERE
    dt = '2020-03-10'
GROUP BY
    mid_id;

3)查詢加載結(jié)果

hive (gmall)> 
select * from dws_uv_detail_daycount 
where dt='2020-03-10';
4.DWS層(業(yè)務(wù))

DWS 層的寬表字段,是站在不同維度的視角去看事實(shí)表缕减。重點(diǎn)關(guān)注事實(shí)表的度量值雷客。


4.1 每日會(huì)員行為
1)建表語句

hive (gmall) > DROP TABLE
IF EXISTS dws_user_action_daycount;

CREATE external TABLE dws_user_action_daycount (
    user_id string COMMENT '用戶 id',
    login_count BIGINT COMMENT '登錄次數(shù)',
    cart_count BIGINT COMMENT '加入購物車次數(shù)',
    cart_amount DOUBLE COMMENT '加入購物車金額',
    order_count BIGINT COMMENT '下單次數(shù)',
    order_amount DECIMAL (16, 2) COMMENT '下單金額',
    payment_count BIGINT COMMENT '支付次數(shù)',
    payment_amount DECIMAL (16, 2) COMMENT '支付金額'
) COMMENT '每日用戶行為' PARTITIONED BY (`dt` string) stored AS parquet location '/warehouse/gmall/dws/dws_user_action_daycount/' tblproperties (
    "parquet.compression" = "lzo"
);

2)數(shù)據(jù)裝載

hive (gmall) > 
with tmp_login as (
 select 
  user_id,
  count(*) login_count 
 from dwd_start_log
 where dt='2020-12-25'
 group by user_id
),
tmp_cart as (
 select 
  user_id,
  count(*) cart_count,
  sum(cart_price*sku_num)cart_amount
 from dwd_fact_cart_info
 where dt='2020-12-27'
 group by user_id
),
tmp_order as (
 select
  user_id,
  count(*) order_count,
  sum(final_total_amount) order_amount
 from dwd_fact_order_info
 where dt='2020-12-27'
 group by user_id
),
tmp_pament as (
 select
  user_id,
  count(*) payment_count,
  sum(payment_amount)payment_amount
 from dwd_fact_payment_info
 where dt='2020-12-27'
 group by user_id
)

insert overwrite table dws_user_action_daycount
partition(dt='2020-12-27')
select 
 user_actions.user_id,
 sum(user_actions.login_count),
 sum(user_actions.cart_count),
 sum(user_actions.cart_amount),
 sum(user_actions.order_count),
 sum(user_actions.order_amount),
 sum(user_actions.payment_count),
 sum(user_actions.payment_amount)
from 
(
 select
  user_id,
  login_count,
  0 cart_count,
  0 cart_amount,
  0 order_count,
  0 order_amount,
  0 payment_count,
  0 payment_amount
 from tmp_login
 union all
  select
   user_id,
   0 login_count,
   cart_count,
   cart_amount,
   0 order_count,
   0 order_amount,
   0 payment_count,
   0 payment_amount
  from tmp_cart
  union all
   select 
    user_id,
    0 login_count,
    0 cart_count,
    0 cart_amount,
    order_count,
    order_amount,
    0 payment_count,
    0 payment_amount
   from tmp_order
   union all
    select 
     user_id,
     0 login_count,
     0 cart_count,
     0 cart_amount,
     0 order_count,
     0 order_amount,
     payment_count,
     payment_amount
   from tmp_pament
)user_actions
group by user_id;

select *from dws_user_action_daycount


3)查詢加載結(jié)果

hive (gmall)> 
select * from dws_user_action_daycount 
where dt='2020-03-10';

3.2 每日商品行為
1)建表語句

hive (gmall) > DROP TABLE
IF EXISTS dws_sku_action_daycount;

CREATE external TABLE dws_sku_action_daycount (
    sku_id string COMMENT 'sku_id',
    order_count BIGINT COMMENT '被下單次數(shù)',
    order_num BIGINT COMMENT '被下單件數(shù)',
    order_amount DECIMAL (16, 2) COMMENT '被下單金額',
    payment_count BIGINT COMMENT '被支付次數(shù)',
    payment_num BIGINT COMMENT '被支付件數(shù)',
    payment_amount DECIMAL (16, 2) COMMENT '被支付金額',
    refund_count BIGINT COMMENT '被退款次數(shù)',
    refund_num BIGINT COMMENT '被退款件數(shù)',
    refund_amount DECIMAL (16, 2) COMMENT '被退款金額',
    cart_count BIGINT COMMENT '被加入購物車次數(shù)',
    cart_num BIGINT COMMENT '被加入購物車件數(shù)',
    favor_count BIGINT COMMENT '被收藏次數(shù)',
    appraise_good_count BIGINT COMMENT '好評數(shù)',
    appraise_mid_count BIGINT COMMENT '中評數(shù)',
    appraise_bad_count BIGINT COMMENT '差評數(shù)',
    appraise_default_count BIGINT COMMENT '默認(rèn)評價(jià)數(shù)'
) COMMENT '每日商品行為' PARTITIONED BY (`dt` string) stored AS parquet location '/warehouse/gmall/dws/dws_sku_action_daycount/' tblproperties (
    "parquet.compression" = "lzo"
);

2)數(shù)據(jù)裝載
注意:如果是 23 點(diǎn) 59 下單线婚,支付日期跨天丢郊。需要從訂單詳情里面取出支付時(shí)間是今天坝橡,訂單時(shí)間是昨天或者今天的訂單枷恕。

hive (gmall) > 
with
tmp_order as
(
 select 
  sku_id,
  count(*) order_count,
  sum(sku_num) order_num,
  sum(total_amount) order_amount
 from dwd_fact_order_detail
 where dt = '2020-12-27'
 GROUP BY sku_id
),
tmp_payment AS (
SELECT
 sku_id,
 count(*) payment_count,
 sum(sku_num) payment_num,
 sum(total_amount) payment_amount
FROM
 dwd_fact_order_detail
WHERE
 dt = '2020-12-27'
AND order_id IN (
 SELECT
  id
 FROM
  dwd_fact_order_info
 WHERE
(
dt = '2020-12-27'
OR dt = date_add('2020-12-27' ,- 1)
)
AND date_format(payment_time, 'yyyy-MM-dd') = '2020-12-27'
)
GROUP BY
sku_id
),
tmp_refund AS (
  SELECT
   sku_id,
   count(*) refund_count,
       sum(refund_num) refund_num,
       sum(refund_amount) refund_amount
    FROM
        dwd_fact_order_refund_info
    WHERE
        dt = '2020-12-27'
    GROUP BY
        sku_id
),
tmp_cart AS (
SELECT
sku_id,
count(*) cart_count,
sum(sku_num) cart_num
FROM
dwd_fact_cart_info
WHERE
dt = '2020-12-27'
AND date_format(create_time, 'yyyy-MM-dd') = '2020-12-27'
GROUP BY
sku_id
),
tmp_favor AS (
SELECT
sku_id,
count(*) favor_count
FROM
dwd_fact_favor_info
GROUP BY
sku_id
),
tmp_appraise AS (
SELECT
sku_id,
sum(IF(appraise = '1201', 1, 0)) appraise_good_count,
sum(IF(appraise = '1202', 1, 0)) appraise_mid_count,
sum(IF(appraise = '1203', 1, 0)) appraise_bad_count,
sum(IF(appraise = '1204', 1, 0)) appraise_default_count
FROM
dwd_fact_comment_info
WHERE
dt = '2020-12-27'
GROUP BY
sku_id
) INSERT overwrite TABLE dws_sku_action_daycount PARTITION (dt = '2020-12-27') SELECT
sku_id,
sum(order_count),
sum(order_num),
sum(order_amount),
sum(payment_count),
sum(payment_num),
sum(payment_amount),
sum(refund_count),
sum(refund_num),
sum(refund_amount),
sum(cart_count),
sum(cart_num),
sum(favor_count),
sum(appraise_good_count),
sum(appraise_mid_count),
sum(appraise_bad_count),
sum(appraise_default_count)
FROM
(
SELECT
sku_id,
order_count,
order_num,
order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
FROM
tmp_order
UNION ALL
SELECT
sku_id,
0 order_count,
0 order_num,
0 order_amount,
payment_count,
payment_num,
payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
FROM
tmp_payment
UNION ALL
SELECT
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
refund_count,
refund_num,
refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
FROM
tmp_refund
UNION ALL
SELECT
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
cart_count,
cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
FROM
tmp_cart
UNION ALL
SELECT
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
FROM
tmp_favor
UNION ALL
SELECT
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
appraise_good_count,
appraise_mid_count,
appraise_bad_count,
appraise_default_count
FROM
tmp_appraise
) tmp
GROUP BY
sku_id;

3)查詢加載結(jié)果
hive (gmall)>
select * from dws_sku_action_daycount where dt='2020-03-10';

3.3 每日優(yōu)惠券統(tǒng)計(jì)(預(yù)留)

1)建表語句

hive (gmall) > DROP TABLE
IF EXISTS dws_coupon_use_daycount;

CREATE external TABLE dws_coupon_use_daycount (
    `coupon_id` string COMMENT '優(yōu)惠券 ID',
    `coupon_name` string COMMENT '購物券名稱',
    `coupon_type` string COMMENT '購物券類型 1 現(xiàn)金券 2 折扣券 3 滿減券 4 滿件打折券',
    `condition_amount` string COMMENT '滿額數(shù)',
    `condition_num` string COMMENT '滿件數(shù)',
    `activity_id` string COMMENT '活動(dòng)編號',
    `benefit_amount` string COMMENT '減金額',
    `benefit_discount` string COMMENT '折扣',
    `create_time` string COMMENT '創(chuàng)建時(shí)間',
    `range_type` string COMMENT '范圍類型 1兄朋、商品 2、品類 3怜械、品牌',
    `spu_id` string COMMENT '商品 id',
    `tm_id` string COMMENT '品牌 id',
    `category3_id` string COMMENT '品類 id',
    `limit_num` string COMMENT '最多領(lǐng)用次數(shù)',
    `get_count` BIGINT COMMENT '領(lǐng)用次數(shù)',
    `using_count` BIGINT COMMENT '使用(下單)次數(shù)',
    `used_count` BIGINT COMMENT '使用(支付)次數(shù)'
) COMMENT '每日優(yōu)惠券統(tǒng)計(jì)' PARTITIONED BY (`dt` string) stored AS parquet location '/warehouse/gmall/dws/dws_coupon_use_daycount/' tblproperties (
    "parquet.compression" = "lzo"
);

2)數(shù)據(jù)裝載

hive (gmall) > INSERT overwrite TABLE dws_coupon_use_daycount PARTITION (dt = '2020-03-10') SELECT
    cu.coupon_id,
    ci.coupon_name,
    ci.coupon_type,
    ci.condition_amount,
    ci.condition_num,
    ci.activity_id,
    ci.benefit_amount,
    ci.benefit_discount,
    ci.create_time,
    ci.range_type,
    ci.spu_id,
    ci.tm_id,
    ci.category3_id,
    ci.limit_num,
    cu.get_count,
    cu.using_count,
    cu.used_count
FROM
    (
        SELECT
            coupon_id,
            sum(

                IF (
                    date_format(get_time, 'yyyy-MM-dd') = '2020-03-10',
                    1,
                    0
                )
            ) get_count,
            sum(

                IF (
                    date_format(using_time, 'yyyy-MM-dd') = '2020-03-10',
                    1,
                    0
                )
            ) using_count,
            sum(

                IF (
                    date_format(used_time, 'yyyy-MM-dd') = '2020-03-10',
                    1,
                    0
                )
            ) used_count
        FROM
            dwd_fact_coupon_use
        WHERE
            dt = '2020-03-10'
        GROUP BY
            coupon_id
    ) cu
LEFT JOIN (
    SELECT
        *
    FROM
        dwd_dim_coupon_info
    WHERE
        dt = '2020-03-10'
) ci ON cu.coupon_id = ci.id;

3)查詢加載結(jié)果
hive (gmall)>
select * from dws_coupon_use_daycount where dt='2020-03-10';

3.4 每日活動(dòng)統(tǒng)計(jì)(預(yù)留)

1)建表語句

hive (gmall) > DROP TABLE
IF EXISTS dws_activity_info_daycount;

CREATE external TABLE dws_activity_info_daycount (
    `id` string COMMENT '編號',
    `activity_name` string COMMENT '活動(dòng)名稱',
    `activity_type` string COMMENT '活動(dòng)類型',
    `start_time` string COMMENT '開始時(shí)間',
    `end_time` string COMMENT '結(jié)束時(shí)間',
    `create_time` string COMMENT '創(chuàng)建時(shí)間',
    `order_count` BIGINT COMMENT '下單次數(shù)',
    `payment_count` BIGINT COMMENT '支付次數(shù)'
) COMMENT '購物車信息表' PARTITIONED BY (`dt` string) ROW format delimited FIELDS TERMINATED BY '\t' location '/warehouse/gmall/dws/dws_activity_info_daycount/' tblproperties (
    "parquet.compression" = "lzo"
);

2)數(shù)據(jù)裝載

hive (gmall) > INSERT overwrite TABLE dws_activity_info_daycount PARTITION (dt = '2020-03-10') SELECT
    oi.activity_id,
    ai.activity_name,
    ai.activity_type,
    ai.start_time,
    ai.end_time,
    ai.create_time,
    oi.order_count,
    oi.payment_count
FROM
    (
        SELECT
            activity_id,
            sum(

                IF (
                    date_format(create_time, 'yyyy-MM-dd') = '2020-03-10',
                    1,
                    0
                )
            ) order_count,
            sum(

                IF (
                    date_format(payment_time, 'yyyy-MM-dd') = '2020-03-10',
                    1,
                    0
                )
            ) payment_count
        FROM
            dwd_fact_order_info
        WHERE
            (
                dt = '2020-03-10'
                OR dt = date_add('2020-03-10' ,- 1)
            )
        AND activity_id IS NOT NULL
        GROUP BY
            activity_id
    ) oi
JOIN (
    SELECT
        *
    FROM
        dwd_dim_activity_info
    WHERE
        dt = '2020-03-10'
) ai ON oi.activity_id = ai.id;

3)查詢加載結(jié)果
hive (gmall)>
select * from dws_activity_info_daycount
where dt='2020-03-10';

3.5 每日購買行為

1)建表語句

hive (gmall) > DROP TABLE
IF EXISTS dws_sale_detail_daycount;

CREATE external TABLE dws_sale_detail_daycount (
    user_id string COMMENT '用戶 id',
    sku_id string COMMENT '商品 id',
    user_gender string COMMENT '用戶性別',
    user_age string COMMENT '用戶年齡',
    user_level string COMMENT '用戶等級',
    order_price DECIMAL (10, 2) COMMENT '商品價(jià)格',
    sku_name string COMMENT '商品名稱',
    sku_tm_id string COMMENT '品牌 id',
    sku_category3_id string COMMENT '商品三級品類 id',
    sku_category2_id string COMMENT '商品二級品類 id',
    sku_category1_id string COMMENT '商品一級品類 id',
    sku_category3_name string COMMENT '商品三級品類名稱',
    sku_category2_name string COMMENT '商品二級品類名稱',
    sku_category1_name string COMMENT '商品一級品類名稱',
    spu_id string COMMENT '商品 spu',
    sku_num INT COMMENT '購買個(gè)數(shù)',
    order_count BIGINT COMMENT '當(dāng)日下單單數(shù)',
    order_amount DECIMAL (16, 2) COMMENT '當(dāng)日下單金額'
) COMMENT '每日購買行為' PARTITIONED BY (`dt` string) stored AS parquet location '/warehouse/gmall/dws/dws_sale_detail_daycount/' tblproperties (
    "parquet.compression" = "lzo"
);

2)數(shù)據(jù)裝載

hive (gmall) > INSERT overwrite TABLE dws_sale_detail_daycount PARTITION (dt = '2020-03-10') SELECT
    op.user_id,
    op.sku_id,
    ui.gender,
    months_between ('2020-03-10', ui.birthday) / 12 age,
    ui.user_level,
    si.price,
    si.sku_name,
    si.tm_id,
    si.category3_id,
    si.category2_id,
    si.category1_id,
    si.category3_name,
    si.category2_name,
    si.category1_name,
    si.spu_id,
    op.sku_num,
    op.order_count,
    op.order_amount
FROM
    (
        SELECT
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            sum(total_amount) order_amount
        FROM
            dwd_fact_order_detail
        WHERE
            dt = '2020-03-10'
        GROUP BY
            user_id,
            sku_id
    ) op
JOIN (
    SELECT
        *
    FROM
        dwd_dim_user_info_his
    WHERE
        end_date = '9999-99-99'
) ui ON op.user_id = ui.id
JOIN (
    SELECT
        *
    FROM
        dwd_dim_sku_info
    WHERE
        dt = '2020-03-10'
) si ON op.sku_id = si.id;

3)查詢加載結(jié)果
hive (gmall)> select * from dws_sale_detail_daycount
where dt='2020-03-10';

5.DWS 層數(shù)據(jù)導(dǎo)入腳本

1)在/home/atguigu/bin 目錄下創(chuàng)建腳本 dwd_to_dws.sh

[atguigu@hadoop102 bin]$ vim dwd_to_dws.sh

在腳本中填寫如下內(nèi)容

#!/bin/bash 
APP=gmall hive=/opt/module/hive/bin/hive 
# 如果是輸入的日期按照取輸入日期有额;如果沒輸入日期取當(dāng)前時(shí)間的前一天 
if [ -n "$1" ] ;
then 
do
_date=$1 
else
do
_date=`date -d "-1 day" +%F` 
fisql="
INSERT overwrite TABLE $ { APP }.dws_uv_detail_daycount PARTITION (dt = '$do_date') SELECT
    mid_id,
    concat_ws('|', collect_set(user_id)) user_id,
    concat_ws(
        '|',
        collect_set (version_code)
    ) version_code,
    concat_ws(
        '|',
        collect_set (version_name)
    ) version_name,
    concat_ws('|', collect_set(lang)) lang,
    concat_ws('|', collect_set(source)) source,
    concat_ws('|', collect_set(os)) os,
    concat_ws('|', collect_set(area)) area,
    concat_ws('|', collect_set(model)) model,
    concat_ws('|', collect_set(brand)) brand,
    concat_ws(
        '|',
        collect_set (sdk_version)
    ) sdk_version,
    concat_ws('|', collect_set(gmail)) gmail,
    concat_ws(
        '|',
        collect_set (height_width)
    ) height_width,
    concat_ws('|', collect_set(app_time)) app_time,
    concat_ws('|', collect_set(network)) network,
    concat_ws('|', collect_set(lng)) lng,
    concat_ws('|', collect_set(lat)) lat,
    count(*) login_count
FROM
    $ { APP }.dwd_start_log
WHERE
    dt = '$do_date'
GROUP BY
    mid_id;

WITH tmp_login AS (
    selectuser_id,
    count(*) login_count
FROM
    $ { APP }.dwd_start_log
WHERE
    dt = '$do_date'
AND user_id IS NOT NULL
GROUP BY
    user_id
),
 tmp_cart AS (
    SELECT
        user_id,
        count(*) cart_count,
        sum(cart_price * sku_num) cart_amount
    FROM
        $ { APP }.dwd_fact_cart_info
    WHERE
        dt = '$do_date'
    AND user_id IS NOT NULL
    AND date_format(create_time, 'yyyy-MM-dd') = '$do_date'
    GROUP BY
        user_id
),
 tmp_order AS (
    SELECT
        user_id,
        count(*) order_count,
        sum(final_total_amount) order_amount
    FROM
        $ { APP }.dwd_fact_order_info
    WHERE
        dt = '$do_date'
    GROUP BY
        user_id
),
 tmp_payment AS (
    SELECT
        user_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    FROM
        $ { APP }.dwd_fact_payment_info
    WHERE
        dt = '$do_date'
    GROUP BY
        user_id
) INSERT overwrite TABLE $ { APP }.dws_user_action_daycount PARTITION (dt = '$do_date') SELECT
    user_actions.user_id,
    sum(user_actions.login_count),
    sum(user_actions.cart_count),
    sum(user_actions.cart_amount),
    sum(user_actions.order_count),
    sum(user_actions.order_amount),
    sum(user_actions.payment_count),
    sum(
        user_actions.payment_amount
    )
FROM
    (
        SELECT
            user_id,
            login_count,
            0 cart_count,
            0 cart_amount,
            0 order_count,
            0 order_amount,
            0 payment_count,
            0 payment_amount
        FROM
            tmp_loginunion ALL SELECT
                user_id,
                0 login_count,
                cart_count,
                cart_amount,
                0 order_count,
                0 order_amount,
                0 payment_count,
                0 payment_amount
            FROM
                tmp_cart
            UNION ALL
                SELECT
                    user_id,
                    0 login_count,
                    0 cart_count,
                    0 cart_amount,
                    order_count,
                    order_amount,
                    0 payment_count,
                    0 payment_amount
                FROM
                    tmp_order
                UNION ALL
                    SELECT
                        user_id,
                        0 login_count,
                        0 cart_count,
                        0 cart_amount,
                        0 order_count,
                        0 order_amount,
                        payment_count,
                        payment_amount
                    FROM
                        tmp_payment
    ) user_actions
GROUP BY
    user_id;

WITH tmp_order AS (
    SELECT
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(total_amount) order_amount
    FROM
        $ { APP }.dwd_fact_order_detail
    WHERE
        dt = '$do_date'
    GROUP BY
        sku_id
),
 tmp_payment AS (
    SELECT
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(total_amount) payment_amount
    FROM
        $ { APP }.dwd_fact_order_detail
    WHERE
        dt = '$do_date'
    AND order_id IN (
        SELECT
            idfrom $ { APP }.dwd_fact_order_info
        WHERE
            (
                dt = '$do_date'
                OR dt = date_add('$do_date' ,- 1)
            )
        AND date_format(payment_time, 'yyyy-MM-dd') = '$do_date'
    )
    GROUP BY
        sku_id
),
 tmp_refund AS (
    SELECT
        sku_id,
        count(*) refund_count,
        sum(refund_num) refund_num,
        sum(refund_amount) refund_amount
    FROM
        $ { APP }.dwd_fact_order_refund_info
    WHERE
        dt = '$do_date'
    GROUP BY
        sku_id
),
 tmp_cart AS (
    SELECT
        sku_id,
        count(*) cart_count,
        sum(sku_num) cart_num
    FROM
        $ { APP }.dwd_fact_cart_info
    WHERE
        dt = '$do_date'
    AND date_format(create_time, 'yyyy-MM-dd') = '$do_date'
    GROUP BY
        sku_id
),
 tmp_favor AS (
    SELECT
        sku_id,
        count(*) favor_count
    FROM
        $ { APP }.dwd_fact_favor_info
    WHERE
        dt = '$do_date'
    AND date_format(create_time, 'yyyy-MM-dd') = '$do_date'
    GROUP BY
        sku_id
),
 tmp_appraise AS (
    SELECT
        sku_id,
        sum(IF(appraise = '1201', 1, 0)) appraise_good_count,
        sum(IF(appraise = '1202', 1, 0)) appraise_mid_count,
        sum(IF(appraise = '1203', 1, 0)) appraise_bad_count,
        sum(IF(appraise = '1204', 1, 0)) appraise_default_count
    FROM
        $ { APP }.dwd_fact_comment_info
    WHERE
        dt = '$do_date'
    GROUP BY
        sku_id
) INSERT overwrite TABLE $ { APP }.dws_sku_action_daycount PARTITION (dt = '$do_date') SELECT
    sku_id,
    sum(order_count),
    sum(order_num),
    sum(order_amount),
    sum(payment_count),
    sum(payment_num),
    sum(payment_amount),
    sum(refund_count),
    sum(refund_num),
    sum(refund_amount),
    sum(cart_count),
    sum(cart_num),
    sum(favor_count),
    sum(appraise_good_count),
    sum(appraise_mid_count),
    sum(appraise_bad_count),
    sum(appraise_default_count)
FROM
    (
        SELECT
            sku_id,
            order_count,
            order_num,
            order_amount,
            0 payment_count,
            0 payment_num,
            0 payment_amount,
            0 refund_count,
            0 refund_num,
            0 refund_amount,
            0 cart_count,
            0 cart_num,
            0 favor_count,
            0 appraise_good_count,
            0 appraise_mid_count,
            0 appraise_bad_count,
            0 appraise_default_count
        FROM
            tmp_order
        UNION ALL
            SELECT
                sku_id,
                0 order_count,
                0 order_num,
                0 order_amount,
                payment_count,
                payment_num,
                payment_amount,
                0 refund_count,
                0 refund_num,
                0 refund_amount,
                0 cart_count,
                0 cart_num,
                0 favor_count,
                0 appraise_good_count,
                0 appraise_mid_count,
                0 appraise_bad_count,
                0 appraise_default_count
            FROM
                tmp_payment
            UNION ALL
                SELECT
                    sku_id,
                    0 order_count,
                    0 order_num,
                    0 order_amount,
                    0 payment_count,
                    0 payment_num,
                    0 payment_amount,
                    refund_count,
                    refund_num,
                    refund_amount,
                    0 cart_count,
                    0 cart_num,
                    0 favor_count,
                    0 appraise_good_count,
                    0 appraise_mid_count,
                    0 appraise_bad_count,
                    0 appraise_default_count
                FROM
                    tmp_refund
                UNION ALL
                    SELECT
                        sku_id,
                        0 order_count,
                        0 order_num,
                        0 order_amount,
                        0 payment_count,
                        0 payment_num,
                        0 payment_amount,
                        0 refund_count,
                        0 refund_num,
                        0 refund_amount,
                        cart_count,
                        cart_num,
                        0 favor_count,
                        0 appraise_good_count,
                        0 appraise_mid_count,
                        0 appraise_bad_count,
                        0 appraise_default_count
                    FROM
                        tmp_cart
                    UNION ALL
                        SELECT
                            sku_id,
                            0 order_count,
                            0 order_num,
                            0 order_amount,
                            0 payment_count,
                            0 payment_num,
                            0 payment_amount,
                            0 refund_count,
                            0 refund_num,
                            0 refund_amount,
                            0 cart_count,
                            0 cart_num,
                            favor_count,
                            0 appraise_good_count,
                            0 appraise_mid_count,
                            0 appraise_bad_count,
                            0 appraise_default_count
                        FROM
                            tmp_favor
                        UNION ALL
                            SELECT
                                sku_id,
                                0 order_count,
                                0 order_num,
                                0 order_amount,
                                0 payment_count,
                                0 payment_num,
                                0 payment_amount,
                                0 refund_count,
                                0 refund_num,
                                0 refund_amount,
                                0 cart_count,
                                0 cart_num,
                                0 favor_count,
                                appraise_good_count,
                                appraise_mid_count,
                                appraise_bad_count,
                                appraise_default_count
                            FROM
                                tmp_appraise
    ) tmp
GROUP BY
    sku_id;

INSERT overwrite TABLE $ { APP }.dws_coupon_use_daycount PARTITION (dt = '$do_date') SELECT
    cu.coupon_id,
    ci.coupon_name,
    ci.coupon_type,
    ci.condition_amount,
    ci.condition_num,
    ci.activity_id,
    ci.benefit_amount,
    ci.benefit_discount,
    ci.create_time,
    ci.range_type,
    ci.spu_id,
    ci.tm_id,
    ci.category3_id,
    ci.limit_num,
    cu.get_count,
    cu.using_count,
    cu.used_count
FROM
    (
        SELECT
            coupon_id,
            sum(

                IF (
                    date_format(get_time, 'yyyy-MM-dd') = '$do_date',
                    1,
                    0
                )
            ) get_count,
            sum(

                IF (
                    date_format(using_time, 'yyyy-MM-dd') = '$do_date',
                    1,
                    0
                )
            ) using_count,
            sum(

                IF (
                    date_format(used_time, 'yyyy-MM-dd') = '$do_date',
                    1,
                    0
                )
            ) used_count
        FROM
            $ { APP }.dwd_fact_coupon_use
        WHERE
            dt = '$do_date'
        GROUP BY
            coupon_id
    ) cu
LEFT JOIN (
    SELECT
        *
    FROM
        $ { APP }.dwd_dim_coupon_info
    WHERE
        dt = '$do_date'
) ci ON cu.coupon_id = ci.id;

INSERT overwrite TABLE $ { APP }.dws_activity_info_daycount PARTITION (dt = '$do_date') SELECT
    oi.activity_id,
    ai.activity_name,
    ai.activity_type,
    ai.start_time,
    ai.end_time,
    ai.create_time,
    oi.order_count,
    oi.payment_count
FROM
    (
        SELECT
            activity_id,
            sum(

                IF (
                    date_format(create_time, 'yyyy-MM-dd') = '$do_date',
                    1,
                    0
                )
            ) order_count,
            sum(

                IF (
                    date_format(payment_time, 'yyyy-MM-dd') = '$do_date',
                    1,
                    0
                )
            ) payment_count
        FROM
            $ { APP }.dwd_fact_order_info
        WHERE
            (
                dt = '$do_date'
                OR dt = date_add('$do_date' ,- 1)
            )
        AND activity_id IS NOT nullgroup BY activity_id
    ) oi
JOIN (
    SELECT
        *
    FROM
        $ { APP }.dwd_dim_activity_info
    WHERE
        dt = '$do_date'
) ai ON oi.activity_id = ai.id;

INSERT overwrite TABLE $ { APP }.dws_sale_detail_daycount PARTITION (dt = '$do_date') SELECT
    op.user_id,
    op.sku_id,
    ui.gender,
    months_between ('$do_date', ui.birthday) / 12 age,
    ui.user_level,
    si.price,
    si.sku_name,
    si.tm_id,
    si.category3_id,
    si.category2_id,
    si.category1_id,
    si.category3_name,
    si.category2_name,
    si.category1_name,
    si.spu_id,
    op.sku_num,
    op.order_count,
    op.order_amount
FROM
    (
        SELECT
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            sum(total_amount) order_amount
        FROM
            $ { APP }.dwd_fact_order_detail
        WHERE
            dt = '$do_date'
        GROUP BY
            user_id,
            sku_id
    ) op
JOIN (
    SELECT
        *
    FROM
        $ { APP }.dwd_dim_user_info_his
    WHERE
        end_date = '9999-99-99'
) ui ON op.user_id = ui.id
JOIN (
    SELECT
        *
    FROM
        $ { APP }.dwd_dim_sku_info
    WHERE
        dt = '$do_date'
) si ON op.sku_id = si.id;
"
$hive -e "$sql"

2)增加腳本執(zhí)行權(quán)限

[atguigu@hadoop102 bin]$ chmod 777 dwd_to_dws.sh 

3)執(zhí)行腳本導(dǎo)入數(shù)據(jù)

[atguigu@hadoop102 bin]$ dwd_to_dws.sh 2020-03-11 4)

查看導(dǎo)入數(shù)據(jù)
hive (gmall)>
select * from dws_uv_detail_daycount
where dt='2020-03-11';
select * from dws_user_action_daycount
where dt='2020-03-11';
select * from dws_sku_action_daycount
where dt='2020-03-11';
select * from dws_sale_detail_daycount
where dt='2020-03-11';
select * from dws_coupon_use_daycount
where dt='2020-03-11';
select * from dws_activity_info_daycount
where dt='2020-03-11';

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個(gè)濱河市椿争,隨后出現(xiàn)的幾起案子秦踪,更是在濱河造成了極大的恐慌,老刑警劉巖柠逞,帶你破解...
    沈念sama閱讀 217,657評論 6 505
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件板壮,死亡現(xiàn)場離奇詭異合住,居然都是意外死亡透葛,警方通過查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 92,889評論 3 394
  • 文/潘曉璐 我一進(jìn)店門殿如,熙熙樓的掌柜王于貴愁眉苦臉地迎上來,“玉大人门岔,你說我怎么就攤上這事烤送“锛幔” “怎么了?”我有些...
    開封第一講書人閱讀 164,057評論 0 354
  • 文/不壞的土叔 我叫張陵,是天一觀的道長阅悍。 經(jīng)常有香客問我节视,道長,這世上最難降的妖魔是什么霍掺? 我笑而不...
    開封第一講書人閱讀 58,509評論 1 293
  • 正文 為了忘掉前任抗楔,我火速辦了婚禮连躏,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘拍棕。我一直安慰自己绰播,他們只是感情好尚困,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,562評論 6 392
  • 文/花漫 我一把揭開白布事甜。 她就那樣靜靜地躺著,像睡著了一般掌实。 火紅的嫁衣襯著肌膚如雪贱鼻。 梳的紋絲不亂的頭發(fā)上滋将,一...
    開封第一講書人閱讀 51,443評論 1 302
  • 那天随闽,我揣著相機(jī)與錄音橱脸,去河邊找鬼。 笑死屁桑,一個(gè)胖子當(dāng)著我的面吹牛蘑斧,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播沟突,決...
    沈念sama閱讀 40,251評論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼惠拭,長吁一口氣:“原來是場噩夢啊……” “哼庸论!你這毒婦竟也來了聂示?” 一聲冷哼從身側(cè)響起鱼喉,我...
    開封第一講書人閱讀 39,129評論 0 276
  • 序言:老撾萬榮一對情侶失蹤扛禽,失蹤者是張志新(化名)和其女友劉穎,沒想到半個(gè)月后,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體灵巧,經(jīng)...
    沈念sama閱讀 45,561評論 1 314
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡刻肄,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,779評論 3 335
  • 正文 我和宋清朗相戀三年敏弃,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了噪馏。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片欠肾。...
    茶點(diǎn)故事閱讀 39,902評論 1 348
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡刺桃,死狀恐怖,靈堂內(nèi)的尸體忽然破棺而出桃移,到底是詐尸還是另有隱情借杰,我是刑警寧澤,帶...
    沈念sama閱讀 35,621評論 5 345
  • 正文 年R本政府宣布疮装,位于F島的核電站廓推,受9級特大地震影響翩隧,放射性物質(zhì)發(fā)生泄漏堆生。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,220評論 3 328
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望墩弯。 院中可真熱鬧寞射,春花似錦、人聲如沸引矩。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,838評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽履腋。三九已至,卻和暖如春晚吞,著一層夾襖步出監(jiān)牢的瞬間谋国,已是汗流浹背芦瘾。 一陣腳步聲響...
    開封第一講書人閱讀 32,971評論 1 269
  • 我被黑心中介騙來泰國打工近弟, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留,地道東北人窗宦。 一個(gè)月前我還...
    沈念sama閱讀 48,025評論 2 370
  • 正文 我出身青樓赴涵,卻偏偏與公主長得像订讼,于是被迫代替她去往敵國和親欺殿。 傳聞我的和親對象是個(gè)殘疾皇子祈餐,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 44,843評論 2 354

推薦閱讀更多精彩內(nèi)容