Adventure電商分析項目總結(jié)

本文是對Adventure Bicycle案例的一個總結(jié)，記錄了整個項目需求分析與實現(xiàn)的過程搁凸，主要任務(wù)是使用Hive SQL完成ETL過程萌朱，并且連接到PowerBI實現(xiàn)可視化屠阻，最終將整個分析成果展示出來红省。

一、項目背景

Adventure Works Cycle是國內(nèi)一家制造公司国觉，該公司生產(chǎn)和銷售金屬和復(fù)合材料自行車在全國各個市場吧恃。銷售方式主要有線上零售和線下批發(fā)或零售。

產(chǎn)品介紹
目前公司主要有下面四個產(chǎn)品線：
Adventure Works Cycles生產(chǎn)的自行車麻诀。
自行車部件痕寓，例如車輪缸逃，踏板或制動組件。
從供應(yīng)商處購買的自行車服裝厂抽，用于轉(zhuǎn)售給Adventure Works Cycles的客戶需频。
從供應(yīng)商處購買的自行車配件，用于轉(zhuǎn)售給Adventure Works Cycles的客戶筷凤。

二昭殉、項目任務(wù)

隨著線上業(yè)務(wù)的開展，需要增強公司數(shù)據(jù)化方面的治理藐守，讓前線的業(yè)務(wù)同學(xué)能夠?qū)崿F(xiàn)自主分析挪丢，從而能實現(xiàn)對市場的快速判斷。因此卢厂，要求數(shù)據(jù)部門和業(yè)務(wù)部門溝通需求的自主分析的數(shù)據(jù)指標乾蓬，從而實現(xiàn)可視化看板。
業(yè)務(wù)需求：查看最新的銷量慎恒，銷售額趨勢以及個商品的銷售占比任内，獲取當天，前一天融柬，當月死嗦，當季，當年的各區(qū)域各城市銷量銷售額粒氧，以及同比數(shù)據(jù)越除。

三、分析過程

要實現(xiàn)用戶自主分析外盯，必須具備兩點：
（1）具有可視化操作頁面
（2）數(shù)據(jù)能自動更新

Power BI可以實現(xiàn)用戶的可視化操作摘盆，只要把相關(guān)的表聚合后展示需要的信息到Power BI上即可。但聚合后的數(shù)據(jù)是固定的饱苟，所以要把聚合表的代碼部署到linux服務(wù)器上孩擂，讓系統(tǒng)自動去運行聚合表的代碼，更新數(shù)據(jù)掷空，從而實現(xiàn)自主分析肋殴。

整體分析流程如下圖所示：

項目流程

準備工作：mysql 數(shù)據(jù)源囤锉，Hive數(shù)據(jù)庫坦弟，工具：Sqoop,Power BI 服務(wù)器:linux

1.mysql數(shù)據(jù)源中觀察數(shù)據(jù)

數(shù)據(jù)庫中一共有26張表，根據(jù)業(yè)務(wù)需求官地，梳理出要使用到的三張表：

2.構(gòu)建指標體系

指標維度：
時間維度：今日酿傍、昨日、當月驱入、當季赤炒、當年
地域維度：銷售大區(qū)氯析、省份、城市
產(chǎn)品維度：產(chǎn)品類別莺褒、產(chǎn)品占比掩缓、熱銷產(chǎn)品

3.通過sqoop抽取數(shù)據(jù)到hive數(shù)據(jù)庫

Sqoop：SQL-to-Hadoop

連接傳統(tǒng)關(guān)系型數(shù)據(jù)庫和 Hadoop 的工具

Sqoop是一個轉(zhuǎn)換工具，用于在關(guān)系型數(shù)據(jù)庫與Hive等之間進行數(shù)據(jù)轉(zhuǎn)換

通過sqoop將日期維度表遵岩、每日新增用戶表你辣、訂單明細表將數(shù)據(jù)從mysql中抽取到hive的ods層，通常將代碼寫在shell腳本上尘执，在linux 系統(tǒng)中運行即可舍哄。

下面是部分shell腳本代碼（sqoop_ods_sales_orders.sh）從訂單明細表中抽取數(shù)據(jù)到hive：

hive -e "truncate table ods.ods_sales_orders"  # 刪除hive原有的舊表
sqoop import \
--hive-import \                    # 將數(shù)據(jù)導(dǎo)入hive中
--connect "jdbc:mysql://106.13.128.83:3306/adventure_ods?useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&dontTrackOpenResources=true&defaultFetchSize=50000&useCursorFetch=true" \
--driver com.mysql.jdbc.Driver \   # jdbc驅(qū)動類型
--username *** \   # 數(shù)據(jù)庫用戶名
--password *** \   #數(shù)據(jù)庫連接密碼
--query \
"select * from ods_sales_orders where "'$CONDITIONS'" " \   # 導(dǎo)入查詢結(jié)果集
--fetch-size 50000 \                   # 一次從數(shù)據(jù)庫讀取的條目數(shù)
--hive-table ods.ods_sales_orders \    # 輸出表的名稱
--hive-drop-import-delims \   # 在導(dǎo)入數(shù)據(jù)到hive時，去掉數(shù)據(jù)中的\r\n\013\010這樣的字符
--delete-target-dir \          # 刪除導(dǎo)入目標目錄
--target-dir /user/hadoop/sqoop/ods_sales_orders \   # 將數(shù)據(jù)導(dǎo)出目標文件目錄(hdfs目錄)
-m 1             #啟動多個mapper并行執(zhí)行導(dǎo)入

4.建立數(shù)據(jù)倉庫誊锭，對數(shù)據(jù)進行聚合

聚合流程圖：

編寫hive sql從數(shù)據(jù)倉庫ods層的日期維度表表悬、每日新增用戶表、訂單明細表讀取數(shù)據(jù)進行數(shù)據(jù)聚合丧靡，完成當日維度表（dw_amount_diff）蟆沫、時間-地區(qū)-產(chǎn)品聚合表（dw_customer_order）,每日環(huán)比表（dw_order_by_day）的聚合操作。

下面是部分shell腳本代碼（create_dw_order_by_day.sh）從ods層的訂單明細表中讀取數(shù)據(jù)來聚合每日環(huán)比表：

首先在DW層創(chuàng)建聚合表

## 創(chuàng)建聚合表
hive -e "drop table if exists ods.dw_order_by_day"
hive -e "
CREATE TABLE ods.dw_order_by_day(
  create_date string,
  is_current_year bigint,
  is_last_year bigint,
  is_yesterday bigint,
  is_today bigint,
  is_current_month bigint,
  is_current_quarter bigint,
  sum_amount double,
  order_count bigint)
"

然后將聚合結(jié)果導(dǎo)入數(shù)據(jù)表中：

## 這里是hive的查詢語句温治，因為做聚合需要關(guān)聯(lián)多張表做聚合饥追，這里使用with查詢來提高查詢性能
hive -e "
with dim_date as
(select create_date,
            is_current_year,
            is_last_year,
            is_yesterday,
            is_today,
            is_current_month,
            is_current_quarter
            from ods.dim_date_df),
sum_day as
(select create_date,
        sum(unit_price) as sum_amount,
        count(customer_key) as order_count
        from ods.ods_sales_orders
        group by create_date)
insert into ods.dw_order_by_day
    select b.create_date,
    b.is_current_year,
    b.is_last_year,
    b.is_yesterday,
    b.is_today,
    b.is_current_month,
    b.is_current_quarter,
    a.sum_amount,
    a.order_count
    from sum_day as a
    inner join dim_date as b
    on a.create_date=b.create_date
"

5.Sqoop從Hive導(dǎo)出數(shù)據(jù)到mysql

Sqoop Export :導(dǎo)出
將數(shù)據(jù)從Hadoop（如hive等）導(dǎo)入關(guān)系型數(shù)據(jù)庫導(dǎo)中
- 步驟1：Sqoop與數(shù)據(jù)庫Server通信，獲取數(shù)據(jù)庫表的元數(shù)據(jù)信息罐盔；
- 步驟2：并行導(dǎo)入數(shù)據(jù)：
- 將Hadoop上文件劃分成若干個split但绕；
- 每個split由一個Map Task進行數(shù)據(jù)導(dǎo)入

現(xiàn)在需要通過sqoop把時間-地區(qū)-產(chǎn)品聚合表（dw_customer_order）,每日環(huán)比表（dw_order_by_day）、當日維度表（dw_amount_diff）分別從Hive數(shù)據(jù)庫遷入到mysql的數(shù)據(jù)庫中惶看。

CREATE TABLE `dw_order_by_day` (
   `create_date` date DEFAULT NULL,
   `sum_amount` double DEFAULT NULL,
   `sum_order` bigint(20) DEFAULT NULL,
   `amount_div_order` double DEFAULT NULL,
   `sum_amount_goal` double DEFAULT NULL,
   `sum_order_goal` double DEFAULT NULL,
   `is_current_year` int(11) DEFAULT NULL,
   `is_last_year` int(11) DEFAULT NULL,
   `is_yesterday` int(11) DEFAULT NULL,
   `is_today` int(11) DEFAULT NULL,
   `is_current_month` int(11) DEFAULT NULL,
   `is_current_quarter` int(11) DEFAULT NULL,
   `is_21_day` int(11) DEFAULT NULL,
   `amount_diff` double DEFAULT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

下面是部分shell腳本代碼（export_dw_order_by_day .sh）將數(shù)據(jù)每日環(huán)比表中數(shù)據(jù)從hive遷入mysql中：

在數(shù)據(jù)導(dǎo)出之前先在mysql數(shù)據(jù)庫中創(chuàng)建數(shù)據(jù)表（dw_order_by_day）

CREATE TABLE `dw_order_by_day` (
   `create_date` date DEFAULT NULL,
   `sum_amount` double DEFAULT NULL,
   `sum_order` bigint(20) DEFAULT NULL,
   `amount_div_order` double DEFAULT NULL,
   `sum_amount_goal` double DEFAULT NULL,
   `sum_order_goal` double DEFAULT NULL,
   `is_current_year` int(11) DEFAULT NULL,
   `is_last_year` int(11) DEFAULT NULL,
   `is_yesterday` int(11) DEFAULT NULL,
   `is_today` int(11) DEFAULT NULL,
   `is_current_month` int(11) DEFAULT NULL,
   `is_current_quarter` int(11) DEFAULT NULL,
   `is_21_day` int(11) DEFAULT NULL,
   `amount_diff` double DEFAULT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

開始進行sqoop從hive數(shù)據(jù)庫抽取數(shù)據(jù)到mysql數(shù)據(jù)庫

sqoop export --connect "jdbc:mysql://106.15.121.232:3306/datafrog05_adventure" \
--username *** \
--password *** \
--table dw_order_by_day \     # mysql數(shù)據(jù)庫建好的表
--export-dir /user/hive/warehouse/ods.db/dw_order_by_day \   #hive數(shù)據(jù)庫數(shù)據(jù)路徑
--input-null-string "\\\\N" \
--input-null-non-string "\\\\N"  \
--input-fields-terminated-by "\001"  \
--input-lines-terminated-by "\\n"  \
-m 1

6.在linux上做定時部署

linux的定時任務(wù)使用crontab文件來實現(xiàn)捏顺，
（1）編寫shedule.sh文件，按執(zhí)行順序添加文件

#!/bin/bash
sh /home/frog005/adventure_Bourton/sqoop_ods_sales_order.sh
sh /home/frog005/adventure_Bourton/sqoop_dim_date.sh
sh /home/frog005/adventure_Bourton/sqoop_ods_sales_orders.sh

sh /home/frog005/adventure_Bourton/create_dw_order_by_day.sh
sh /home/frog005/adventure_Bourton/create_dw_amount_diff.sh
sh /home/frog005/adventure_Bourton/create_dw_customer_order.sh

sh /home/frog005/adventure_Bourton/export_dw_order_by_day.sh
sh /home/frog005/adventure_Bourton/export_dw_amount_diff.sh
sh /home/frog005/adventure_Bourton/export_dw_customer_order.sh

（2）添加定時任務(wù)纬黎，設(shè)定每天早上6點執(zhí)行
編輯crontab 文件 :vi /etc/crontab
添加定時任務(wù)：

0 6 * * * /home/frog005/adventure_sunnyxhd/schedule.sh

四幅骄、連接Power bi 部署展示

前面的步驟基本完成后，就可以把mysql與power bi 連接起來本今，實現(xiàn)bi數(shù)據(jù)的自動更新拆座。

6.1 Power bi報表展示

報表一共有3頁，包括主頁冠息、時間趨勢圖逛艰、區(qū)域分布圖菇绵。

主頁展示內(nèi)容

基本銷售指標：銷售額咬最、訂單量永乌、客戶數(shù)量硝桩、客單價及相應(yīng)同比指標
從時間維度分析年度、季度衙伶、月度害碾、周慌随、日銷售情況
銷售排名前10的產(chǎn)品
產(chǎn)品的結(jié)構(gòu)
區(qū)域阁猜、商品類型切片器

時間趨勢圖

展示時間維度：從日剃袍，月憔维，季，年維度分析產(chǎn)品的銷售額凶赁，訂單量，客戶數(shù)量咏窿，客單價趨勢變化
區(qū)域素征，產(chǎn)品，時間類型切片器

區(qū)域分布圖
1.展示各區(qū)域在一定時間段的銷售金額根欧，訂單量凤粗，客戶量今豆。
2.展示各城市在一定時間段的銷售金額异逐，訂單量，客戶量辅甥。
3.各區(qū)域銷售金額占比
4.區(qū)域袍祖，產(chǎn)品蕉陋，時間類型切片器