1. 第一層ODS層
1 ODS層存放您從業(yè)務(wù)系統(tǒng)獲取的最原始的數(shù)據(jù)壮不,是其他上層數(shù)據(jù)的源數(shù)據(jù)蔼夜。
2 本項目中使用的ODS層數(shù)據(jù)主要包括: 用戶基本信息判呕、商品分類信息、商品信息磁浇、店鋪信息斋陪、 訂單數(shù)據(jù)、訂單支付信息置吓、活動信息无虚、行為日志信息
1.1 業(yè)務(wù)數(shù)據(jù)表
數(shù)倉ods層和業(yè)務(wù)數(shù)據(jù)表的對應(yīng)關(guān)系
中文含義 | MYSQL表名 | HIVE表名 |
---|---|---|
用戶基本信表 | nshop.customer | ods_nshop.ods_nshop_02_customer |
用戶關(guān)注表 | nshop.customer_attention | ods_nshop.ods_nshop_02_customer_attention |
收貨地址管理表 | nshop.customer_consignee | ods_nshop.ods_nshop_02_customer_consignee |
訂單表 | nshop.orders | ods_nshop.ods_nshop_02_orders |
訂單詳情表 | nshop.order_detail | ods_nshop.ods_nshop_02_order_detail |
訂單支付記錄表 | nshop.orders_pay_records | ods_nshop.ods_nshop_02_orders_pay_records |
1.1.1 MySQL數(shù)據(jù)庫建表
創(chuàng)建數(shù)據(jù)庫:
create database if not exists ods_nshop;
創(chuàng)建表:
下載數(shù)據(jù):
https://download.csdn.net/download/qq_28286027/12529873
解壓后為nshop.sql,將數(shù)據(jù)直接執(zhí)行source nshop.sql命令交洗,導入mysql的ods_nshop表即可骑科。
導入后數(shù)據(jù)表如下:
1.1.2 Hive數(shù)據(jù)庫建表
創(chuàng)建Hive庫并進入:
create database if not exists ods_nshop;
use ods_nshop;
創(chuàng)建表:
1.創(chuàng)建用戶基本信息表:
CREATE TABLE
IF NOT EXISTS ods_nshop.ods_02_customer (
customer_id string COMMENT '用戶ID',
customer_login string COMMENT '用戶登錄名',
customer_nickname string COMMENT '用戶名(昵稱)',
customer_name string COMMENT '用戶真實姓名',
customer_pass string COMMENT '用戶密碼',
customer_mobile string COMMENT '用戶手機',
customer_idcard string COMMENT '身份證',
customer_gender TINYINT COMMENT '性別:1男 0女',
customer_birthday string COMMENT '出生年月',
customer_email string COMMENT '用戶郵箱',
customer_natives string COMMENT '所在地區(qū)',
customer_ctime BIGINT COMMENT '創(chuàng)建時間',
customer_utime BIGINT COMMENT '修改時間'
) location '/data/nshop/ods/ods_02_customer/';
從MySQL導入數(shù)據(jù):
sqoop import \
--connect jdbc:mysql://master:3306/nshop \
--username root \
--password 111111 \
--table customer \
--hive-import \
--hive-overwrite \
--hive-table ods_nshop.ods_02_customer \
-m 1
用戶關(guān)注表:
CREATE TABLE
IF NOT EXISTS ods_nshop.ods_02_customer_attention (
customer_id string COMMENT '用戶ID',
attention_id string COMMENT '關(guān)注對象ID',
attention_type TINYINT COMMENT '關(guān)注類型:3店鋪 4商品',
attention_status BIGINT COMMENT '關(guān)注狀態(tài):1關(guān)注 0取消',
attention_ctime BIGINT COMMENT '生成時間'
) location '/data/nshop/ods/ods_02_customer_attention/';
收貨地址管理表:
CREATE TABLE
IF NOT EXISTS ods_nshop.ods_02_customer_consignee (
consignee_id string COMMENT '收貨地址ID',
customer_id string COMMENT '用戶ID',
consignee_name string COMMENT '收貨人',
consignee_mobile string COMMENT '收貨人電話',
consignee_zipcode string COMMENT '收貨人地區(qū)',
consignee_addr string COMMENT '收貨人詳細地址',
consignee_tag string COMMENT '標簽:1家 2公司 3學校',
ctime BIGINT COMMENT '創(chuàng)建時間'
) location '/data/nshop/ods/ods_02_customer_consignee/'
從MySQL導入數(shù)據(jù):
sqoop import \
--connect jdbc:mysql://master:3306/nshop \
--username root \
--password 111111 \
--table customer_consignee \
--hive-import \
--hive-overwrite \
--hive-table ods_nshop.ods_02_customer_consignee \
-m 1
訂單表:
CREATE TABLE
IF NOT EXISTS ods_nshop.ods_02_orders (
order_id string COMMENT '訂單ID(時間+商品ID+4位隨機)',
customer_id string COMMENT '下單用戶ID',
order_status TINYINT COMMENT '訂單狀態(tài)',
customer_ip string COMMENT '下單用戶IP',
user_longitude string COMMENT '用戶地理:經(jīng)度',
user_latitude string COMMENT '用戶地理:緯度',
user_areacode string COMMENT '用戶所在地區(qū)',
consignee_name string COMMENT '收貨人',
consignee_mobile string COMMENT '收貨人電話',
consignee_zipcode string COMMENT '收貨人地址',
pay_type string COMMENT '支付類型:線上支付 10 網(wǎng)上銀行 11 微信 12 支付寶 | 線下支付 (貨到付款) 20 ',
pay_code string COMMENT '支付對應(yīng)唯一標識橡淑,如微信號构拳、支付寶號',
pay_nettype string COMMENT '支付網(wǎng)絡(luò)方式:0 wifi | 1 4g | 2 3g |3 線下支付',
district_money DECIMAL (8, 1) COMMENT '優(yōu)惠金額',
shipping_money DECIMAL (8, 1) COMMENT '運費金額',
payment_money DECIMAL (10, 1) COMMENT '支付金額',
order_ctime BIGINT COMMENT '創(chuàng)建時間',
shipping_time BIGINT COMMENT '發(fā)貨時間',
receive_time BIGINT COMMENT '收貨時間'
) location '/data/nshop/ods/ods_02_orders/';
從MySQL導入數(shù)據(jù):
sqoop import \
--connect jdbc:mysql://master:3306/nshop \
--username root \
--password 111111 \
--table orders \
--hive-import \
--hive-overwrite \
--hive-table ods_nshop.ods_02_orders \
-m 1
訂單詳情表:
CREATE TABLE
IF NOT EXISTS ods_nshop.ods_02_order_detail (
order_detail_id string COMMENT '訂單詳情表ID',
order_id string COMMENT '訂單表ID',
product_id string COMMENT '訂單商品ID',
product_name string COMMENT '商品名稱',
product_remark string COMMENT '商品描述',
product_cnt INT COMMENT '購買商品數(shù)量',
product_price DECIMAL (5, 1) COMMENT '購買商品單價',
weighing_cost DECIMAL (2, 1) COMMENT '商品加權(quán)價格',
district_money DECIMAL (4, 1) COMMENT '優(yōu)惠金額',
is_activity BIGINT COMMENT '1:參加活動|0:沒有參加活動',
order_detail_ctime BIGINT COMMENT '下單時間'
) location '/data/nshop/ods/ods_02_order_detail/';
從MySQL導入數(shù)據(jù):
sqoop import \
--connect jdbc:mysql://master:3306/nshop \
--username root \
--password 111111 \
--table order_detail \
--hive-import \
--hive-overwrite \
--hive-table ods_nshop.ods_02_order_detail \
-m 1
訂單支付記錄表:
CREATE TABLE
IF NOT EXISTS ods_nshop.ods_02_orders_pay_records (
pay_id string COMMENT '支付記錄ID',
order_id string COMMENT '訂單ID',
customer_id string COMMENT '用戶ID',
pay_status string COMMENT '支付狀態(tài):0 支付失敗| 1 支付成功',
pay_type string COMMENT '支付類型:線上支付 10 網(wǎng)上銀行 11 微信 12 支付寶 | 線下支付 (貨到付款) 20 ',
pay_code string COMMENT '支付對應(yīng)唯一標識,如微信號梁棠、支付寶號',
pay_nettype string COMMENT '支付網(wǎng)絡(luò)方式:1 wifi | 2 4g | 3 3g |4 線下支付',
pay_amount DOUBLE COMMENT '支付金額',
pay_ctime BIGINT COMMENT '創(chuàng)建時間'
) location '/data/nshop/ods/ods_02_orders_pay_records/';
從MySQL導入數(shù)據(jù):
sqoop import \
--connect jdbc:mysql://master:3306/nshop \
--username root \
--password 111111 \
--table orders_pay_records \
--hive-import \
--hive-overwrite \
--hive-table ods_nshop.ods_02_orders_pay_records \
-m 1
1.2 埋點日志
電商平臺在網(wǎng)頁版或APP版終端上進行業(yè)務(wù)埋點置森,采集相關(guān)用戶行為數(shù)據(jù)發(fā)送給服務(wù)器進行用戶行為分析。
中文含義 | HIVE表名 |
---|---|
用戶行為日志表 | ods_nshop.ods_nshop_01_useractlog |
用戶行為日志表創(chuàng)建:
CREATE external TABLE
IF NOT EXISTS ods_nshop.ods_nshop_01_useractlog (
action string COMMENT '行為類型:install安裝|launch啟動|interactive交 互|page_enter_h5頁面曝光|page_enter_native頁面進入|exit退出',
event_type string COMMENT '行為類型:click點擊|view瀏覽|slide滑動|input輸入',
customer_id string COMMENT '用戶id',
device_num string COMMENT '設(shè)備號',
device_type string COMMENT '設(shè)備類型',
os string COMMENT '手機系統(tǒng)',
os_version string COMMENT '手機系統(tǒng)版本',
manufacturer string COMMENT '手機制造商',
carrier string COMMENT '電信運營商',
network_type string COMMENT '網(wǎng)絡(luò)類型',
area_code string COMMENT '地區(qū)編碼',
longitude string COMMENT '經(jīng)度',
latitude string COMMENT '緯度',
extinfo string COMMENT '擴展信息(json格式)',
ct BIGINT COMMENT '創(chuàng)建時間'
) partitioned BY (bdp_day string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS TEXTFILE location '/data/nshop/ods/user_action_log/';
注意:json格式數(shù)據(jù)表需要通過serde機制處理
- 在hive-site.xml中設(shè)置三方j(luò)ar包
<property>
<name>hive.aux.jars.path</name>
<value>/usr/local/src/apache-hive-3.1.0-bin/lib/</value>
</property>
- 在hive.aux.jars.path設(shè)置的路徑中增加hive-hcatalog-core-2.1.1.jar 到/usr/local/src/apache-hive-3.1.0-bin/lib/中
執(zhí)行:
]# /usr/local/src/apache-hive-3.1.0-bin/hcatalog/share/hcatalog/hive-hcatalog-core-2.1.1.jar /usr/local/src/apache-hive-3.1.0-bin/lib/
導入用戶行為數(shù)據(jù)到hive中:
數(shù)據(jù)下載地址:https://download.csdn.net/download/qq_28286027/12532059
用戶行為日志文件000000_0前15行如下:
{"action":"05","event_type":"01","customer_id":"20101000324999676","device_num":"586344","device_type":"9","os":"2","os_version":"2.2","manufacturer":"05","carrier":"2","network_type":"2","area_code":"41092","longitude":"116.35636","latitude":"40.06919","extinfo":"{\"target_type\":\"4\",\"target_keys\":\"20402\",\"target_order\":\"31\",\"target_ids\":\"[\\\"4320402595801\\\",\\\"4320402133801\\\",\\\"4320402919201\\\",\\\"4320402238501\\\"]\"}","ct":1567896035000}
{"action":"05","event_type":"02","customer_id":"20101000405999595","device_num":"956236","device_type":"9","os":"2","os_version":"4.3","manufacturer":"08","carrier":"2","network_type":"3","area_code":"10015","longitude":"116.35985","latitude":"40.069590000000005","extinfo":"{\"target_type\":\"3\",\"target_id\":\"4320406544401\",\"target_action\":\"01\"}","ct":1567896035000}
{"action":"05","event_type":"02","customer_id":"20101000648999352","device_num":"876263","device_type":"2","os":"2","os_version":"6.2","manufacturer":"07","carrier":"2","network_type":"2","area_code":"42058","longitude":"116.34721","latitude":"40.06685","extinfo":"{\"target_type\":\"3\",\"target_id\":\"4320701402201\",\"target_action\":\"01\"}","ct":1567896035000}
{"action":"07","event_type":"02","customer_id":"20101000729999271","device_num":"451481","device_type":"9","os":"2","os_version":"4.5","manufacturer":"09","carrier":"3","network_type":"2","area_code":"37162","longitude":"116.35737","latitude":"40.06962","extinfo":"{\"target_id\":\"4320207200701\"}","ct":1567896035000}
{"action":"06","event_type":"","customer_id":"20101000810999190","device_num":"114649","device_type":"9","os":"2","os_version":"8.2","manufacturer":"02","carrier":"3","network_type":"0","area_code":"35062","longitude":"116.38244999999999","latitude":"40.06447","extinfo":"{}","ct":1567896035000}
{"action":"07","event_type":"01","customer_id":"20101000972999028","device_num":"332835","device_type":"2","os":"2","os_version":"7.5","manufacturer":"02","carrier":"2","network_type":"0","area_code":"15082","longitude":"116.33457","latitude":"40.06449","extinfo":"{\"target_id\":\"4320701224301\"}","ct":1567896035000}
{"action":"05","event_type":"02","customer_id":"20101000972999028","device_num":"332835","device_type":"2","os":"2","os_version":"7.5","manufacturer":"02","carrier":"2","network_type":"0","area_code":"15082","longitude":"116.33457","latitude":"40.06449","extinfo":"{\"target_type\":\"3\",\"target_id\":\"4320701224301\",\"target_action\":\"01\"}","ct":1567896035000}
{"action":"08","event_type":"02","customer_id":"20101001377998623","device_num":"269526","device_type":"9","os":"2","os_version":"2.2","manufacturer":"05","carrier":"2","network_type":"3","area_code":"13053","longitude":"116.36325","latitude":"40.06456","extinfo":"{\"target_id\":\"4320901702901\"}","ct":1567896035000}
{"action":"08","event_type":"01","customer_id":"20101001458998542","device_num":"252612","device_type":"9","os":"2","os_version":"7.5","manufacturer":"03","carrier":"2","network_type":"2","area_code":"10013","longitude":"116.39949","latitude":"40.069250000000004","extinfo":"{\"target_id\":\"4320901657101\"}","ct":1567896035000}
{"action":"07","event_type":"02","customer_id":"20101001701998299","device_num":"314141","device_type":"9","os":"2","os_version":"4.8","manufacturer":"06","carrier":"2","network_type":"2","area_code":"45102","longitude":"116.37762","latitude":"40.06812","extinfo":"{\"target_id\":\"4320206030201\"}","ct":1567896035000}
{"action":"05","event_type":"02","customer_id":"20101001782998218","device_num":"273141","device_type":"9","os":"2","os_version":"2.5","manufacturer":"04","carrier":"3","network_type":"3","area_code":"19621","longitude":"116.39714","latitude":"40.06693","extinfo":"{\"target_type\":\"3\",\"target_id\":\"4320506927501\",\"target_action\":\"01\"}","ct":1567896035000}
{"action":"05","event_type":"02","customer_id":"20101001944998056","device_num":"962792","device_type":"9","os":"2","os_version":"2.3","manufacturer":"07","carrier":"2","network_type":"2","area_code":"10008","longitude":"116.39224","latitude":"40.067510000000006","extinfo":"{\"target_type\":\"3\",\"target_id\":\"4320102333401\",\"target_action\":\"01\"}","ct":1567896035000}
{"action":"05","event_type":"01","customer_id":"20101002025997975","device_num":"133595","device_type":"2","os":"2","os_version":"2.2","manufacturer":"04","carrier":"3","network_type":"3","area_code":"37091","longitude":"116.37975","latitude":"40.06192","extinfo":"{\"target_type\":\"4\",\"target_keys\":\"20206\",\"target_order\":\"20\",\"target_ids\":\"[\\\"4320206871201\\\",\\\"4320206738601\\\",\\\"4320206632601\\\",\\\"4320206465801\\\"]\"}","ct":1567896035000}
{"action":"08","event_type":"01","customer_id":"20101002187997813","device_num":"648218","device_type":"2","os":"2","os_version":"8.2","manufacturer":"03","carrier":"2","network_type":"0","area_code":"13018","longitude":"116.38544","latitude":"40.06662","extinfo":"{\"target_id\":\"4320701967401\"}","ct":1567896035000}
{"action":"08","event_type":"02","customer_id":"20101002187997813","device_num":"648218","device_type":"2","os":"2","os_version":"8.2","manufacturer":"03","carrier":"2","network_type":"0","area_code":"13018","longitude":"116.38544","latitude":"40.06662","extinfo":"{\"target_id\":\"4320701967401\"}","ct":1567896035000}
導入:
]# load data local inpath '/usr/local/user_behavior_log/000000_0' into table ods_nshop_01_useractlog partition(bdp_day='20200618');
行為日志解析:
1.3 外部廣告投放相關(guān)數(shù)據(jù)
外部數(shù)據(jù)來源主要有: 1 企業(yè)間的接口調(diào)用(如其他企業(yè)平臺上的廣告投放營銷) 2 公共數(shù)據(jù)的爬蟲數(shù)據(jù)(公共數(shù)據(jù)符糊、友商數(shù)據(jù)凫海、社區(qū)|社交平臺公開數(shù)據(jù))
中文含義 | HIVE表名 |
---|---|
廣告投放信息表 | ods_nshop.ods_nshop_01_releasedatas |
創(chuàng)建廣告投放信息表:
CREATE external TABLE
IF NOT EXISTS ods_nshop.ods_01_releasedatas (
customer_id string COMMENT '用戶id',
device_num string COMMENT '設(shè)備號',
device_type string COMMENT '設(shè)備類型',
os string COMMENT '手機系統(tǒng)',
os_version string COMMENT '手機系統(tǒng)版本',
manufacturer string COMMENT '手機制造商',
area_code string COMMENT '地區(qū)編碼',
release_sid string COMMENT '投放請求id',
release_session string COMMENT '投放會話id',
release_sources string COMMENT '投放渠道',
release_params string COMMENT '投放請求參數(shù)',
ct BIGINT COMMENT '創(chuàng)建時間'
) partitioned BY (bdp_day string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored AS textfile location '/data/nshop/ods/release/';
導入數(shù)據(jù):
數(shù)據(jù)下載地址:https://download.csdn.net/download/qq_28286027/12529938
數(shù)據(jù)文件1.csv前10行:
customer_id,device_num,device_type,os,os_version,manufacturer,area_code,release_sid,release_session,release_sources,release_params,release_ctime,nbdp_day
20103000002999998,433229,9,2,9.4,03,51152,1567291235000fjdtkp,1567291235000433229742121,douyin,ip=200.95.79.96&deviceNum=433229&lon=null&lat=null&aid=null&ctime=1567291302000&sources=douyin&session=1567291235000433229742121&productPage=4320309458301,1567896035000,20190908
20105000004999996,121417,2,2,4.2,09,44160,1567291235000sjibrq,1567291235000121417572782,liebao,ip=65.200.95.194&deviceNum=121417&lon=null&lat=null&aid=null&ctime=1567291346000&sources=liebao&session=1567291235000121417572782&productPage=4320904071301,1567896035000,20190908
20302000019999981,349828,2,2,1.7,02,21142,1567291235000gkpgvs,1567291235000349828624925,netyex,ip=70.183.190.208&deviceNum=349828&lon=null&lat=null&aid=null&ctime=1567291385000&sources=netyex&session=1567291235000349828624925&productPage=4320309376701,1567896035000,20190908
20506000041999959,969247,2,2,8.7,10,21132,1567291235000qgijuq,1567291235000969247224224,ximalaya,ip=157.66.156.74&deviceNum=969247&lon=null&lat=null&aid=null&ctime=1567291421000&sources=ximalaya&session=1567291235000969247224224&productPage=4320103859601,1567896035000,20190908
20508000043999957,865977,9,2,8.9,04,62052,1567291235000gppiwp,1567291235000865977661164,baidu,ip=58.151.213.230&deviceNum=865977&lon=null&lat=null&aid=null&ctime=1567291485000&sources=baidu&session=1567291235000865977661164&productPage=4320909655701,1567896035000,20190908
20705000058999942,752814,9,2,4.1,10,43022,1567291235000fftdsv,1567291235000752814286721,tencent,ip=57.131.111.229&deviceNum=752814&lon=null&lat=null&aid=null&ctime=1567291419000&sources=tencent&session=1567291235000752814286721&productPage=4320404173901,1567896035000,20190908
20708000061999939,466888,2,2,5.3,10,14092,1567291235000kesjrt,1567291235000466888256571,netyex,ip=72.29.250.178&deviceNum=466888&lon=null&lat=null&aid=null&ctime=1567291324000&sources=netyex&session=1567291235000466888256571&productPage=4320408025201,1567896035000,20190908
20106000086999914,794259,9,2,1.8,03,51342,1567291235000wjsqjs,1567291235000794259134297,liebao,ip=241.84.18.204&deviceNum=794259&lon=null&lat=null&aid=null&ctime=1567291318000&sources=liebao&session=1567291235000794259134297&productPage=4320902876101,1567896035000,20190908
20107000087999913,225827,2,2,5.9,09,65010,1567291235000sgxjgn,1567291235000225827843226,liebao,ip=75.29.250.157&deviceNum=225827&lon=null&lat=null&aid=null&ctime=1567291304000&sources=liebao&session=1567291235000225827843226&productPage=4320603519801,1567896035000,20190908
導入:
load data local inpath '/usr/local/user_behavior_log/1.csv' into table ods_01_releasedatas partition(bdp_day='20200618');
三方數(shù)據(jù)ODS層的建表檢查:
檢查,訪問50070端口查看:
1.4 數(shù)據(jù)清洗
數(shù)據(jù)在采集或進行外部接口調(diào)用過程中可能會產(chǎn)生不符合要求的“臟數(shù)據(jù)”,臟數(shù)據(jù)產(chǎn) 生原因及處理如下:
1.4.1 格式內(nèi)容問題產(chǎn)生的原因:
1)不同數(shù)據(jù)源采集而來的數(shù)據(jù)內(nèi)容和格式定義不一致
2)時間男娄、日期格式不一致清洗行贪,根據(jù)實際情況,把時間/日期數(shù)據(jù)庫轉(zhuǎn)換成統(tǒng)一 的表示方式模闲。
3)數(shù)據(jù)類型不符清洗
1.4.2 邏輯錯誤清洗
1)數(shù)據(jù)重復(fù)清洗
2)數(shù)據(jù)不完全相同建瘫,但從業(yè)務(wù)角度看待數(shù)據(jù)是同一個數(shù)據(jù),如頁面埋點時尸折,進入 頁面都會上報一次數(shù)據(jù)啰脚,只有時間不一樣,其他字段相同实夹,在統(tǒng)計pv/uv時應(yīng)該進 行去重橄浓。
3)矛盾內(nèi)容的修正粒梦,如身份證號可以驗證我們的年齡,然后我們可以根據(jù)字段的 數(shù)據(jù)來源荸实,判斷哪個字段提供的信息更為可靠來做修正匀们。
1.4.3 缺失值的清洗
1)數(shù)據(jù)值缺失是數(shù)據(jù)分析中經(jīng)常遇到的問題之一。
2)造成原因:有些對象的某個或某些屬性是不可用的准给,信息被遺漏昼蛀,或者實時性 高還未來得及做出判斷。
3)數(shù)據(jù)填充:用一定的值去填充空值圆存,從而使信息表完備化叼旋。
4)去空處理.
1.4.4 不符合業(yè)務(wù)需求(或挖掘分析需求)的數(shù)據(jù)
在這里,為了減少復(fù)雜沦辙,就不進行數(shù)據(jù)清洗了夫植。