Adventure Works Cycles公司數(shù)據(jù)分析
1 背景
????????????Adventure Works Cycles是AdventureWorks樣本數(shù)據(jù)庫(kù)所虛構(gòu)的公司断部,這是一家大型跨國(guó)制造公司。該公司生產(chǎn)和銷(xiāo)售金屬和復(fù)合材料自行車(chē)到北美,歐洲和亞洲的商業(yè)市場(chǎng)慎式。
Adventure Works Cycle這家公司的客戶主要有兩種:
???????????? 個(gè)體:這些客戶購(gòu)買(mǎi)商品是通過(guò)網(wǎng)上零售店鋪
?????????????商店:這些是從Adventure Works Cycles銷(xiāo)售代表處購(gòu)買(mǎi)轉(zhuǎn)售產(chǎn)品的零售店或批發(fā)店
這家公司主要有下面四個(gè)產(chǎn)品線:
??????????????Adventure Works Cycles 生產(chǎn)的自行車(chē)
??????????????自行車(chē)部件,例如車(chē)輪,踏板或制動(dòng)組件
??????????????從供應(yīng)商處購(gòu)買(mǎi)的自行車(chē)服裝瘪吏,用于轉(zhuǎn)售給Adventure Works Cycles的客戶癣防。
??????????????從供應(yīng)商處購(gòu)買(mǎi)的自行車(chē)配件,用于轉(zhuǎn)售給Adventure Works Cycles客戶掌眠。
2?項(xiàng)目目的
? ? ? ? ? ? ??將數(shù)據(jù)導(dǎo)入Hive數(shù)據(jù)庫(kù)
??????????????探索數(shù)據(jù)庫(kù)并羅列分析指標(biāo)
??????????????匯總數(shù)據(jù)建立數(shù)據(jù)倉(cāng)庫(kù)(銷(xiāo)售主題)
??????????????powerbi可視化
? ????????????制作11月自行車(chē)業(yè)務(wù)分析報(bào)告
項(xiàng)目成果部分圖片:
ppt成果圖片:PPT—11月自行車(chē)業(yè)務(wù)分析報(bào)告
3?數(shù)據(jù)處理
3.1?數(shù)據(jù)來(lái)源
項(xiàng)目數(shù)據(jù)描述:數(shù)據(jù)來(lái)源于adventure Works Cycles公司的的樣本數(shù)據(jù)庫(kù)蕾盯,包括了公司4大應(yīng)用場(chǎng)景的數(shù)據(jù):Sales、Finance蓝丙、Product级遭、Manufacture
3.2?數(shù)據(jù)理解
3.3?數(shù)據(jù)導(dǎo)入?
現(xiàn)有資料是一個(gè)從sqlserver導(dǎo)出的.sql文件,里面有表名渺尘、表字段以及每個(gè)表的csv數(shù)據(jù)挫鸽,利用文件讀寫(xiě)+正則表達(dá)式解析出 hive 建表語(yǔ)句,然后利用 pandas 指定編碼讀取再保存為 csv 文件
1鸥跟、在hive中創(chuàng)建一個(gè)基礎(chǔ)數(shù)據(jù)層庫(kù)adventure_cj掠兄,用來(lái)存放基礎(chǔ)表數(shù)據(jù)
2、在python中利用正則表達(dá)式循環(huán)讀取文件锌雀,獲取表名和字段名
得到以下數(shù)據(jù)
3蚂夕、解析 table_info 字典,用來(lái)創(chuàng)建表
4腋逆、導(dǎo)入數(shù)據(jù)到hive
5婿牍、用遍歷文件和pandas 讀取數(shù)據(jù)使之轉(zhuǎn)換成指定 utf8 編碼格式的代碼
得到utf8編碼格式的csv文件并以|分隔
6、建立數(shù)據(jù)倉(cāng)庫(kù)匯總層
(1)創(chuàng)建一個(gè)新的數(shù)據(jù)庫(kù) adventure_dw_cj惩歉,為了方便查閱將adventure_cj中的基礎(chǔ)表遷移過(guò)來(lái)
use adventure_dw_cj;
# 銷(xiāo)售地區(qū)維度表
create table dimsalesterritory as select * from adventure_cj.dimsalesterritory;
# 網(wǎng)絡(luò)銷(xiāo)售表
create table FactInternetSales as select * from adventure_cj.FactInternetSales;
# 產(chǎn)品維度表
create table DimProduct as select * from adventure_cj.DimProduct;
(2)建立事實(shí)表 fact_time 等脂,對(duì)比去年撑蚌、上個(gè)月的銷(xiāo)售額上遥、銷(xiāo)售額數(shù)量的不同
use adventure_dw_cj;
create table fact_time as
SELECT
a.*,
? b.amount? ? ? AS amount_last_year,
? b.order_number AS order_number_last_year,
? c.amount? ? ? AS amount_last_month,
? c.order_number AS order_number_last_month,
? round(((a.amount-c.amount)/c.amount)*100,2)? ? ? ? ? ? ? ? ? AS amount_comp_last_month,
? round(((a.order_number-c.order_number)/c.order_number)*100,2) AS order_number_comp_last_month,
? round(((a.amount-b.amount)/b.amount)*100,2) AS amount_comp_last_year,
? round(((a.order_number-b.order_number)/b.order_number)*100,2) AS order_number_comp_last_year
FROM? (
? SELECT
? ? SalesTerritoryKey,
? ? concat_ws('-',substr(orderdatekey,0,4),substr(orderdatekey,5,2),substr(orderdatekey,7,2)) AS orderdate,
? ? year(concat_ws('-',substr(orderdatekey,0,4),substr(orderdatekey,5,2),substr(orderdatekey,7,2)))? ? ? ? AS time_YEAR,
? ? QUARTER(concat_ws('-',substr(orderdatekey,0,4),substr(orderdatekey,5,2),substr(orderdatekey,7,2)))? ? ? AS time_QUARTER,
? ? MONTH(concat_ws('-',substr(orderdatekey,0,4),substr(orderdatekey,5,2),substr(orderdatekey,7,2)))? ? ? ? AS time_MONTH,
? ? WEEKofyear(concat_ws('-',substr(orderdatekey,0,4),substr(orderdatekey,5,2),substr(orderdatekey,7,2)))? AS time_WEEK,
? ? count( SalesAmount )? ? ? ? ? ? ? ? ? ? ? ? ? AS order_number,
? ? round(count(SalesAmount)*(0.9+rand ()*0.4),2) AS order_number_forcost,
? ? round( sum( SalesAmount ), 2 )? ? ? ? ? ? ? ? AS amount,
? ? round(sum(SalesAmount)*(0.9+rand ()*0.4),2)? AS amount_forcost,
? ? round(sum(SalesAmount)/count(SalesAmount),2)? AS customerunitprice,
? ? round( avg( TotalProductCost ), 2 )? ? ? ? ? AS per_productcost,
? ? round( avg( TaxAmt ), 2 )? ? ? ? ? ? ? ? ? ? AS per_tax,
? ? round( avg( freight ), 2 )? ? ? ? ? ? ? ? ? ? AS avg_freight
? FROM
? ? adventure_cj.FactInternetSales
? GROUP BY
? ? SalesTerritoryKey,
? ? OrderDateKey
? ) a
? LEFT JOIN (
? ? SELECT
? ? ? SalesTerritoryKey,
? ? ? OrderDateKey,
? ? ? date_add(concat_ws('-',substr(orderdatekey,0,4),substr(orderdatekey,5,2),substr(orderdatekey,7,2)), 365 ) AS orderdate,
? ? ? count( SalesAmount )? ? ? ? ? ? AS order_number,
? ? ? round( sum( SalesAmount ), 2 )? AS amount
? ? FROM
? ? ? adventure_cj.FactInternetSales
? ? GROUP BY
? ? ? SalesTerritoryKey,
? ? ? OrderDateKey
? ) b
ON a.SalesTerritoryKey = b.SalesTerritoryKey? AND a.orderdate = b.orderdate
LEFT JOIN (
? ? SELECT
? ? ? SalesTerritoryKey,
? ? ? OrderDateKey,
? ? ? date_add(concat_ws('-',substr(orderdatekey,0,4),substr(orderdatekey,5,2),substr(orderdatekey,7,2)),30) AS orderdate,
? ? ? count( SalesAmount ) AS order_number,
? ? ? round( sum( SalesAmount ), 2 ) AS amount
? ? FROM
? ? ? adventure_cj.FactInternetSales
? ? GROUP BY
? ? ? SalesTerritoryKey,
? ? ? OrderDateKey
? ) c
ON a.SalesTerritoryKey = c.SalesTerritoryKey? AND a.orderdate = c.orderdate
WHERE
? a.orderdate <= current_date()
ORDER BY
? a.SalesTerritoryKey,
? a.orderdate;
(3)創(chuàng)建事實(shí)表 Factinternet,對(duì)比銷(xiāo)售額争涌、銷(xiāo)售數(shù)量與目標(biāo)的完成率分析
use adventure_dw_cj;
create table Factinternet
as
select a.*,round(a.order_number/a.order_number_forcost,2) as order_number_forcost_comp,
round(a.order_number/a.order_number_forcost,2) as amount_forcost_comp
from (
SELECT
? ? a.OrderDatekey as orderdate,
? ? year(concat_ws('-',substr(orderdatekey,0,4),substr(orderdatekey,5,2),substr(orderdatekey,7,2)))? ? ? AS time_YEAR,
? ? QUARTER(concat_ws('-',substr(orderdatekey,0,4),substr(orderdatekey,5,2),substr(orderdatekey,7,2)))? ? AS time_QUARTER,
? ? MONTH(concat_ws('-',substr(orderdatekey,0,4),substr(orderdatekey,5,2),substr(orderdatekey,7,2)))? ? ? AS time_MONTH,
? ? WEEKofyear(concat_ws('-',substr(orderdatekey,0,4),substr(orderdatekey,5,2),substr(orderdatekey,7,2))) AS time_WEEK,
? ? a.SalesTerritoryKey,
? ? b.ProductSubcategoryKey,
? ? count( a.CustomerKey ) AS order_number,
? ? round(count(a.CustomerKey)* ( 0.9+rand ( ) * 0.4 ), 2 ) AS order_number_forcost,
? ? round( sum( a.SalesAmount ), 2 ) AS Amount,
? ? round(sum(SalesAmount)*(0.9+rand ()*0.4),2) AS amount_forcost,
? ? round(sum(a.SalesAmount)/count(a.SalesAmount),2) AS customerunitprice,
? ? round( avg( a.TotalProductCost ), 2 ) AS per_productcost,
? ? round( avg( a.TaxAmt ), 2 ) AS per_tax,
? ? round( avg( a.freight ), 2 ) AS avg_freight
FROM
? ? adventure_cj.FactinternetSales a
LEFT JOIN adventure_ods_lan.DimProduct b ON a.ProductKey = b.ProductKey
where concat_ws('-',substr(orderdatekey,0,4),substr(orderdatekey,5,2),substr(orderdatekey,7,2)) <= current_date()
GROUP BY
? ? a.OrderDatekey,
? ? a.SalesTerritoryKey,
? ? b.ProductSubcategoryKey) a;
7粉楚、每日定時(shí)更新數(shù)據(jù)
模擬基礎(chǔ)數(shù)據(jù)更新,把每天新的銷(xiāo)售數(shù)據(jù)入庫(kù)到基礎(chǔ)庫(kù)中亮垫,并且更新匯總層的數(shù)據(jù)
0 1 * * * sh /root/adventure_cj/update_shell/update_data_everyday.sh? ?#每天1點(diǎn)0分執(zhí)行這個(gè)腳本
4?PowerBI展示數(shù)據(jù)
借助ODBC使用powerBI連接Hive數(shù)據(jù)庫(kù)
目的是從以日模软、月、季饮潦、年為時(shí)間維度燃异,銷(xiāo)售區(qū)域來(lái)分析Adventure Works Cycles這家公司的銷(xiāo)售額、銷(xiāo)量继蜡、客單價(jià)回俐、平均稅費(fèi)逛腿、平均運(yùn)費(fèi)、銷(xiāo)售額完成率仅颇、平均成本鳄逾、銷(xiāo)售大區(qū)占比以及產(chǎn)品銷(xiāo)量
展示連接如下:Adventure_Work