數(shù)據(jù)來(lái)源
數(shù)據(jù)來(lái)源于kaggle是己,https://www.kaggle.com/carrie1/ecommerce-data
理解數(shù)據(jù)
InvoiceNo:發(fā)票編號(hào) 狭姨;每筆交易分配唯一的6位整數(shù)纱新,而退貨訂單的代碼以字母'c'開(kāi)頭
StockCode:產(chǎn)品編號(hào)牍戚;每個(gè)不同的產(chǎn)品分配唯一的5位整數(shù)
Description:產(chǎn)品描述放钦;對(duì)每件產(chǎn)品的簡(jiǎn)略描述
Quantity:產(chǎn)品數(shù)量;每筆交易的每件產(chǎn)品的數(shù)量
InvoiceDate:交易日期皇忿;每筆交易發(fā)生的日期和時(shí)間
UnitPrice:?jiǎn)蝺r(jià)(英鎊);單位產(chǎn)品價(jià)格
CustomerID:顧客ID坦仍;每個(gè)客戶(hù)分配唯一的5位整數(shù)
Country:國(guó)家鳍烁;每個(gè)客戶(hù)所在國(guó)家/地區(qū)的名稱(chēng)
提出問(wèn)題
基于RFM模型提出一下問(wèn)題:
1.客戶(hù)維度:各類(lèi)客戶(hù)的占比為多少,是否正常繁扎?
2.時(shí)間維度:各類(lèi)客戶(hù)分別都在哪些月份購(gòu)買(mǎi)產(chǎn)品幔荒?
3.區(qū)域維度:各類(lèi)客戶(hù)分別集中在哪些州/國(guó)/地區(qū)?
4.產(chǎn)品維度:各類(lèi)客戶(hù)集中購(gòu)買(mǎi)的產(chǎn)品分別是哪些梳玫?
數(shù)據(jù)清洗
用python對(duì)數(shù)據(jù)進(jìn)行清洗(此處省略)
構(gòu)建模型
1.將利用python清洗的數(shù)據(jù)導(dǎo)出為csv表:gift_retail_online爹梁,將表導(dǎo)入MySQL中。
2.創(chuàng)建分析所需的新表
即:
地區(qū)表:online_country_state
價(jià)格區(qū)間表:online_price_range
客戶(hù)分類(lèi)表:online_customer_group
建表過(guò)程
①選出國(guó)家再人工進(jìn)行查詢(xún)得到online_country_state(地區(qū)表)
SELECT DISTINCT 國(guó)家 AS 國(guó)家 FROM gift_retail_online
將國(guó)家選出來(lái)提澎,由于地區(qū)數(shù)據(jù)需要查詢(xún)姚垃,所以導(dǎo)出國(guó)家再進(jìn)行人工查詢(xún)其中文名稱(chēng)、所屬洲及在其所屬洲的位置盼忌,部分表截圖如下:
②創(chuàng)建臨時(shí)表price再轉(zhuǎn)化為新表online_price_range(價(jià)格區(qū)間表)
CREATE VIEW price AS
SELECT 產(chǎn)品編號(hào),
(CASE WHEN 單價(jià)=0 THEN '贈(zèng)品'
WHEN 單價(jià)<1 THEN '<1'
WHEN 單價(jià)<10 THEN '1-10'
WHEN 單價(jià)<50 THEN "10-50"
WHEN 單價(jià)<300 THEN "50-300"
WHEN 單價(jià)<1000 THEN "300-1000"
WHEN 單價(jià)<3000 THEN "1000-3000"
ELSE ">3000" END) AS 價(jià)格區(qū)間
FROM (SELECT * FROM gift_retail_online WHERE 單價(jià)>=0) AS a
3. 由于需要對(duì)客戶(hù)分類(lèi)
根據(jù)RFM模型积糯,創(chuàng)建R_value、F_value谦纱、M_value三張臨時(shí)表看成,再由三張臨時(shí)表創(chuàng)建出臨時(shí)客戶(hù)分類(lèi)表,再轉(zhuǎn)化為新表online_customer_group(客戶(hù)分類(lèi)表)
*由于這里需要知道R跨嘉、F川慌、M各自的四分位數(shù)才可以給出區(qū)間隨后進(jìn)行打分,在這里使用python的quantitle函數(shù)祠乃,求出各自的分類(lèi)標(biāo)準(zhǔn)
R指標(biāo)
數(shù)據(jù)采集日期與最近消費(fèi)日期的相隔天數(shù)
CREATE VIEW R_value AS
SELECT 顧客ID,
(CASE WHEN 購(gòu)買(mǎi)天數(shù) BETWEEN 0 AND 30 THEN 5
WHEN 購(gòu)買(mǎi)天數(shù) BETWEEN 30 AND 90 THEN 4
WHEN 購(gòu)買(mǎi)天數(shù) BETWEEN 90 AND 180 THEN 3
WHEN 購(gòu)買(mǎi)天數(shù) BETWEEN 180 AND 360 THEN 2
WHEN 購(gòu)買(mǎi)天數(shù) BETWEEN 360 AND 720 THEN 1
ELSE 0 END
) AS 購(gòu)買(mǎi)得分
FROM
(SELECT 顧客ID,DATEDIFF('2011-12-09',MAX(交易日期)) AS 購(gòu)買(mǎi)天數(shù)
FROM
(SELECT * FROM gift_retail_online WHERE 單價(jià)>0
AND 發(fā)票編號(hào) NOT IN
(SELECT 發(fā)票編號(hào) FROM gift_retail_online WHERE 發(fā)票編號(hào) LIKE "C%")
AND 顧客ID NOT IN
(SELECT 顧客ID FROM gift_retail_online WHERE 顧客ID='U')
) AS a
GROUP BY 顧客ID) AS b
ORDER BY 購(gòu)買(mǎi)得分 DESC
F指標(biāo)
CREATE VIEW F_value AS
SELECT 顧客ID,
(CASE WHEN 購(gòu)買(mǎi)次數(shù) BETWEEN 1 AND 2 THEN 1
WHEN 購(gòu)買(mǎi)次數(shù) BETWEEN 2 AND 5 THEN 2
WHEN 購(gòu)買(mǎi)次數(shù) BETWEEN 5 AND 10 THEN 3
WHEN 購(gòu)買(mǎi)次數(shù) BETWEEN 10 AND 20 THEN 4
WHEN 購(gòu)買(mǎi)次數(shù) BETWEEN 20 AND 8000 THEN 5
ELSE 0 END
) AS 購(gòu)買(mǎi)頻率得分
FROM
(SELECT 顧客ID,COUNT(顧客ID) AS 購(gòu)買(mǎi)次數(shù)
FROM
(SELECT * FROM gift_retail_online WHERE 單價(jià)>0
AND 發(fā)票編號(hào) NOT IN
(SELECT 發(fā)票編號(hào) FROM gift_retail_online WHERE 發(fā)票編號(hào) LIKE "C%")
AND 顧客ID NOT IN
(SELECT 顧客ID FROM gift_retail_online WHERE 顧客ID='U')
) AS a
GROUP BY 顧客ID) AS b
ORDER BY 購(gòu)買(mǎi)頻率得分 DESC
M指標(biāo)
CREATE VIEW M_value AS
SELECT 顧客ID,
(CASE WHEN 購(gòu)買(mǎi)金額 BETWEEN 0 AND 500 THEN 1
WHEN 購(gòu)買(mǎi)金額 BETWEEN 500 AND 2000 THEN 2
WHEN 購(gòu)買(mǎi)金額 BETWEEN 2000 AND 5000 THEN 3
WHEN 購(gòu)買(mǎi)金額 BETWEEN 5000 AND 10000 THEN 4
WHEN 購(gòu)買(mǎi)金額 BETWEEN 10000 AND 280000 THEN 5
ELSE 0 END
) AS 購(gòu)買(mǎi)金額得分
FROM
(SELECT 顧客ID,SUM(筆銷(xiāo)售額) AS 購(gòu)買(mǎi)金額
FROM
(SELECT * FROM gift_retail_online WHERE 單價(jià)>0
AND 發(fā)票編號(hào) NOT IN
(SELECT 發(fā)票編號(hào) FROM gift_retail_online WHERE 發(fā)票編號(hào) LIKE "C%")
AND 顧客ID NOT IN
(SELECT 顧客ID FROM gift_retail_online WHERE 顧客ID='U')
) AS a
GROUP BY 顧客ID) AS b
ORDER BY 購(gòu)買(mǎi)金額得分 DESC
計(jì)算三者的平均值
SELECT avg(購(gòu)買(mǎi)得分) from r_value
3.832
SELECT avg(購(gòu)買(mǎi)頻率得分) from f_value
4.4539
SELECT avg(購(gòu)買(mǎi)金額得分) from m_value
1.8882
根據(jù)R,F,M三個(gè)虛擬表創(chuàng)建客戶(hù)分類(lèi)表梦重,轉(zhuǎn)化為新表online_customer_group(客戶(hù)分類(lèi)表)
CREATE VIEW 客戶(hù)分類(lèi) AS
SELECT 顧客ID,
(CASE WHEN R>3.82 AND F>4.45 AND M>1.89 THEN '重要價(jià)值客戶(hù)'
WHEN R>3.82 AND F>4.45 AND M<1.89 THEN '一般價(jià)值客戶(hù)'
WHEN R>3.82 AND F<4.45 AND M>1.89 THEN '重要發(fā)展客戶(hù)'
WHEN R>3.82 AND F<4.45 AND M<1.89 THEN '一般發(fā)展客戶(hù)'
WHEN R<3.82 AND F>4.45 AND M>1.89 THEN '重要保持客戶(hù)'
WHEN R<3.82 AND F>4.45 AND M<1.89 THEN '一般挽留客戶(hù)'
WHEN R<3.82 AND F<4.45 AND M>1.89 THEN '重要挽留客戶(hù)'
ELSE '流失客戶(hù)' END) AS 客戶(hù)類(lèi)型
FROM
(SELECT a.顧客ID,a.購(gòu)買(mǎi)得分 AS R,
b.購(gòu)買(mǎi)頻率得分 AS F,
c.購(gòu)買(mǎi)金額得分 AS M
FROM r_value AS a
INNER JOIN
f_value AS b
on a.顧客ID=b.顧客ID
INNER JOIN
m_value AS c
ON a.顧客ID=c.顧客ID) AS c
數(shù)據(jù)分析
1.各類(lèi)客戶(hù)的占比及判斷是否正常
SELECT a.客戶(hù)類(lèi)型,a.客戶(hù)類(lèi)型數(shù)量,b.總客戶(hù)數(shù),FORMAT(a.客戶(hù)類(lèi)型數(shù)量/b.總客戶(hù)數(shù),2) AS 占比 FROM
(SELECT 客戶(hù)類(lèi)型,COUNT(客戶(hù)類(lèi)型) AS 客戶(hù)類(lèi)型數(shù)量
FROM online_customer_group
GROUP BY 客戶(hù)類(lèi)型
ORDER BY 客戶(hù)類(lèi)型數(shù)量 DESC) AS a,
(SELECT COUNT(顧客ID) AS 總客戶(hù)數(shù) FROM online_customer_group) AS b
對(duì)客戶(hù)類(lèi)別進(jìn)行等級(jí)劃分,劃分為A亮瓷、B琴拧、C級(jí)
A級(jí)客戶(hù):重要價(jià)值客戶(hù)占比45% + 重要發(fā)展客戶(hù)占比3% = 48%
B級(jí)客戶(hù):重要保持客戶(hù)10% + 一般發(fā)展客戶(hù)10% + 一般價(jià)值客戶(hù)9% + 重要挽留客戶(hù)2%=31%
C 級(jí)客戶(hù):流失客戶(hù)14% + 一般保持客戶(hù)7% = 21%
2.不同類(lèi)型客戶(hù)的購(gòu)買(mǎi)時(shí)間對(duì)比
SELECT a.年份,a.月份,SUM(a.筆銷(xiāo)售額) AS 各個(gè)類(lèi)型顧客銷(xiāo)售額,b.客戶(hù)類(lèi)型 FROM
(SELECT 年份,月份,單價(jià),發(fā)票編號(hào),顧客ID,筆銷(xiāo)售額 FROM gift_retail_online
WHERE 單價(jià)>0 AND 筆銷(xiāo)售額>0 AND 發(fā)票編號(hào) NOT IN
(SELECT 發(fā)票編號(hào) FROM gift_retail_online WHERE 發(fā)票編號(hào) LIKE "C%")) AS a
INNER JOIN
(SELECT 顧客ID,客戶(hù)類(lèi)型 FROM online_customer_group) AS b
ON a.顧客ID=b.顧客ID
GROUP BY 客戶(hù)類(lèi)型,年份,月份
i. 流失客戶(hù)與一般保持客戶(hù)同為C級(jí)客戶(hù),從圖上看特征基本相同寺庄,消費(fèi)金額都較低且時(shí)間只到消費(fèi)截止到9月份艾蓝,10-12月并沒(méi)有交易記錄力崇。
ii. 重要發(fā)展客戶(hù)與一般發(fā)展客戶(hù)分別為A斗塘、B級(jí)客戶(hù),從圖上明顯看出區(qū)分兩者的為消費(fèi)金額亮靴,截至11月前的曲線(xiàn)基本相同馍盟,但在12月份出現(xiàn)相反的走向,由此可知重要發(fā)展客戶(hù)會(huì)在12月份進(jìn)行集中購(gòu)買(mǎi)茧吊,而一般發(fā)展客戶(hù)只在8-11月份內(nèi)有少量的消費(fèi)贞岭。
iii. 重要保持客戶(hù)與重要挽留客戶(hù)同為B級(jí)客戶(hù)八毯,從圖上可看出重要挽留客戶(hù)在1月份與6月份的消費(fèi)金額存在大小峰值,重要保持客戶(hù)的消費(fèi)金額波動(dòng)幅度并不大瞄桨,兩者的消費(fèi)也都截止在9月份话速,10-12月并沒(méi)有交易記錄。
iv. 重要價(jià)值客戶(hù)與一般價(jià)值客戶(hù)分別為A芯侥、B級(jí)客戶(hù)泊交,該圖有主次兩個(gè)坐標(biāo)軸,重要價(jià)值客戶(hù)消費(fèi)金額高且在11月出現(xiàn)峰值柱查,一般價(jià)值客戶(hù)的消費(fèi)分時(shí)間段進(jìn)行廓俭,分別為1-3月、5-6月唉工、8-11月研乒。
v. 重要價(jià)值客戶(hù)對(duì)比整體銷(xiāo)售情況,兩個(gè)曲線(xiàn)走勢(shì)基本相同淋硝,可以明顯看出整體的消費(fèi)的90%來(lái)自重要價(jià)值客戶(hù)雹熬。
3.各類(lèi)客戶(hù)分別集中在哪些區(qū)域
SELECT b.所屬洲,b.所處位置,b.中文名稱(chēng) AS 國(guó)家,COUNT(a.num) AS 客戶(hù)數(shù)量,c.客戶(hù)類(lèi)型 FROM
(SELECT 國(guó)家,顧客ID,COUNT(DISTINCT `顧客ID`) AS num FROM gift_retail_online
WHERE 單價(jià)>0 AND 筆銷(xiāo)售額>0 AND 發(fā)票編號(hào) NOT IN
(SELECT 發(fā)票編號(hào) FROM gift_retail_online WHERE 發(fā)票編號(hào) LIKE "C%")
GROUP BY 顧客ID) AS a
INNER JOIN
(SELECT 國(guó)家,中文名稱(chēng),所屬洲,所處位置 FROM online_country_state) AS b
ON a.國(guó)家=b.國(guó)家
INNER JOIN
(SELECT 顧客ID,客戶(hù)類(lèi)型 FROM online_customer_group) AS c
ON a.顧客ID=c.顧客ID
GROUP BY 客戶(hù)類(lèi)型,國(guó)家
i. 從上圖可以看出客戶(hù)基本上集中在歐洲,其它洲的客戶(hù)極少奖地,如非洲只存在1位重要價(jià)值客戶(hù)橄唬,南美洲只存在1為重要保持客戶(hù)
4.各類(lèi)客戶(hù)集中購(gòu)買(mǎi)的產(chǎn)品
SELECT * FROM
(SELECT *,row_number() OVER(PARTITION BY 客戶(hù)類(lèi)型 ORDER BY 產(chǎn)品數(shù)量 DESC) AS ranking
FROM
(SELECT 產(chǎn)品編號(hào),SUM(產(chǎn)品數(shù)量) AS 產(chǎn)品數(shù)量,客戶(hù)類(lèi)型 FROM
(SELECT 國(guó)家,顧客ID,產(chǎn)品編號(hào),產(chǎn)品數(shù)量 FROM gift_retail_online
WHERE 單價(jià)>0 AND 筆銷(xiāo)售額>0 AND 發(fā)票編號(hào) NOT IN
(SELECT 發(fā)票編號(hào) FROM gift_retail_online WHERE 發(fā)票編號(hào) LIKE "C%")) AS a
INNER JOIN
(SELECT 顧客ID,客戶(hù)類(lèi)型 FROM online_customer_group) AS b
ON a.顧客ID=b.顧客ID
GROUP BY 客戶(hù)類(lèi)型,產(chǎn)品編號(hào))AS d) AS c
WHERE ranking<=5
從圖中可以看出,重要發(fā)展客戶(hù)與重要挽留客戶(hù)分別對(duì)產(chǎn)品編號(hào)為23843與23166有集中的偏好参歹,其它客戶(hù)對(duì)應(yīng)購(gòu)買(mǎi)量前5的產(chǎn)品之間區(qū)別并沒(méi)有特別大