本篇文章對sql的應(yīng)用進(jìn)行總結(jié)肚豺,使用sql從不同角度對銷售數(shù)據(jù)進(jìn)行分析
項(xiàng)目簡介
本篇文章使用的是某產(chǎn)品3惦辛、4肚医、5月份的消費(fèi)信息和用戶信息咬腋,對這兩份數(shù)據(jù)來分析該產(chǎn)品這三個(gè)月的銷售情況羹膳。
數(shù)據(jù)介紹
userinfo 客戶信息表
userId 客戶id
sex 性別
brith 出生日期
orderinfo 訂單信息表
orderId 訂單序號(hào)(虛擬主鍵)
userId 客戶id
isPaid 是否支付
price 商品價(jià)格
paidTime 支付時(shí)間
以上兩個(gè)表格即為本次分析的數(shù)據(jù),匹配字段是userid
分析角度
- 統(tǒng)計(jì)不同月份的下單人數(shù)
- 統(tǒng)計(jì)用戶三月份的復(fù)購率和回購率
- 統(tǒng)計(jì)男女用戶的消費(fèi)頻次是否有差異
- 統(tǒng)計(jì)多次消費(fèi)的用戶根竿,第一次和最后一次消費(fèi)的間隔是多少陵像?
- 統(tǒng)計(jì)不同年齡段,用戶的消費(fèi)金額是否有差異寇壳?
- 統(tǒng)計(jì)消費(fèi)的二八法則醒颖,消費(fèi)top 20%的用戶,貢獻(xiàn)了多少的額度壳炎?
MySQL分析
1.導(dǎo)入數(shù)據(jù)和清理數(shù)據(jù)
導(dǎo)入數(shù)據(jù)可以選擇:
- 從workbench導(dǎo)入图贸,效率較慢
- 使用命令行將數(shù)據(jù)導(dǎo)入,對數(shù)據(jù)量大的數(shù)據(jù)操作起來效率高
#創(chuàng)建數(shù)據(jù)庫,表格
create database data;
create table orderinfo(
orderid int primary key ,
userid int,
isPaid varchar(10),
price float,
paidTime varchar(30));
create table userinfo(
userid int primary key,
sex varchar(10),
birth date);
查看所有表格
#導(dǎo)入數(shù)據(jù)疏日,csv文件以逗號(hào)為分隔符
load data local infile 'C:/Users/Administrator/Desktop/order_info_utf.csv' into table orderinfo fields terminated by ',';
load data local infile 'C:/Users/Administrator/Desktop/user_info_utf.csv' into table userinfo fields terminated by ',';
#將時(shí)間格式標(biāo)準(zhǔn)化為2000-01-01
update orderinfo set paidtime=replace(paidtime,'/','-') where paidtime is not null;
#更新字符串為日期格式偿洁,方便接下來使用日期函數(shù)操作
update orderinfo set paidtime=str_to_date(paidtime,'%Y-%m-%d %H:%i') where paidtime is not null;
處理后的paidtime
2.業(yè)務(wù)分析
2.1統(tǒng)計(jì)不同月份的下單人數(shù)
select month(paidtime) as month,count(distinct userid) from orderinfo as order_number
where ispaid='已支付'
group by month(paidtime);
不同月份下單人數(shù)
2.2統(tǒng)計(jì)用戶三月份的復(fù)購率
select count(if(se>1,1,null))/count(1) as '復(fù)購率' from (
select userid,count(userid) as se from orderinfo
where ispaid='已支付'
and month(paidtime) = 3
group by userid) t;
3月份復(fù)購率
2.3統(tǒng)計(jì)三月份的回購率
#反復(fù)利用子查詢(每月消費(fèi)的userid,并利用分組進(jìn)行去重)
select t1.m,count(t2.m)/count(t1.m) as '回購率' from (
select userid,date_format(paidtime,'%Y-%m-01') as m
from orderinfo
where ispaid='已支付'
group by userid,date_format(paidtime,'%Y-%m-01'))t1
left join (
select userid,date_format(paidtime,'%Y-%m-01') as m
from orderinfo
where ispaid='已支付'
group by userid,date_format(paidtime,'%Y-%m-01'))t2
on t1.userid = t2.userid and t1.m = date_sub(t2.m,interval 1 month)
group by t1.m
每月回購率
2.4統(tǒng)計(jì)男女用戶的消費(fèi)頻次是否有差異
select sex,avg(ct) as frequency from (
select t1.userid,t2.sex,count(*) as ct
from orderinfo t1
inner join (
select * from userinfo
where sex != '')t2
on t1.userid = t2.userid
group by userid)t3
group by sex;
男女用戶消費(fèi)頻次差異
2.5統(tǒng)計(jì)多次消費(fèi)的用戶沟优,第一次和最后一次消費(fèi)間隔是多少涕滋?(生命周期,用戶消費(fèi)時(shí)間)
select
userid,
datediff(max(paidtime),min(paidtime)) "消費(fèi)間隔"
from orderinfo
where ispaid = '已支付'
group by userid
having count(paidtime) > 1;
用戶消費(fèi)間隔
2.6統(tǒng)計(jì)不同年齡段挠阁,用戶的消費(fèi)金額是否有差異(用戶消費(fèi)基本畫像)
#各個(gè)年齡組的消費(fèi)頻次宾肺、消費(fèi)金額
select age,avg(ct),round(sum(amount),2) from (
#每個(gè)用戶的消費(fèi)頻次、所屬年齡組侵俗、消費(fèi)金額
select o.userid,age,sum(price) as amount,count(*) as ct
from orderinfo o
#內(nèi)連接表userinfo獲取年齡信息
inner join
#將userinfo表按年齡分段
(select userid,round((year(now())-year(birth))/10) as age
from userinfo
where birth > '1919-00-00')t1
on o.userid = t1.userid
WHERE ispaid = '已支付'
group by o.userid,age)t2
group by t2.age
不同年齡段用戶消費(fèi)頻次锨用、金額
2.7統(tǒng)計(jì)消費(fèi)的二八法則,消費(fèi)的top20%的用戶隘谣,貢獻(xiàn)多少額度(top客戶)
#先計(jì)算出總的用戶數(shù)與消費(fèi)總金額
SELECT count(userid),sum(total) from (
SELECT userid,sum(price) as total
FROM orderinfo
where ispaid = '已支付'
GROUP BY userid
ORDER BY total DESC)t
總用戶數(shù)與消費(fèi)總金額
SELECT count(userid),sum(total),sum(total)/318503081 from (
SELECT userid,sum(price) as total
FROM orderinfo
where ispaid = '已支付'
GROUP BY userid
ORDER BY total DESC
LIMIT 17000)t
前20%的用戶消費(fèi)金額及在總消費(fèi)金額中占比