數(shù)據(jù)說(shuō)明
由兩張表組成,其中orderinfo表記錄了3月-5月的訂單情況,字段包括orderid变抽、userid、ispaid(支付狀態(tài))氮块、price和paidtime(支付時(shí)間)绍载,共539411行。
userinfo記錄了所有用戶情況滔蝉,字段包括userid击儡、sex和birth,共101535行蝠引。
分析問題
1.統(tǒng)計(jì)不同月份的下單人數(shù)
2.統(tǒng)計(jì)用戶三月份的回購(gòu)率和復(fù)購(gòu)率
3.統(tǒng)計(jì)男女用戶的消費(fèi)頻次是否有差異
4.統(tǒng)計(jì)多次消費(fèi)的用戶阳谍,第一次和最后一次消費(fèi)間隔是多少?
5.統(tǒng)計(jì)不同年齡段螃概,用戶的消費(fèi)金額是否有差異矫夯?
6.統(tǒng)計(jì)消費(fèi)的二八法則,消費(fèi)的top20%用戶吊洼,貢獻(xiàn)了多少額度
導(dǎo)入處理
- 在數(shù)據(jù)庫(kù)中建表
create table orderinfo(
orderid int primary key not null ,
userid int,
isPaid varchar(10),
price float,
paidTime varchar(30));
create table userinfo(
userid int primary key,
sex varchar(10),
birth date);
2.從cvs導(dǎo)入數(shù)據(jù)到mysql
load data local infile 'C:/Users/47207/Desktop/order_info_utf.csv' into table orderinfo fields terminated by ',';
load data local infile 'C:/Users/47207/Desktop/user_info_utf.csv' into table userinfo fields terminated by ',';
注:這里fields terminated by ','是指文件內(nèi)容按‘逗號(hào)為分割填入sql表中
-
觀察數(shù)據(jù)茧痒,將數(shù)據(jù)格式轉(zhuǎn)化為統(tǒng)一的sql日期格式
image.png
1、先把時(shí)間格式標(biāo)準(zhǔn)化為 1993-02-27
update orderinfo set paidtime=replace(paidtime,'/','-') where paidtime is not null;
2融蹂、然后更新字符串為日期格式旺订,然后才能使用日期函數(shù)進(jìn)行操作,
update orderinfo set paidtime=str_to_date(paidtime,'%Y-%m-%d %H:%i') where paidtime is not null;
預(yù)處理完成如下:
問題解決
1.統(tǒng)計(jì)不同月份的下單人數(shù)
select DATE_FORMAT(paidtime,'%Y-%M') as month, count(distinct userid)
from orderinfo
where ispaid='已支付'
group by month;
2.統(tǒng)計(jì)用戶三月份的回購(gòu)率和復(fù)購(gòu)率
這兩個(gè)指標(biāo)邏輯不同:
(1).回購(gòu)率是三月購(gòu)買的userid四月依舊購(gòu)買超燃。
select count(distinct userid)
from orderinfo
where month(paidtime)=4 and ispaid='已支付' and userid in (
select userid
from orderinfo
where ispaid='已支付' and month(paidtime)=3
group by userid);
上述語(yǔ)句得出了4月依舊購(gòu)買的人數(shù)区拳,除以3月購(gòu)買人數(shù)即可方法二:
select count(b.april_user)/count(a.march_user)
from (select userid as march_user
from orderinfo
where isPaid='已支付' and month(paidTime)=3
group by userid) a left join (select userid as april_user
from orderinfo
where isPaid='已支付' and month(paidTime)=4
group by userid) b on a.march_user=b.april_user;
每個(gè)月的復(fù)購(gòu)率可以如下計(jì)算:
select t1.m,count(t1.m),count(t2.m),count(t2.m)/count(t1.m) from
(select userid,date_format(paidtime,"%Y-%m-01") m from orderinfo where isPaid = "已支付" group by userid,date_format(paidtime,"%Y-%m-01")) t1
left join
(select userid,date_format(paidtime,"%Y-%m-01") m from orderinfo where isPaid = "已支付" group by userid,date_format(paidtime,"%Y-%m-01")) t2
on t1.userid = t2.userid and t2.m=date_add(t1.m,interval 1 month) group by t1.m;
(2).復(fù)購(gòu)率是三月所有消費(fèi)者中,購(gòu)買2次及以上消費(fèi)者的比例意乓。
select count(purchase_num), SUM(IF(purchase_num>1,1,NULL)), SUM(IF(purchase_num>1,1,NULL))/count(purchase_num) as rate
FROM (select userid, count(*) as purchase_num
from orderinfo
where ispaid='已支付' and month(paidtime)=3
group by userid) a;
3.統(tǒng)計(jì)男女用戶的消費(fèi)頻次是否有差異(平均消費(fèi)次數(shù))
select u.sex, count(o.userid)/count(distinct u.userid)
from orderinfo o, userinfo u
where o.userid=u.userid and o.isPaid='已支付' and u.sex <> ''
group by u.sex;
4.統(tǒng)計(jì)多次消費(fèi)的用戶樱调,第一次和最后一次消費(fèi)間隔是多少约素?
select userid, min(paidtime), max(paidtime), DATEDIFF(max(paidtime),min(paidtime))
from orderinfo
where ispaid='已支付'
group by userid
having count(*)>1
order by userid asc;
5.統(tǒng)計(jì)不同年齡段,用戶的消費(fèi)金額是否有差異笆凌?
思路:去除有問題日期數(shù)據(jù)后圣猎,將年齡分段(ceil函數(shù)向上取整)。再求出每個(gè)用戶的年齡段乞而、訂單數(shù)送悔、總金額。最后求出不同年齡段的訂單平均金額爪模。
select a.age_level, sum(a.sm)/sum(a.ct)
from
(select u.userid, u.age_level, count(u.userid) as ct, sum(price) as sm
from orderinfo o join
(select userid, CEIL((year(now())-year(birth))/10) as age_level
from userinfo
where birth > '1901-00-00') u on o.userid=u.userid
where o.isPaid='已支付'
group by u.userid, u.age_level) a
group by a.age_level
order by a.age_level asc;
6.統(tǒng)計(jì)消費(fèi)的二八法則欠啤,消費(fèi)的top20%用戶,貢獻(xiàn)了多少額度
思路:先求出總消費(fèi)金額和用戶數(shù)屋灌。
select count(*), sum(a.sum_p)
from(
select userid, sum(price) as sum_p
from orderinfo
where isPaid='已支付'
group by userid
order by sum(price) desc) a;
前百分之20%的用戶為17130
select count(*), sum(a.sum_p)
from(
select userid, sum(price) as sum_p
from orderinfo
where isPaid='已支付'
group by userid
order by sum(price) desc
limit 17130) a;
貢獻(xiàn)了額度的272201811.46/318501846.54=85%洁段,基本符合二八定律。