在data下面新建兩張order和user表,將原數(shù)據(jù)導(dǎo)入mysql藏姐,csv的格式以逗號分隔。
load data local infile 'C:/Users/Administrator/Desktop/day1/user_info_utf.csv' into table data.user
fields terminated by ',';
order表部分如下:
select * from data.order
user表部分如下:
統(tǒng)計不同月份的下單人數(shù):
思路:按月進(jìn)行分組,統(tǒng)計人數(shù)清蚀,注意去重蹲嚣。
select date_format(paidTime,'%Y%m') as month,
count(distinct userId) as countid
from data.order
where isPaid='已支付'
group by date_format(paidTime,'%Y%m')
三月份的下單人數(shù)高于四月份递瑰,五月份的數(shù)據(jù)含有些臟數(shù)據(jù)不作分析祟牲。
統(tǒng)計三月份的復(fù)購率和回購率
復(fù)購率:三月份買過一次后,當(dāng)月再次購買
回購率:三月份買過一次后抖部,四月份繼續(xù)購買
1.三月份復(fù)購率:
思路:在三月份的條件下说贝,對userId進(jìn)行分組再計數(shù),篩選出購買次數(shù)大于1的人占三月份購買總?cè)藬?shù)的占比
select userId,count(userId) as paidcounts from data.order
where isPaid ='已支付'
and date_format(paidTime,'%Y%m')=201603
group by userId
先找出每個用戶的購買次數(shù):
篩選出大于1的占比:
select count(userId),count(if(paidcounts>1,1,null)) as repaidc
from(
select userId,count(userId) as paidcounts from data.order
where isPaid ='已支付'
and date_format(paidTime,'%Y%m')=201603
group by userId) t
兩者相除慎颗,得出結(jié)論
三月份的復(fù)購率(保留一位小數(shù))約為:30.9%
2.三月份的回購率:
通用方法:可求每個月的回購率
思路:先將每個用戶所有的消費時間篩選出來乡恕,用left join,產(chǎn)生笛卡爾積效應(yīng)哗总,篩選出月份相差為1的記錄几颜,求出占比。
select userId,date_format(paidTime,'%Y-%m-01') m
from data.order
where isPaid='已支付'
group by userId,date_format(paidTime,'%Y-%m-01')
將表進(jìn)行自連接讯屈,條件是id相同蛋哭,且前后消費時間相差一個月
再此基礎(chǔ)上對各個月份進(jìn)行分組,計算占比
select t1.m,count(t1.m),count(t2.m) cback from(
select userId,date_format(paidTime,'%Y-%m-01') m from data.order
where isPaid='已支付'
group by userId,date_format(paidTime,'%Y-%m-01')) t1
left join(
select userId,date_format(paidTime,'%Y-%m-01') m from data.order
where isPaid='已支付'
group by userId,date_format(paidTime,'%Y-%m-01'))t2
on t1.userId=t2.userId and t1.m=date_sub(t2.m,interval 1 month)
group by t1.m
三月份回購率約為:23.9%
統(tǒng)計男女消費頻次是否有差異:
思路:將order表和user表連接涮母,條件是已支付且性別不為空谆趾。對userId分組,查詢出對應(yīng)的消費次數(shù)叛本。再次嵌套groupby對性別分組沪蓬,求平均值。
select sex,avg(c)
from(
select o.userId,u.sex,count(u.sex) as c
from data.order o
join data.user u
on o.userId=u.userId
where isPaid='已支付' and sex !=''
group by o.userId) t
group by sex
男女的消費頻次約分別為1.78次和1.80次来候,消費頻次差距不大跷叉。
統(tǒng)計多次消費的用戶,第一次和最后一次消費間隔是多少营搅?
思路:對userId分組云挟,過濾出消費大于1次的,再找出時間的最大和最小值转质、消費間隔园欣。
select userId,max(paidTime),min(paidTime),
datediff(max(paidTime),min(paidTime)) deltaday
from data.order
where isPaid='已支付'
group by userId
having count(userId)>1
統(tǒng)計不同年齡段,用戶的消費金額是否有差異
先將用戶年齡按10的梯級劃分休蟹,把大于100歲的去除沸枯。
select userId,sex,ceil(age/10) as level from(
select userId,sex,year(now())-year(user.birth) as age
from data.user) t1
where age<100
將兩表進(jìn)行連接:
再對連接表的level和userId分組,計算出每個userId的消費次數(shù)赂弓,最后算出平均值绑榴。
select level,avg(ct)
from(
select t2.level,o.userId,count(o.userId) ct
from(
select userId,sex,ceil(age/10) as level from(
select userId,sex,year(now())-year(user.birth) as age
from data.user) t1
where age<100) t2
join data.order o
on o.userId=t2.userId and o.isPaid='已支付'
group by t2.level,o.userId) t3
group by level
根據(jù)結(jié)果可知,0-30年齡段的消費頻次較低盈魁,中年段的消費頻次相對低齡段有所增高且平均消費頻次在所有年齡段中最高彭沼,到老齡段頻次又逐漸降低。
統(tǒng)計消費的二八法則备埃,消費的TOP20%用戶姓惑,貢獻(xiàn)了多少額度?
思路:先計算出前20%的用戶數(shù)量按脚,再算出前20%用戶貢獻(xiàn)額度總和于毙。
排名前20%的用戶的消費總額度:
select count(userId) as top20,sum(sp)
from(
select userId,sum(price) as sp
from data.order
where isPaid='已支付'
group by userId
order by sp desc
limit 17130) t
總?cè)藬?shù)的消費總額度是:
前20%的用戶貢獻(xiàn)了約85%的額度。