面向業(yè)務(wù)的mysql筆試題筆記

參考:
數(shù)分面試-SQL篇

本篇文章完全按照以上參考鏈接寫(xiě),其實(shí)可以理解為個(gè)人筆記

拼多多面試題

注:部分來(lái)源于筆試,面試部分因?yàn)槎际腔谄渌嗣娼?jīng)口述記錄悠鞍,所以難免有一些在格式不統(tǒng)一的地方瑟幕。

1.case專(zhuān)題-商品訂單數(shù)據(jù)

數(shù)據(jù)表:

訂單表orders,大概字段有(order_id'訂單號(hào),'user_id‘用戶(hù)編號(hào)’, order_pay‘訂單金額’ , order_time‘下單時(shí)間’,'商品一級(jí)類(lèi)目commodity_level_I','商品二級(jí)類(lèi)目commodity_level_2')

問(wèn)題1. 求最近7天內(nèi)每一個(gè)一級(jí)類(lèi)目下成交總額排名前3的二級(jí)類(lèi)目:
思路:

  1. 先給出最近七天每一個(gè)一級(jí)類(lèi)目下的二級(jí)類(lèi)目的成交總額
select o.commodity_level_I, o.commodity_level_2, sum(order_pay) as total_pay
from orders as o
where datediff(now(), o.order_time) <= 7
group by o.commodity_level_I, o.commodity_level_2
  1. 再對(duì)每個(gè)一級(jí)類(lèi)目下的成交總額進(jìn)行排名
select tmp.commodity_level_I, tmp.commodity_level_2, tmp.total_pay,
row_number() over(partition by tmp.commodity_level_I order by tmp.total_pay desc) as rank
from (select o.commodity_level_I, o.commodity_level_2, sum(order_pay) as total_pay
from orders as o
where datediff(now(), o.order_time) <= 7
group by o.commodity_level_I, o.commodity_level_2) as tmp 
  1. 取排名前三的即可
select tmp1.commodity_level_I as '商品一級(jí)類(lèi)目', 
tmp1.commodity_level_2 as '商品二級(jí)類(lèi)目', 
tmp1.total_pay as '成交總額'
from (select tmp.commodity_level_I, tmp.commodity_level_2, tmp.total_pay,
row_number() over(partition by tmp.commodity_level_I order by tmp.total_pay desc) as rank
from (select o.commodity_level_I, o.commodity_level_2, sum(order_pay) as total_pay
from orders as o
where datediff(now(), o.order_time) <= 7
group by o.commodity_level_I, o.commodity_level_2) as tmp ) as tmp1
where tmp1.rank <=3

問(wèn)題2.提取8.1-8.10每一天消費(fèi)金額排名在101-195的user_id
思路:

  1. 選取8.1-8.10每一天每個(gè)user的消費(fèi)金額
select o.order_time, o. user_id, sum(o.order_pay) as sum_pay
from orders as o
where o.order_time between '2020-08-01' and '2020-08-10'
group by o.order_time, o. user_id
  1. 給出每天的消費(fèi)金額排名
select tmp.order_time, tmp. user_id, tmp.sum_pay,
row_number() over(partition by tmp.order_time order by tmp.sum_pay desc) as rank
from (select o.order_time, o. user_id, sum(o.order_pay) as sum_pay
from orders as o
where o.order_time between '2020-08-01' and '2020-08-10'
group by o.order_time, o. user_id) as tmp

3.選出排名101-195的即可

select tmp1.order_time  as '訂單日期', 
tmp1. user_id , 
tmp1.sum_pay as '消費(fèi)金額'
from (select tmp.order_time, tmp. user_id, tmp.sum_pay,
row_number() over(partition by tmp.order_time order by tmp.sum_pay desc) as rank
from (select o.order_time, o. user_id, sum(o.order_pay) as sum_pay
from orders as o
where o.order_time between '2020-08-01' and '2020-08-10'
group by o.order_time, o. user_id) as tmp) as tmp1
where tmp1.rank between 101 and 195;

2.case專(zhuān)題-活動(dòng)運(yùn)營(yíng)數(shù)據(jù)分析

數(shù)據(jù)表

表1——訂單表orders,大概字段有(user_id‘用戶(hù)編號(hào)’, order_pay‘訂單金額’ , order_time‘下單時(shí)間’)

表2——活動(dòng)報(bào)名表act_apply,大概字段有(act_id‘活動(dòng)編號(hào)’, user_id‘報(bào)名用戶(hù)’,act_time‘報(bào)名時(shí)間’)

1.活動(dòng)運(yùn)營(yíng)數(shù)據(jù)分析-統(tǒng)計(jì)每個(gè)活動(dòng)對(duì)應(yīng)所有用戶(hù)在報(bào)名后產(chǎn)生的總訂單金額伤靠,總訂單數(shù)
思路:
兩表按user_id進(jìn)行連接
還有一個(gè)條件是下單的時(shí)間要大于等于報(bào)名時(shí)間
然后按照act_id進(jìn)行分組統(tǒng)計(jì)

select a.act_id, sum(o.order_pay) as '總金額', count(a.act_id) as '訂單數(shù)'
from orders as o
join act_apply as a
on o.user_id = a.user_id
where o.order_time >= a.act_time
group by a.act_id

參考鏈接中使用的是left join(如果需要改我這里是right join)
這里是需要斟酌一下的

2.統(tǒng)計(jì)每個(gè)活動(dòng)從開(kāi)始后到當(dāng)天(考試日)平均每天產(chǎn)生的訂單數(shù)胶坠,活動(dòng)開(kāi)始時(shí)間定義為最早有用戶(hù)報(bào)名的時(shí)間指么。(涉及到時(shí)間的數(shù)據(jù)類(lèi)型均為:datetime)

  1. 先統(tǒng)計(jì)每個(gè)活動(dòng)的最小時(shí)間即為活動(dòng)開(kāi)始的時(shí)間
select a.act_id, a.user_id, a.act_time,
min(a.act_time) over(partition by a.act_id) as start_date
from act_apply as a

2.與訂單表連接 總的訂單數(shù)除以時(shí)間差即可

select tmp.act_id as '活動(dòng)編號(hào)'掏觉, count(*)/datediff(now(), tmp.start_date) as `平均每天產(chǎn)生的訂單數(shù)`
from (select a.act_id, a.user_id, a.act_time,
min(a.act_time) over(partition by a.act_id) as start_date
from act_apply as a) as tmp
right join orders as o
on o.user_id = tmp.user_id
where o.order_time >= tmp.act_time
-- 如果group by 沒(méi)有start_date的話(huà) sql不允許select語(yǔ)句使用它的 這是聚合函數(shù)的限制
group by tmp.act_id

其實(shí)不用窗口函數(shù)也是可行的

SELECT a.act_id, COUNT(*)/ DATEDIFF(NOW() - MIN(a.act_time)) as m_order_num
FROM orders as o
LEFT JOIN act_apply as a
ON a.user_id = o.user_id
WHERE order_time >= act_time
GROUP BY a.act_id;

3.case專(zhuān)題-用戶(hù)行為路徑分析

表1——用戶(hù)行為表tracking_log区端,大概字段有(user_id‘用戶(hù)編號(hào)’,opr_id‘操作編號(hào)’,log_time‘操作時(shí)間’)

問(wèn)題1.統(tǒng)計(jì)每天符合以下條件的用戶(hù)數(shù):A操作之后是B操作,AB操作必須相鄰
思路:

  1. 用窗口函數(shù)給出每天每個(gè)用戶(hù)的每次操作澳腹、對(duì)應(yīng)的前一個(gè)操作
select t.user_id, t.opr_id as opr1, t.log_time,
lead(1) over(partition by t.log_time, t.user_id order by t.log_time asc) as opr2
from tracking_log as t

2.按時(shí)間進(jìn)行分組 統(tǒng)計(jì)opr1為A opr2為B的用戶(hù)數(shù)目

select tmp.log_time, count(distinct tmp.user_id) as 'A-B路徑用戶(hù)計(jì)數(shù)'
from (select t.user_id, t.opr_id as opr1, t.log_time,
lead(1) over(partition by t.log_time, t.user_id order by t.log_time asc) as opr2
from tracking_log as t) as tmp
where tmp.opr1 = 'A' and tmp.opr2 = 'B'
group by tmp.log_time

問(wèn)題2.統(tǒng)計(jì)用戶(hù)行為序列為A-B-D的用戶(hù)數(shù)
其中:A-B之間可以有任何其他瀏覽記錄(如C,E等),B-D之間除了C記錄可以有任何其他瀏覽記錄(如A,E等)
思路:

  1. 構(gòu)造每個(gè)用戶(hù)的瀏覽記錄
select t.user_id, group_concat(t.opr_id order by t.log_time asc) as path
from tracking_log as t
group by t.user_id

GROUP_CONCAT(DISTINCT expression
ORDER BY expression
SEPARATOR sep);
//原文出自【易百教程】珊燎,商業(yè)轉(zhuǎn)載請(qǐng)聯(lián)系作者獲得授權(quán),非商業(yè)請(qǐng)保留原文鏈接:https://www.yiibai.com/mysql/group_concat.html

2.選出滿(mǎn)足條件的記錄

select t.user_id, group_concat(t.opr_id order by t.log_time asc) as path
from tracking_log as t
group by t.user_id
having path REGEXP 'A[A-Z]+B[^C]+D'

MySQL 正則表達(dá)式
3.統(tǒng)計(jì)數(shù)量即可

select count(tmp.user_id) as result
from (select t.user_id, group_concat(t.opr_id order by t.log_time asc) as path
from tracking_log as t
group by t.user_id
having path REGEXP 'A[A-Z]+B[^C]+D')  as tmp

參考答案

select count(user_id) as result
from
(
    select user_id,group_concat(opr_id order by log_time) as user_behavior_path
    from tracking_log
    group by user_id
    having (user_behavior_path like '%A%B%D%') 
             and (user_behavior_path not like '%A%B%C%D%')
) t

使用模糊查詢(xún) 選出滿(mǎn)足 ABD順序的 再排除滿(mǎn)足ABCD順序的即可

4.case專(zhuān)題-用戶(hù)留存分析

表1——用戶(hù)登陸表user_log遵湖,大概字段有(user_id‘用戶(hù)編號(hào)’悔政,log_date‘登陸時(shí)間’)

問(wèn)題1.求每天新增用戶(hù)數(shù),以及他們第2天延旧、30天的留存率
思路:

  1. 先選出每個(gè)用戶(hù)的最小登陸時(shí)間 則該用戶(hù)是當(dāng)天的新增用戶(hù)
select u.user_id, min(u.log_time) as first_time
from user_log as u
group by u.user_id

2.與主表連接 得到每個(gè)用戶(hù)的登陸時(shí)間及其首次登陸時(shí)間
對(duì)首次登陸時(shí)間分組 統(tǒng)計(jì)數(shù)量則為每日新增用戶(hù)數(shù)

select tmp. first_time, count(distinct tmp.user_id)  as '每天新增用戶(hù)數(shù)'
from (select u.user_id, min(u.log_time) as first_time
from user_log as u
group by u.user_id) as tmp
join user_log as u
on u.user_id = tmp.user_id
group by tmp. first_time;
  1. 第二天留存的用戶(hù)通過(guò)if判斷 如果登陸時(shí)間與首次登陸時(shí)間差值為1 則返回用戶(hù)id 否則記為null 統(tǒng)計(jì)數(shù)量即可
    30天留存同理
select tmp. first_time, count(distinct tmp.user_id)  as '每天新增用戶(hù)數(shù)',
count(distinct if(datediff(tmp.log_time, tmp. first_time) = 1, tmp.user_id, null)) as '第二天留存用戶(hù)數(shù)'谋国,
count(distinct if(datediff(tmp.log_time, tmp. first_time) = 29, tmp.user_id, null)) as '第三十天留存用戶(hù)數(shù)'
from (select u.user_id, min(u.log_time) as first_time
from user_log as u
group by u.user_id) as tmp
join user_log as u
on u.user_id = tmp.user_id
group by tmp. first_time;
  1. 所求留存率只需除以新增用戶(hù)數(shù) 然后用round取兩位即可
select tmp. first_time, count(distinct tmp.user_id)  as '每天新增用戶(hù)數(shù)',
round(count(distinct if(datediff(tmp.log_time, tmp. first_time) = 1, tmp.user_id, null))/count(distinct tmp.user_id), 2) as '第二天的留存率',
round(count(distinct if(datediff(tmp.log_time, tmp. first_time) = 29, tmp.user_id, null))/count(distinct tmp.user_id), 2) as '第三十天的留存率'
from (select u.user_id, min(u.log_time) as first_time
from user_log as u
group by u.user_id) as tmp
join user_log as u
on u.user_id = tmp.user_id
group by tmp. first_time;

用窗口函數(shù)怎么做呢
其實(shí)以上思路的1前三步用窗口函數(shù)就可以實(shí)現(xiàn)

select tmp.first_date, count(distinct tmp.user_id) as '每天新增用戶(hù)數(shù)'
from (select u.user_id, u.log_time迁沫, min(u.log_time) over(partition by u.user_id) as first_date
from user_log as u) as tmp
group by tmp.first_date;

第二天的回訪(fǎng)數(shù)量可以這么求

select tmp.first_date, count(distinct tmp.user_id) as '第二天留存用戶(hù)數(shù)'
from (select u.user_id, u.log_time芦瘾, min(u.log_time) over(partition by u.user_id) as first_date
from user_log as u) as tmp
where datediff(tmp.log_time, tmp.first_date)= 1
group by tmp.first_date;

第三十天留存用戶(hù)數(shù)同理

select tmp.first_date, count(distinct tmp.user_id) as '第三十天留存用戶(hù)數(shù)'
from (select u.user_id, u.log_time, min(u.log_time) over(partition by u.user_id) as first_date
from user_log as u) as tmp
where datediff(tmp.log_time, tmp.first_date)= 29
group by tmp.first_date;

三表連接

select t1.first_date, t1.'每天新增用戶(hù)數(shù)',
concat(round(t2.'第二天留存用戶(hù)數(shù)'/t1.'每天新增用戶(hù)數(shù)' , 2), '%') as  '第2天回訪(fǎng)率',
concat(round(t2.'第二天留存用戶(hù)數(shù)'/t1.'每天新增用戶(hù)數(shù)' , 2), '%') as  '第30天回訪(fǎng)率'
from (select tmp.first_date, count(distinct tmp.user_id) as '每天新增用戶(hù)數(shù)'
from (select u.user_id, u.log_time近弟, min(u.log_time) over(partition by u.user_id) as first_date
from user_log as u) as tmp
group by tmp.first_date) as t1
join (select tmp.first_date, count(distinct tmp.user_id) as '第二天留存用戶(hù)數(shù)'
from (select u.user_id, u.log_time祷愉, min(u.log_time) over(partition by u.user_id) as first_date
from user_log as u) as tmp
where datediff(tmp.log_time, tmp.first_date)= 1
group by tmp.first_date) as t2
on t1. first_date = t2.first_date
join (select tmp.first_date, count(distinct tmp.user_id) as '第三十天留存用戶(hù)數(shù)'
from (select u.user_id, u.log_time, min(u.log_time) over(partition by u.user_id) as first_date
from user_log as u) as tmp
where datediff(tmp.log_time, tmp.first_date)= 29
group by tmp.first_date) as t3
on t1. first_date = t3.first_date

問(wèn)題2.找近90天,30天,7天的登錄人數(shù)
思路:

  1. 求得每個(gè)用戶(hù)每次登陸距離現(xiàn)在的時(shí)間
select u.user_id, u.log_time, datediff(now(), u.log_time) as '距今登陸時(shí)間天數(shù)'
from user_log as u

2.統(tǒng)計(jì)近7天的登陸人數(shù)

select count(distinct u.user_id)
from user_log as u
where datediff(now(), u.log_time) <= 7

30天欺殿,90天的類(lèi)似

如果一次查詢(xún)出則可以用如下方法

select 
count(distinct if(datediff(now(), u.log_time)<=7, u.user_id, null)) as '近7天登陸人數(shù)'程拭,
count(distinct if(datediff(now(), u.log_time)<=30, u.user_id, null)) as '近30天登陸人數(shù)'哺壶,
count(distinct if(datediff(now(), u.log_time)<=90, u.user_id, null)) as '近90天登陸人數(shù)'
from user_log as u

問(wèn)題3.求用戶(hù)近一個(gè)月平均登錄時(shí)間間隔(按天)
思路:

  1. 給出近一個(gè)月每次登陸前一次的登陸情況
selec u.user_id, u.log_time, lag(u.log_time) over(partition u.user_id order by u.log_time asc) as pre_log_time
from user_log as u
where datediff(now(), u.log_time) <= 30
  1. 給出用戶(hù)每次登陸的時(shí)間間隔
selec u.user_id, u.log_time - lag(u.log_time) over(partition u.user_id order by u.log_time asc) as diff
from user_log as u
where datediff(now(), u.log_time) <= 30
  1. 對(duì)用戶(hù)進(jìn)行分組 統(tǒng)計(jì)平均的間隔
select tmp.user_id, avg(tmp.diff) as '平均間隔'
from (selec u.user_id, u.log_time - lag(u.log_time) over(partition u.user_id order by u.log_time asc) as diff
from user_log as u
where datediff(now(), u.log_time) <= 30) as tmp
group by tmp.user_id

5.case專(zhuān)題-統(tǒng)計(jì)特征(中位數(shù),眾數(shù),四分位數(shù))

表1——訂單表orders资锰,字段:店鋪id(shop_id),銷(xiāo)量(sale),商品id(commodity_id)

問(wèn)題1.求每個(gè)店鋪銷(xiāo)量的中位數(shù)
這個(gè)問(wèn)題其實(shí)在這里做過(guò)
Leetcode569. 員工薪水中位數(shù)(困難)
還有另一個(gè)中位數(shù)的題目
Leetcode571. 給定數(shù)字的頻率查詢(xún)中位數(shù)(困難)
都挺難的 現(xiàn)在返回來(lái)看還是有些不會(huì)的
參考這篇文章還是學(xué)到了新的方法
解法一:
常規(guī)思路
設(shè)每個(gè)店鋪銷(xiāo)量組成的序列長(zhǎng)度是cnt
當(dāng)cnt是偶數(shù)時(shí),中位數(shù)所在序號(hào)是cnt/2,cnt/2+1
當(dāng)cnt是奇數(shù)時(shí),中位數(shù)所在序號(hào)是ceil(cnt/2)
1.先給出cnt 和每個(gè)店鋪按照銷(xiāo)量的排名

select o.shop_id, o.sale,
count(1) over(partition by o.shop_id) as cnt,
row_number() over(partition by o.shop_id order by o.sale asc) as sale_rank
from orders as o
  1. 對(duì)于cnt為奇數(shù)或偶數(shù)選出對(duì)應(yīng)排名的記錄
select tmp.shop_id, tmp.sale
from (select o.shop_id, o.sale,
count(1) over(partition by o.shop_id) as cnt,
row_number() over(partition by o.shop_id order by o.sale asc) as sale_rank
from orders as o) as tmp
where (tmp.cnt%2=0 and sale_rank in (cnt/2, cnt/2+1)) or
(tmp.cnt%2=1 and sale_rank = ceiling(cnt/2))
  1. 最后對(duì)shop_id分組 對(duì)sale取平均即可
select tmp.shop_id, avg(tmp.sale) as '中位數(shù)'
from (select o.shop_id, o.sale,
count(1) over(partition by o.shop_id) as cnt,
row_number() over(partition by o.shop_id order by o.sale asc) as sale_rank
from orders as o) as tmp
where (tmp.cnt%2=0 and sale_rank in (cnt/2, cnt/2+1)) or
(tmp.cnt%2=1 and sale_rank = ceiling(cnt/2))
group by tmp.group_id

解法二:

abs(rn - (cnt+1)/2) < 1
解釋下上面的公式:
rn是給定長(zhǎng)度為cnt的數(shù)列的序號(hào)排序,
eg:對(duì)于1,2,3,4,5,它的中位數(shù)所在序號(hào)是3,3-(5+1)/2 = 0
對(duì)于1,2,3,4,它的中位數(shù)所在序號(hào)是2,3
2 - (4+1)/2 = -0.5
3-(4+1)/2 = 0.5
可見(jiàn)(cnt+1)/2是一個(gè)數(shù)列的中間位置,如果是奇數(shù)數(shù)列,這個(gè)位置剛好是中位數(shù)所在
如果是偶數(shù),abs(rn - (cnt+1)/2) < 1

代碼思路與上邊很相似 只不過(guò)改了where條件

select shop_id,avg(sale) as median
from
(
    select shop_id,sale,
             row_number() over (partition by shop_id order by sale) as rn, -- 各商品銷(xiāo)量在其店鋪內(nèi)的順序編號(hào)
         count(1) over (partition by shop_id) as cnt -- 各店鋪的商品記錄數(shù)
    from orders
)
where abs(rn - (cnt+1)/2) < 1 -- 順序編號(hào)在店鋪商品銷(xiāo)量記錄數(shù)中間的宝磨,即為中位數(shù)
group by shop_id

解法三:不用窗口函數(shù)宽堆,不排序壁肋,利用中位數(shù)定義
這個(gè)有點(diǎn)兒繞 感興趣的可以看一下鏈接
中位數(shù)出現(xiàn)的頻率一定大于等于大于它的數(shù)和小于它的數(shù)的絕對(duì)值之差猫胁。

問(wèn)題2.求每個(gè)店鋪訂購(gòu)商品的眾數(shù)
不用窗口函數(shù)的思路:

  1. 對(duì)每個(gè)店鋪的商品出現(xiàn)次數(shù)進(jìn)行統(tǒng)計(jì)
select o.shop_id, o.commodity_id, count(*) as order_num
from order as o
group by o.shop_id, o.commodity_id
  1. 選出每個(gè)店鋪頻數(shù)最高的商品次數(shù)
select tmp.shop_id, max(order_num) as max_order_num
from (select o.shop_id, o.commodity_id, count(*) as order_num
from order as o
group by o.shop_id, o.commodity_id) as tmp
group by tmp.shop_id
  1. 利用二元in選出對(duì)應(yīng)字段即可
select tmp.shop_id, tmp.commodity_id, tmp.order_num
from (select o.shop_id, o.commodity_id, count(*) as order_num
from order as o
group by o.shop_id, o.commodity_id) as tmp
where (tmp.shop_id, tmp.order_num) in (select tmp.shop_id, max(order_num) as max_order_num
from (select o.shop_id, o.commodity_id, count(*) as order_num
from order as o
group by o.shop_id, o.commodity_id) as tmp
group by tmp.shop_id)

利用窗口函數(shù)的思路:

  1. 對(duì)每個(gè)店鋪的商品出現(xiàn)次數(shù)進(jìn)行統(tǒng)計(jì)
select o.shop_id, o.commodity_id, count(*) as order_num
from order as o
group by o.shop_id, o.commodity_id

2.選出每個(gè)店鋪頻數(shù)最高的商品次數(shù)

select tmp.shop_id, tmp.commodity_id, tmp.order_num,
max(tmp.order_num) over(partition by tmp.shop_id) as  max_order_num
from (select o.shop_id, o.commodity_id, count(*) as order_num
from order as o
group by o.shop_id, o.commodity_id) as tmp
  1. 最后選出頻數(shù)等于最高頻數(shù)即可
select tmp1.shop_id, tmp1.commodity_id, tmp1.order_num
from (select tmp.shop_id, tmp.commodity_id, tmp.order_num,
max(tmp.order_num) over(partition by tmp.shop_id) as  max_order_num
from (select o.shop_id, o.commodity_id, count(*) as order_num
from order as o
group by o.shop_id, o.commodity_id) as tmp) as tmp1
where tmp1.order_num = tmp1.max_order_num

問(wèn)題3.求四分位數(shù)

三個(gè)表
T1:good_id,cate_id(分類(lèi))
T2:mall_id(店鋪), good_id
T3:mall_id, credit_score(信用分)
問(wèn)郑藏,在不同分類(lèi)中,店鋪的信用分top25%

思路:

  1. 連接三表
select T1.good_id, T2.cate_id, T2.mall_id, T3.credit_score
from T1
join T2
on T1.good_id = T2.good_id
join T3 
on T3.mall_id = T2.mall_id
  1. 給出不同分類(lèi)下的排名 和不同分類(lèi)的總條數(shù)
select tmp.cate_id, tmp.mall_id, tmp.credit_score, 
row_number() over(partition by tmp.cate_id order by tmp.credit_score desc) as cate_rank,
count(*) over(partition by tmp.cate_id) as cate_total
from (select T1.good_id, T2.cate_id, T2.mall_id, T3.credit_score
from T1
join T2
on T1.good_id = T2.good_id
join T3 
on T3.mall_id = T2.mall_id) as tmp
  1. 選出排名小于等于總數(shù)*25%的即可
select tmp2.cate_id, tmp2.mall_id, tmp2.credit_score, tmp2.cate_rank
from (select tmp.cate_id, tmp.mall_id, tmp.credit_score,
row_number() over(partition by tmp.cate_id order by tmp.credit_score desc) as cate_rank,
count(*) over(partition by tmp.cate_id) as cate_total
from (select T1.good_id, T2.cate_id, T2.mall_id, T3.credit_score
from T1
join T2
on T1.good_id = T2.good_id
join T3 
on T3.mall_id = T2.mall_id) as tmp) as tmp2
where tmp2.cate_rank <= tmp2.cate_total*0.25

6.case專(zhuān)題-GMV周同比統(tǒng)計(jì)

表:T —— 字段:時(shí)間(sale_date),店鋪類(lèi)別(cate_id),店鋪數(shù)量(mall_num),gmv
問(wèn)題1.拼多多618前后一周內(nèi)各店鋪類(lèi)別gmv的日均提升幅度和比例

思路:

  1. 選出618前一周的gmv 和對(duì)應(yīng)7天后的gmv
select T.sale_date, T.cate_id, T.mall_num, T.gmv衡查,
lead(gmv, 7) over(partition by T.cate_id order by T.sale_date asc) as gmv_lead_7
from T
-- 窗口函數(shù)執(zhí)行實(shí)在where之后的所以這里的范圍是這樣 而不是 0617
where T.sale_date between '2020-06-11' and '2020-06-24'
  1. 選出一周前的記錄 對(duì)店鋪類(lèi)別分組 統(tǒng)計(jì)gmv的日均提升幅度和比例即可
select T.cate_id, 
avg(gmv_lead_7 - gmv) as '日均提升幅度', 
avg((gmv_lead_7 - gmv)/gmv) as '日均提升比例'
from (select T.sale_date, T.cate_id, T.mall_num, T.gmv必盖,
lead(gmv, 7) over(partition by T.cate_id order by T.sale_date asc) as gmv_lead_7
from T
where T.sale_date between '2020-06-11' and '2020-06-24') as tmp
where tmp.sale_date between '2020-06-11' and '2020-06-17'
group by T.cate_id;

問(wèn)題2.在618前一周gmv top20%拌牲,20-40%等這5類(lèi)商鋪在618后一周內(nèi)gmv日均提升幅度和比例
思路:

  1. 先給出618前一周的記錄 然后對(duì)類(lèi)別進(jìn)行分組 得到前一周每個(gè)類(lèi)別的總gmv 一遍后續(xù)求比例
select T.cate_id, sum(T.gmv) as total_gmv
from T
where T.sale_date between '2020-06-11' and '2020-06-17'
group by T.cate_id

2.根據(jù)總的gmv排名 給定五個(gè)類(lèi)別的分組

select tmp.cate_id,
count(T.cate_id) as total,
row_number() over(order by tmp.total_gmv desc) as rank
from (select T.cate_id, sum(T.gmv) as total_gmv
from T
where T.sale_date between '2020-06-11' and '2020-06-17'
group by T.cate_id) as tmp

3.根據(jù)rank/total的大小可以給定分組

select tmp1.cate_id,
case when tmp1.rank <= tmp1.total*0.1 then '10%'
else tmp1.rank <= tmp1.total*0.2 then '10%-20%'
else tmp1.rank <= tmp1.total*0.3 then '20%-30%'
else tmp1.rank <= tmp1.total*0.4 then '30%-40%'
else tmp1.rank <= tmp1.total*0.5then '40%-50%'
else tmp1.rank > tmp1.total*0.5 then '50%以上'
end as gmv_quantile
from (select tmp.cate_id,
count(T.cate_id) as total,
row_number() over(order by tmp.total_gmv desc) as rank
from (select T.cate_id, sum(T.gmv) as total_gmv
from T
where T.sale_date between '2020-06-11' and '2020-06-17'
group by T.cate_id) as tmp) as tmp1
  1. 之后就與第一問(wèn)很相似了 只不過(guò)是對(duì)于沒(méi)類(lèi)店鋪又進(jìn)行了一個(gè)分類(lèi) 然后根據(jù)這個(gè)分類(lèi)進(jìn)行統(tǒng)計(jì)
select  b.gmv_quantile, 
avg(a.gmv_lead_7 - a.gmv) as '日均提升幅度', 
avg((a.gmv_lead_7 - a.gmv)/a.gmv) as '日均提升比例' 
from (select T.sale_date, T.cate_id, T.mall_num, T.gmv歌粥,
lead(gmv, 7) over(partition by T.cate_id order by T.sale_date asc) as gmv_lead_7
from T
where T.sale_date between '2020-06-11' and '2020-06-24') as a
join (select tmp1.cate_id,
case when tmp1.rank <= tmp1.total*0.1 then '10%'
else tmp1.rank <= tmp1.total*0.2 then '10%-20%'
else tmp1.rank <= tmp1.total*0.3 then '20%-30%'
else tmp1.rank <= tmp1.total*0.4 then '30%-40%'
else tmp1.rank <= tmp1.total*0.5then '40%-50%'
else tmp1.rank > tmp1.total*0.5 then '50%以上'
end as gmv_quantile
from (select tmp.cate_id,
count(T.cate_id) as total,
row_number() over(order by tmp.total_gmv desc) as rank
from (select T.cate_id, sum(T.gmv) as total_gmv
from T
where T.sale_date between '2020-06-11' and '2020-06-17'
group by T.cate_id) as tmp) as tmp1) as b
on a.cate_id = b.cate_id
where a.sale_date between '20190611' and '20190617'
and b.gmv_quantile in ('10%','10%-20%','20%-30%','30%-40','40%-50%')
group by b.gmv_quantile

7.case專(zhuān)題-連續(xù)區(qū)間問(wèn)題

(拼多多二面面試題塌忽,就是找到一個(gè)session的開(kāi)始和結(jié)束位置,但是具體題目條件記不清楚了失驶,這里放個(gè)類(lèi)似的題目,看看這一類(lèi)題目的解題思路)
Leetcode1285. 找到連續(xù)區(qū)間的開(kāi)始和結(jié)束數(shù)字(中等)
思路:

  1. 利用定義變量的方法 判斷當(dāng)前值與前一值是否差一 若是則分組保持不變 否組分組+1
select L.log_id,
@group:=if(L.log_id = @pre_id + 1,@group, @group+1) as group,
@pre_id:=L.log_id
from Logs as L, (select @pre_id:=null, @group:=null) as init
  1. 對(duì)于每個(gè)分組 最大值為結(jié)束 最小值為起始
select min(L.log_id) as start_id, 
max(L.log_id) as end_id
from (select L.log_id,
@group:=if(L.log_id = @pre_id + 1,@group, @group+1) as group,
@pre_id:=L.log_id
from Logs as L, (select @pre_id:=null, @group:=null) as init) as tmp
group by tmp.group;

猿輔導(dǎo)面試題

8.case專(zhuān)題-學(xué)生成績(jī)分析

表:Enrollments
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| student_id | int |
| course_id | int |
| grade | int |
+---------------+---------+
(student_id, course_id) 是該表的主鍵土居。

問(wèn)題1.查詢(xún)每位學(xué)生獲得的最高成績(jī)和它所對(duì)應(yīng)的科目,若科目成績(jī)并列突勇,取 course_id 最小的一門(mén)装盯。查詢(xún)結(jié)果需按 student_id 增序進(jìn)行排序。
解法1. 利用二元in
思路:

  1. 選出沒(méi)名學(xué)生的最高成績(jī)
select E.student_id, max(E.grade) as max_grade
from Enrollments as E
group by E.student_id
  1. 利用二元in選出對(duì)應(yīng)的科目
select E.student_id, E.course_id, E.grade
from Enrollments as E
where (E.student_id, E.grade) in (select E.student_id, max(E.grade) as max_grade
from Enrollments as E
group by E.student_id)
  1. 按照student_id和grade分組 選出最小的course_id 然后按student_id排序即可
select E.student_id, min(E.course_id) as course_id, E.grade
from Enrollments as E
where (E.student_id, E.grade) in (select E.student_id, max(E.grade) as max_grade
from Enrollments as E
group by E.student_id)
group by E.student_id, E.grade
order by E.student_id asc;

解法2. 利用窗口函數(shù)
思路:

  1. 對(duì)stdent_id進(jìn)行分組 對(duì)grade desc, course_id asc進(jìn)行排序 得到成績(jī)排序(相同成績(jī)的課程id小的在前)
select E.student_id, E.course_id, E.grade,
row_number() over(partition by E.student_id order by grade desc, course_id asc) as rank 
from Enrollments as E
  1. 選出rank為1的記錄然后按student_id 增序進(jìn)行排序即可
select tmp.student_id, tmp.course_id, tmp.grade
from (select E.student_id, E.course_id, E.grade,
row_number() over(partition by E.student_id order by grade desc, course_id asc) as rank 
from Enrollments as E) as tmp
where tmp.rank = 1
order by tmp.student_id

參考鏈接的代碼:

  1. 對(duì)stdent_id進(jìn)行分組 對(duì)grade desc進(jìn)行排序 按dense_rank() 給定序號(hào) 最高分相同的序號(hào)都為1
  2. 選出序號(hào)為1 的 然后對(duì)stdent_id進(jìn)行分組 對(duì)course_id asc進(jìn)行排序 選出rank為1的
  3. 按照student_id排序即可
select a.student_id,a.course_id,a.grade
from
(
    select student_id,course_id,grade,
           row_number()over(partition by student_id order by course_id) as course_rank
    from
    (
        select student_id,course_id,grade,
               dense_rank()over(partition by student_id order by grade desc) as grade_rank
        from Enrollments
    ) t
    where t.grade_rank = 1
) a
where a.course_rank = 1
order by a.student_id

問(wèn)題2.查詢(xún)每一科目成績(jī)最高和最低分?jǐn)?shù)的學(xué)生,輸出courseid,studentid,score
思路:

  1. 先查出每門(mén)科目的最高分和最低分
select course_id,max(grade) as max_grade
from Enrollments
group by course_id

select course_id,min(grade) as min_grade
from Enrollments
group by course_id
  1. 利用二元in
select E.student_id, E.course_id, E.grade
from Enrollments as E
where (E.student_id, E.grade) in 
(select E.student_id, max(E.grade) as max_grade
from Enrollments as E
group by E.student_id) or 
 (E.student_id, E.grade) in
 (select E.student_id, min(E.grade) as min_grade
from Enrollments as E
group by E.student_id)

參考的鏈接
用連接分別選出最高分和最低分 然后用union進(jìn)行合并

select e.course_id,e.student_id,e.grade as score
from Enrollments e left join
(
    select course_id,max(grade) as max_grade
    from Enrollments
    group by course_id
) t on e.course_id = t.course_id
where e.grade = t.max_grade
union all
select e.course_id,e.student_id,e.grade as score
from Enrollments e left join
(
    select course_id,min(grade) as min_grade
    from Enrollments
    group by course_id
) t on e.course_id = t.course_id
where e.grade = t.min_grade

9.case專(zhuān)題-學(xué)生做題情況分析

表t:做題日期(time),學(xué)生id(stu_id),題目id(exer_id)

統(tǒng)計(jì)10.1-10.10每天做新題的人的數(shù)量甲馋,重點(diǎn)在每天
思路:

  1. 每名學(xué)生對(duì)于每道題目第一次做才叫新題 也就是最小的時(shí)間
select t.time, t.stu_id, t.exer_id,
min(t.time) over(partition by t.stu_id, t.exer_id) as first_time
from t
where t.time between '2020-10-01' and '2020-10-10'
  1. 如果time等于first_time就是在做新題 統(tǒng)計(jì)每名學(xué)生做新題的次數(shù) 如果為10次 則說(shuō)明每天都在做新題
select tmp.stu_id
from (select t.time, t.stu_id, t.exer_id,
min(t.time) over(partition by t.stu_id, t.exer_id) as first_time
from t
where t.time between '2020-10-01' and '2020-10-10') as tmp
where tmp.time = tmp.first_time
group by tmp.stu_id
having count(tmp.stu_id) = 10

3.統(tǒng)計(jì)最終滿(mǎn)足條件的用戶(hù)數(shù)量

select count(a.stu_id) as '10.1-10.10每天做新題的人的數(shù)量'
from (select tmp.stu_id
from (select t.time, t.stu_id, t.exer_id,
min(t.time) over(partition by t.stu_id, t.exer_id) as first_time
from t
where t.time between '2020-10-01' and '2020-10-10') as tmp
where tmp.time = tmp.first_time
group by tmp.stu_id
having count(tmp.stu_id) = 10) as a
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末埂奈,一起剝皮案震驚了整個(gè)濱河市,隨后出現(xiàn)的幾起案子定躏,更是在濱河造成了極大的恐慌账磺,老刑警劉巖芹敌,帶你破解...
    沈念sama閱讀 212,454評(píng)論 6 493
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場(chǎng)離奇詭異垮抗,居然都是意外死亡氏捞,警方通過(guò)查閱死者的電腦和手機(jī)冒版,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 90,553評(píng)論 3 385
  • 文/潘曉璐 我一進(jìn)店門(mén)液茎,熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái),“玉大人辞嗡,你說(shuō)我怎么就攤上這事捆等。” “怎么了续室?”我有些...
    開(kāi)封第一講書(shū)人閱讀 157,921評(píng)論 0 348
  • 文/不壞的土叔 我叫張陵栋烤,是天一觀的道長(zhǎng)。 經(jīng)常有香客問(wèn)我挺狰,道長(zhǎng)明郭,這世上最難降的妖魔是什么? 我笑而不...
    開(kāi)封第一講書(shū)人閱讀 56,648評(píng)論 1 284
  • 正文 為了忘掉前任丰泊,我火速辦了婚禮薯定,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘趁耗。我一直安慰自己沉唠,他們只是感情好,可當(dāng)我...
    茶點(diǎn)故事閱讀 65,770評(píng)論 6 386
  • 文/花漫 我一把揭開(kāi)白布苛败。 她就那樣靜靜地躺著满葛,像睡著了一般。 火紅的嫁衣襯著肌膚如雪罢屈。 梳的紋絲不亂的頭發(fā)上嘀韧,一...
    開(kāi)封第一講書(shū)人閱讀 49,950評(píng)論 1 291
  • 那天,我揣著相機(jī)與錄音缠捌,去河邊找鬼锄贷。 笑死,一個(gè)胖子當(dāng)著我的面吹牛曼月,可吹牛的內(nèi)容都是我干的谊却。 我是一名探鬼主播,決...
    沈念sama閱讀 39,090評(píng)論 3 410
  • 文/蒼蘭香墨 我猛地睜開(kāi)眼哑芹,長(zhǎng)吁一口氣:“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼炎辨!你這毒婦竟也來(lái)了?” 一聲冷哼從身側(cè)響起聪姿,我...
    開(kāi)封第一講書(shū)人閱讀 37,817評(píng)論 0 268
  • 序言:老撾萬(wàn)榮一對(duì)情侶失蹤碴萧,失蹤者是張志新(化名)和其女友劉穎乙嘀,沒(méi)想到半個(gè)月后,有當(dāng)?shù)厝嗽跇?shù)林里發(fā)現(xiàn)了一具尸體破喻,經(jīng)...
    沈念sama閱讀 44,275評(píng)論 1 303
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡虎谢,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 36,592評(píng)論 2 327
  • 正文 我和宋清朗相戀三年,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了曹质。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片婴噩。...
    茶點(diǎn)故事閱讀 38,724評(píng)論 1 341
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡,死狀恐怖咆繁,靈堂內(nèi)的尸體忽然破棺而出讳推,到底是詐尸還是另有隱情顶籽,我是刑警寧澤玩般,帶...
    沈念sama閱讀 34,409評(píng)論 4 333
  • 正文 年R本政府宣布,位于F島的核電站礼饱,受9級(jí)特大地震影響坏为,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜镊绪,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 40,052評(píng)論 3 316
  • 文/蒙蒙 一匀伏、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧蝴韭,春花似錦够颠、人聲如沸。這莊子的主人今日做“春日...
    開(kāi)封第一講書(shū)人閱讀 30,815評(píng)論 0 21
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽(yáng)。三九已至庆尘,卻和暖如春剃诅,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背驶忌。 一陣腳步聲響...
    開(kāi)封第一講書(shū)人閱讀 32,043評(píng)論 1 266
  • 我被黑心中介騙來(lái)泰國(guó)打工矛辕, 沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留,地道東北人付魔。 一個(gè)月前我還...
    沈念sama閱讀 46,503評(píng)論 2 361
  • 正文 我出身青樓聊品,卻偏偏與公主長(zhǎng)得像,于是被迫代替她去往敵國(guó)和親几苍。 傳聞我的和親對(duì)象是個(gè)殘疾皇子翻屈,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 43,627評(píng)論 2 350