子查詢
接下來將統(tǒng)計(jì)每天每個(gè)渠道的所有事件,
最后一步是在創(chuàng)建的事件列中計(jì)算平均值哩都,為此,我們需要在上述查詢結(jié)果中進(jìn)行再好查詢。將查詢放在括號中咐汞,賦一個(gè)名字(就像為表格添加別名一樣)盖呼,作為外查詢的FROM來源,那么它就成了查詢中的查詢化撕,即子查詢几晤。
總體流程蟹瘾,先運(yùn)行內(nèi)層查詢,內(nèi)層查詢必須獨(dú)立運(yùn)行掠手,然后外查詢從內(nèi)查詢的結(jié)果集中再查詢憾朴。
整個(gè)流程如下:
- 首先,我們需要按照日期和渠道分組喷鸽。然后按事件數(shù)(第三列)排序众雷,這樣可以快速得出每個(gè)渠道每天的事件發(fā)生次數(shù)。
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC;
- 現(xiàn)在創(chuàng)建一個(gè)子查詢做祝,只是用來提供第一個(gè)查詢中的所有數(shù)據(jù)砾省。可以看出混槐,要獲得這一結(jié)果编兄,提供了整個(gè)原始表格。查詢的附加部分包括
*
声登,并且我們需要為表格設(shè)置別名翻诉。此外,是在SELECT
語句中(而不是FROM
)中提供表格捌刮。
SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC) sub;
- 最后碰煌,我們在以下語句中能夠獲得顯示每個(gè)渠道一天的平均事件數(shù)的表格。
SELECT channel, AVG(events) AS average_events
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2) sub
GROUP BY channel
ORDER BY 2 DESC;
練習(xí)
子查詢的格式
子查詢格式
在編寫子查詢時(shí)绅作,查詢很容易就看起來很復(fù)雜芦圾。為了便于閱讀,其實(shí)日后經(jīng)常只是你自己要閱讀:
要記住的重要事項(xiàng)是俄认,在使用子查詢時(shí)个少,要讓讀者能夠輕松地判斷查詢的哪個(gè)部分將一起執(zhí)行。大部分人的做法是按照某種方式縮進(jìn)子查詢眯杏,上一頁面的解決方案就是這么做的夜焦。
這節(jié)課的示例縮進(jìn)很明顯,一直到小括號岂贩。如果你嵌套了很多的子查詢茫经,則不適用,一般法則就是思考下如何以便于閱讀的方式編寫查詢。下面給出了以多種方式編寫同一查詢的示例卸伞。你會發(fā)現(xiàn)抹镊,某些示例明顯比其他的容易閱讀。
格式糟糕的查詢
雖然這些格式糟糕的查詢和格式清晰的查詢一樣會執(zhí)行荤傲,但是卻不容易讓人理解查詢的作用垮耳!
以下是第一個(gè)示例,根本無法判斷查詢的作用:
SELECT * FROM (SELECT DATE_TRUNC('day',occurred_at) AS day, channel, COUNT(*) as events FROM web_events GROUP BY 1,2 ORDER BY 3 DESC) sub;
下面的第二個(gè)示例不是太糟糕遂黍,但是你會發(fā)現(xiàn)最后一個(gè)示例依然更容易讀懂终佛。
SELECT *
FROM (
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC) sub;
格式清晰的查詢
與之前的示例相比,在這個(gè)格式清晰的示例中雾家,我們很容易就看出要從哪個(gè)表格中獲取數(shù)據(jù)查蓉。此外,如果在子查詢后面有 GROUP BY榜贴、ORDER BY、WHERE妹田、HAVING 或任何其他語句唬党,則按照外部查詢的同一級別縮進(jìn),正如最后一個(gè)示例所顯示的鬼佣,它是上個(gè)練習(xí)的最后一個(gè)解決方案驶拱。
SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC) sub;
下面的查詢很相似,但是向外部查詢應(yīng)用了其他邏輯晶衷,因此按照外部查詢的級別縮進(jìn)蓝纲。而內(nèi)部查詢邏輯的縮進(jìn)級別與內(nèi)部表格匹配。
SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC) sub
GROUP BY channel
ORDER BY 2 DESC;
最后兩個(gè)查詢?nèi)菀鬃x懂多了晌纫!
子查詢
在你寫的第一個(gè)子查詢中税迷,你編寫了一個(gè)子查詢來創(chuàng)建表格,然后可以在 FROM 語句中查詢該表格锹漱。但是箭养,如果只返回一個(gè)值,則可以在邏輯語句中使用該值哥牍,例如 WHERE毕泌、HAVING,甚至 SELECT嗅辣,該值可以嵌套在 CASE 語句中撼泛。
在下一頁面中,我們將講解這個(gè)示例澡谭,然后你將嘗試自己回答一些問題愿题。
提示
注意,在條件語句中編寫子查詢時(shí),不能包含別名抠忘。這是因?yàn)樵撟硬樵儠划?dāng)做單個(gè)值(或者對于 IN 情況是一組值)撩炊,而不是一個(gè)表格。
同時(shí)注意崎脉,這里的查詢對應(yīng)的是單個(gè)值拧咳。如果我們返回了整個(gè)列,則需要使用 IN 來執(zhí)行邏輯參數(shù)囚灼。如果我們要返回整個(gè)表格骆膝,則必須為該表格使用別名,并對整個(gè)表格執(zhí)行其他邏輯灶体。
類似的查詢方式:
以下是從 orders 表格中獲取第一個(gè)訂單的年/月信息的查詢阅签。
SELECT DATE_TRUNC('month', MIN(occurred_at))
FROM orders;
然后,為了獲取每個(gè)訂單的平均值蝎抽,我們可以在一個(gè)查詢中執(zhí)行所有的任務(wù)政钟。但是為了便于閱讀,我在下面提供了兩個(gè)查詢樟结,單獨(dú)執(zhí)行每一步养交。
SELECT AVG(standard_qty) avg_std, AVG(gloss_qty) avg_gls, AVG(poster_qty) avg_pst
FROM orders
WHERE DATE_TRUNC('month', occurred_at) =
(SELECT DATE_TRUNC('month', MIN(occurred_at)) FROM orders);
SELECT SUM(total_amt_usd)
FROM orders
WHERE DATE_TRUNC('month', occurred_at) =
(SELECT DATE_TRUNC('month', MIN(occurred_at)) FROM orders);
練習(xí) 愛上子查詢
提供每個(gè)區(qū)域擁有最高銷售額 (total_amt_usd) 的銷售代表的姓名。
- 首先瓢宦,我要算出與每個(gè)銷售代表相關(guān)的總銷售額 (total_amt_usd)碎连,并且要得出他們所在的區(qū)域。以下查詢提供了這一信息驮履。
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC;
- 接著鱼辙,得出每個(gè)區(qū)域的最高銷售額,然后使用該信息從最終結(jié)果中獲取這些行玫镐。
SELECT region_name, MAX(total_amt) total_amt
FROM(SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1, 2) inner1
GROUP BY 1;
- 本質(zhì)上倒戏,這是兩個(gè)表格的連接,其中區(qū)域和銷售額相匹配恐似。
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM(SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC) t1
JOIN (SELECT region_name, MAX(total_amt) total_amt
FROM(SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1, 2) inner1
GROUP BY 1) t2
ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;
- 對于具有最高銷售額 (total_amt_usd) 的區(qū)域峭梳,總共下了多少個(gè)訂單?
我寫的第一個(gè)查詢是獲取每個(gè)區(qū)域的 total_amt_usd蹂喻。
SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name;
然后葱椭,我們僅從該表格中獲取銷售額最高的區(qū)域】谒模可以通過兩種方法來獲取孵运,一種是使用子查詢后的最大值,另一種是按降序排序蔓彩,然后獲取最高值治笨。
SELECT MAX(total_amt)
FROM (SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name) sub;
最終驳概,我們要獲取具有該區(qū)域銷售額的總訂單量:
SELECT r.name, SUM(o.total) total_orders
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name
HAVING SUM(o.total_amt_usd) = (
SELECT MAX(total_amt)
FROM (SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name) sub);
結(jié)果就是 Northeast,總訂單為 1230378 個(gè)旷赖。
- 對于購買標(biāo)準(zhǔn)紙張數(shù)量 (standard_qty) 最多的客戶(在作為客戶的整個(gè)時(shí)期內(nèi))顺又,有多少客戶的購買總數(shù)依然更多?
首先等孵,我們要得出購買標(biāo)準(zhǔn)紙張數(shù)量 (standard_qty) 最多的客戶稚照。以下查詢獲取了該客戶,以及總消費(fèi):
SELECT a.name account_name, SUM(o.standard_qty) total_std, SUM(o.total) total
FROM accounts a
JOIN orders o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1;
現(xiàn)在俯萌,我將使用上述信息獲取總消費(fèi)更高的所有客戶:
SELECT a.name
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY 1
HAVING SUM(o.total) > (SELECT total
FROM (SELECT a.name act_name, SUM(o.standard_qty) tot_std, SUM(o.total) total
FROM accounts a
JOIN orders o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1) sub);
上述查詢列出了具有更多訂單的客戶列表果录。我們還可以使用另一個(gè)簡單的子查詢獲取數(shù)量。
SELECT COUNT(*)
FROM (SELECT a.name
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY 1
HAVING SUM(o.total) > (SELECT total
FROM (SELECT a.name act_name, SUM(o.standard_qty) tot_std, SUM(o.total) total
FROM accounts a
JOIN orders o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1) inner_tab)
) counter_tab;
- 對于(在作為客戶的整個(gè)時(shí)期內(nèi))總消費(fèi) (total_amt_usd) 最多的客戶咐熙,他們在每個(gè)渠道上有多少 web_events弱恒?
我們首先需要獲取在整個(gè)客戶時(shí)期內(nèi)消費(fèi)最多的客戶。
SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 1;
現(xiàn)在棋恼,我們要獲取該企業(yè)(可以使用 id 進(jìn)行匹配)在每個(gè)渠道上的事件數(shù)返弹。
SELECT a.name, w.channel, COUNT(*)
FROM accounts a
JOIN web_events w
ON a.id = w.account_id AND a.id = (SELECT id
FROM (SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 1) inner_table)
GROUP BY 1, 2
ORDER BY 3 DESC;
我添加了 ORDER BY,其實(shí)并沒特別的理由爪飘,并添加了客戶名稱义起,確保僅從一個(gè)客戶那獲取數(shù)據(jù)。
- 對于總消費(fèi)前十名的客戶悦施,他們的平均終身消費(fèi) (total_amt_usd) 是多少?
首先,我們需要找出總消費(fèi) (total_amt_usd) 在前十名的客戶去团。
SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 10;
現(xiàn)在計(jì)算這十個(gè)客戶的平均消費(fèi)抡诞。
SELECT AVG(tot_spent)
FROM (SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 10) temp;
- 比所有客戶的平均消費(fèi)高的企業(yè)平均終身消費(fèi) (total_amt_usd) 是多少?
首先土陪,算出所有客戶的總消費(fèi) (total_amt_usd) 平均值:
SELECT AVG(o.total_amt_usd) avg_all
FROM orders o
JOIN accounts a
ON a.id = o.account_id;
然后昼汗,只獲取高于這一平均值的客戶。
SELECT o.account_id, AVG(o.total_amt_usd)
FROM orders o
GROUP BY 1
HAVING AVG(o.total_amt_usd) > (SELECT AVG(o.total_amt_usd) avg_all
FROM orders o
JOIN accounts a
ON a.id = o.account_id);
最后鬼雀,算出這些值的平均值顷窒。
SELECT AVG(avg_amt)
FROM (SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
FROM orders o
GROUP BY 1
HAVING AVG(o.total_amt_usd) > (SELECT AVG(o.total_amt_usd) avg_all
FROM orders o
JOIN accounts a
ON a.id = o.account_id)) temp_table;
WITH
WITH 語句經(jīng)常稱為公用表表達(dá)式(簡稱 CTE)。雖然這些表達(dá)式和子查詢的目的完全一樣源哩,但是實(shí)際更常用鞋吉,因?yàn)閷ξ磥淼淖x者來說,更容易看懂其中的邏輯励烦。
練習(xí)
復(fù)習(xí)
問題:你需要算出每個(gè)渠道每天的平均事件數(shù)谓着。
解決方案:
SELECT channel, AVG(events) AS average_events
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2) sub
GROUP BY channel
ORDER BY 2 DESC;
我們使用 WITH 語句重新編寫查詢。
注意:你可以獲取內(nèi)部查詢:
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
我們在此部分放入 WITH 語句坛掠。注意赊锚,在下面我們將表格的別名設(shè)為 events:
WITH events AS (
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2)
現(xiàn)在治筒,我們可以像對待數(shù)據(jù)庫中的任何其他表格一樣使用這個(gè)新創(chuàng)建的 events 表格:
WITH events AS (
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2)
SELECT channel, AVG(events) AS average_events
FROM events
GROUP BY channel
ORDER BY 2 DESC;
對于上述示例,我們只需一個(gè)額外的表格舷蒲,但是想象下我們要創(chuàng)建第二個(gè)表格來從中獲取數(shù)據(jù)耸袜。我們可以按照以下方式來創(chuàng)建額外的表格并從中獲取數(shù)據(jù):
WITH table1 AS (
SELECT *
FROM web_events),
table2 AS (
SELECT *
FROM accounts)
SELECT *
FROM table1
JOIN table2
ON table1.account_id = table2.id;
然后,你可以按照相同的方式使用 WITH 語句添加越來越多的表格牲平。底部的練習(xí)將確保你掌握了這些新查詢的所有必要組成部分堤框。
練習(xí)
WITH 練習(xí)
本質(zhì)上,WITH 語句和子查詢執(zhí)行的任務(wù)相同欠拾。因此胰锌,你可以使用 WITH 編寫之前的任何一個(gè)查詢。嘗試使用 WITH(而不是子查詢)再次執(zhí)行之前的每個(gè)查詢藐窄。
以上是數(shù)據(jù)庫的 ERD资昧,當(dāng)你處理下面的練習(xí)時(shí),可能會用到該 ERD荆忍。你應(yīng)該將答案寫成 WITH 語句格带,而不是得出一個(gè)答案并復(fù)制輸出。這么做的重要性是查詢能夠動態(tài)地回答問題刹枉,即使數(shù)據(jù)變化了叽唱,依然能獲得正確的答案。
以下是使用 WITH 條件重新編寫的之前問題的答案微宝。這些查詢通常更容易讀懂棺亭。
提供每個(gè)區(qū)域擁有最高銷售額 (total_amt_usd) 的銷售代表的姓名。
WITH t1 AS (
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC),
t2 AS (
SELECT region_name, MAX(total_amt) total_amt
FROM t1
GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
JOIN t2
ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;
對于具有最高銷售額 (total_amt_usd) 的區(qū)域蟋软,總共下了多少個(gè)訂單镶摘?
WITH t1 AS (
SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name),
t2 AS (
SELECT MAX(total_amt)
FROM t1)
SELECT r.name, SUM(o.total) total_orders
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name
HAVING SUM(o.total_amt_usd) = (SELECT * FROM t2);
- 對于購買標(biāo)準(zhǔn)紙張數(shù)量 (standard_qty) 最多的客戶(在作為客戶的整個(gè)時(shí)期內(nèi)),有多少客戶的購買總數(shù)依然更多岳守?
WITH t1 AS (
SELECT a.name account_name, SUM(o.standard_qty) total_std, SUM(o.total) total
FROM accounts a
JOIN orders o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1),
t2 AS (
SELECT a.name
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY 1
HAVING SUM(o.total) > (SELECT total FROM t1))
SELECT COUNT(*)
FROM t2;
- 對于(在作為客戶的整個(gè)時(shí)期內(nèi))總消費(fèi) (total_amt_usd) 最多的客戶凄敢,他們在每個(gè)渠道上有多少 web_events?
```sql
WITH t1 AS (
SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 1)
SELECT a.name, w.channel, COUNT(*)
FROM accounts a
JOIN web_events w
ON a.id = w.account_id AND a.id = (SELECT id FROM t1)
GROUP BY 1, 2
ORDER BY 3 DESC;
- 對于總消費(fèi)前十名的客戶湿痢,他們的平均終身消費(fèi) (total_amt_usd) 是多少?
WITH t1 AS (
SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 10)
SELECT AVG(tot_spent)
FROM t1;
- 比所有客戶的平均消費(fèi)高的企業(yè)平均終身消費(fèi) (total_amt_usd) 是多少涝缝?
WITH t1 AS (
SELECT AVG(o.total_amt_usd) avg_all
FROM orders o
JOIN accounts a
ON a.id = o.account_id),
t2 AS (
SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
FROM orders o
GROUP BY 1
HAVING AVG(o.total_amt_usd) > (SELECT * FROM t1))
SELECT AVG(avg_amt)
FROM t2;
哇!太夸張了譬重,如果你都答對了拒逮,那么很棒!