oracle學(xué)習筆記7: 高級分組

group by 用來在原始數(shù)據(jù)上創(chuàng)建聚合來將數(shù)據(jù)轉(zhuǎn)化為有用的信息葡公。

基本的group by 列出個個部門的名稱鸟雏,員工總數(shù)

select d.dname, count(empno) empcount
  from scott.dept d
  left outer join scott.emp e
    on d.deptno = e.deptno
 group by d.dname
 order by d.dname;

select列表中的每一列必須包含在group by子句中裆针。如果沒有則會導(dǎo)致錯誤咱枉。如:


SQL> select d.dname, d.loc, count(empno) empcount
  2    from scott.emp e
  3     join scott.dept d
  4      on d.deptno = e.deptno
  5   group by d.dname;
select d.dname, d.loc, count(empno) empcount
                *
第 1 行出現(xiàn)錯誤:
ORA-00979: 不是 GROUP BY 表達式

盡管包含group by子句的select語句輸出看上去是按順序列出的,你不能期待group by每次都排好序地返回數(shù)據(jù),如果輸出結(jié)果必須按照一定的順序排列陷嘴,則必須使用order by子句乎赴。

--沒有排序的group by 
select deptno,count(*)
from emp
group by deptno;
--復(fù)雜的sql
set serveroutput off;
--復(fù)雜的sql
select /* lst7-4 */
distinct dname,
         decode(d.deptno,
                1,
                (select count(*) from emp where deptno = 1),
                2,
                (select count(*) from emp where deptno = 2),
                3,
                (select count(*) from emp where deptno = 3),
                (select count(*) from emp where deptno not in (1, 2, 3))) dept_count
  from (select distinct deptno from emp) d
  join dept d2
    on d.deptno = d.deptno;

@E:\bjc2016\study\pln lst7-4

上面的寫法,會使SQL語句更加復(fù)雜難以理解并且難以維護真慢。group by子句極大的簡化必須寫的sql語句以外毅臊,還消除了數(shù)據(jù)庫不必要的IO。

set serveroutput off;
--復(fù)雜的sql
select /* lst7-5 */
distinct dname,
         count(empno) empcount
  from  emp e
  join dept d
    on d.deptno = d.deptno
    group by d.dname
    order by d.dname;

@E:\bjc2016\study\pln lst7-5

group by 優(yōu)點:

  • 使sql語句更具有可讀性
  • 書寫起來比使用很多相關(guān)子查詢更簡單
  • 減少了重復(fù)訪問同一個數(shù)據(jù)塊的次數(shù)黑界,從而得到更好的性能管嬉。

在分組之后,還在數(shù)據(jù)集上應(yīng)用了having子句朗鸠。另一方面蚯撩,在獲取數(shù)據(jù)行之后,進行分組之前烛占,應(yīng)用了where子句胎挎。having 子句中可以使用運算,函數(shù)及子查詢忆家。


SQL> --having子句
SQL> select /* lst7-6 */
  2   d.dname, trunc(e.hiredate, 'yyyy') hiredate, count(empno) empcount
  3    from emp e
  4    join dept d
  5      on e.deptno = e.deptno
  6   group by d.dname, trunc(e.hiredate, 'yyyy')
  7  having count(empno) <= 5 and trunc(e.hiredate, 'yyyy') between (select min(hiredate)
  8                                                                    from scott.emp) and (select max(hiredate)
  9                                                                                           from scott.emp)
 10   order by d.dname;

未選定行

SQL> @E:\bjc2016\study\pln lst7-6
原值    8:        WHERE UPPER(SQL_TEXT) LIKE '%&1%'
新值    8:        WHERE UPPER(SQL_TEXT) LIKE '%lst7-6%'

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  0wcfknkztdxqt, child number 0
-------------------------------------
select /* lst7-6 */  d.dname, trunc(e.hiredate, 'yyyy') hiredate,
count(empno) empcount   from emp e   join dept d     on e.deptno =
e.deptno  group by d.dname, trunc(e.hiredate, 'yyyy') having
count(empno) <= 5 and trunc(e.hiredate, 'yyyy') between (select
min(hiredate)
        from scott.emp) and (select max(hiredate)
                                                                   from
scott.emp)  order by d.dname


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 239717969

---------------------------------------------------------------------------
| Id  | Operation              | Name | E-Rows |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |        |       |       |          |
|*  1 |  FILTER                |      |        |       |       |          |
|   2 |   SORT GROUP BY        |      |      1 |  2048 |  2048 | 2048  (0)|
|   3 |    MERGE JOIN CARTESIAN|      |     40 |       |       |          |
|   4 |     TABLE ACCESS FULL  | DEPT |      4 |       |       |          |
|   5 |     BUFFER SORT        |      |     10 |  2048 |  2048 | 2048  (0)|

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|   6 |      TABLE ACCESS FULL | EMP  |     10 |       |       |          |
|   7 |   SORT AGGREGATE       |      |      1 |       |       |          |
|   8 |    TABLE ACCESS FULL   | EMP  |     10 |       |       |          |
|   9 |   SORT AGGREGATE       |      |      1 |       |       |          |
|  10 |    TABLE ACCESS FULL   | EMP  |     10 |       |       |          |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((COUNT(*)<=5 AND TRUNC(INTERNAL_FUNCTION("E"."HIREDATE"),'

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
              fmyyyy')>= AND TRUNC(INTERNAL_FUNCTION("E"."HIREDATE"),'fmyyyy')<=))


Note
-----
   - Warning: basic plan statistics not available. These are only collected when
:

       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system leve

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
l



已選擇41行。

SQL>

group by的cube擴展

當與group by子句一起使用時弦赖,將會使得對每一行都要考慮包含在cube的參數(shù)中的所有可能的元素組合。這個運算將會生成比表中實際存在的行數(shù)更多的數(shù)據(jù)行沼沈。

-- hr.emplyees表的cube運算
select last_name, first_name
  from hr.employees
 group by first_name, last_name;

set autotrace off;
set autotrace on statistics;
with emps as
 (select /* lst-7 */
   last_name, first_name
    from hr.employees
   group by cube(last_name, first_name))
select rownum, last_name, first_name from emps;

對于每一對last_name,first_name币厕,cube將會按順序為每個元素替換為null值。cube生成的數(shù)據(jù)行在Oracle文檔中稱為超級聚合行旦装,可以在運算列中加入null值來識別页衙。


SQL> set autotrace off;
SQL> --預(yù)測cube返回行數(shù)
SQL> with counts as
  2   (select count(distinct first_name) first_name_count,
  3           count(distinct last_name) last_name_count,
  4           count(distinct(first_name || last_name)) full_name_count
  5      from hr.employees)
  6  select first_name_count,
  7         last_name_count,
  8         full_name_count,
  9         first_name_count + last_name_count + full_name_count + 1 total_count
 10    from counts;

FIRST_NAME_COUNT LAST_NAME_COUNT FULL_NAME_COUNT TOTAL_COUNT
---------------- --------------- --------------- -----------
              91             102             107         301

下面用SQL語句模擬cube,可以看出cube為我們節(jié)省了許多力氣。

--用union all生成cube數(shù)據(jù)行
with emps as (
 select last_name,first_name from hr.employees
),
mycube as (
       select last_name,first_name from emps
       union all
       select last_name,null first_name from emps
       union all
       select null last_name,first_name from emps
       union all
       select null last_name,null first_name from emps
)
select /*+ gather_plan_statistics */ * 
from mycube group by last_name,first_name;

cube實際應(yīng)用

sales_history模式中包含1998~2001年的銷售數(shù)據(jù)店乐。
下面的SQL展示2001年的所有銷售數(shù)據(jù)艰躺。并想要查看各個產(chǎn)品種類的銷售情況匯總,包含基于10年消費者年齡段眨八,收入水平的聚合腺兴;按照收入水平而不考慮年齡的匯總;以及按年齡而不考慮收入水平的聚合廉侧。

--銷售數(shù)據(jù)的union all查詢
with tsales as
 (select /* lst7-10 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001)),
gb as
 (select --Q1 - 所有分類通過收入和年齡范圍
   'Q1' query_tag,
   prod_category,
   cust_income_level,
   age_range,
   sum(profit) profit
    from tsales
   group by prod_category, cust_income_level, age_range
  union all
  select --Q2 - 所有分類通過年齡范圍
   'Q2' query_tag,
   prod_category,
   'ALL INCOME' cust_income_level,
   age_range,
   sum(profit) profit
    from tsales
   group by prod_category, 'ALL INCOME', age_range
  union all
  select --Q3 - 所有分類通過收入
   'Q3' query_tag,
   prod_category,
   cust_income_level,
   'ALL AGE' age_range,
   sum(profit) profit
    from tsales
   group by prod_category, cust_income_level, 'ALL AGE'
  union all
  select --Q4 - 所有分類
   'Q4' query_tag,
   prod_category,
   'ALL INCOME' cust_income_level,
   'ALL AGE' age_range,
   sum(profit) profit
    from tsales
   group by prod_category, 'ALL INCOME', 'ALL AGE'
  )
select * from gb order by prod_category, profit;

【語法】NVL (expr1, expr2)
【功能】若expr1為NULL页响,返回expr2;expr1不為NULL段誊,返回expr1闰蚕。
注意兩者的類型要一致
nvl(c.cust_income_level, 'A: below 30,000') cust_income_level
若c.cust_income_level為null,則返回'A: below 30,000'
mod(x,y)
【功能】返回x除以y的余數(shù)
【參數(shù)】x,y,數(shù)字型表達式
【返回】數(shù)字
【示例】
select mod(23,8),mod(24,8) from dual;
返回:7,0
to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' || to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range
上面是求年齡段连舍,如果56没陡,則求出的范圍為50_60

--用cube代替union all
with tsales as
 (select /* lst7-11 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || decode(cust_income_level,
                     null,
                     decode(age_range, null, 4, 3),
                     decode(age_range, null, 2, 1)) query_tag,
       prod_category,
       cust_income_level,
       age_range,
       sum(profit) profit
  from tsales
 group by prod_category, cube(cust_income_level, age_range)
 order by prod_category, profit;

decode(條件,值1,翻譯值1,值2,翻譯值2,...值n,翻譯值n,缺省值)
【功能】根據(jù)條件返回相應(yīng)值
【參數(shù)】c1, c2, ...,cn,字符型/數(shù)值型/日期型,必須類型相同或null
注:值1……n 不能為條件表達式,這種情況只能用case when then end解決
·含義解釋:  
  decode(條件,值1,翻譯值1,值2,翻譯值2,...值n,翻譯值n,缺省值)  
  該函數(shù)的含義如下:  
  IF 條件=值1 THEN
  RETURN(翻譯值1)
  ELSIF 條件=值2 THEN
  RETURN(翻譯值2)
  ......
  ELSIF 條件=值n THEN
  RETURN(翻譯值n)  
  ELSE
  RETURN(缺省值)
  END IF
  
或:
  when case 條件=值1 THEN
  RETURN(翻譯值1)
  ElseCase 條件=值2 THEN
  RETURN(翻譯值2)
  ......
  ElseCase 條件=值n THEN
  RETURN(翻譯值n)  
  ELSE

RETURN(缺省值)
  END
'Q' || decode(cust_income_level, null,decode(age_range, null, 4, 3),decode(age_range, null, 2, 1)) query_tag
是返回查詢分類標識cust_income_level為null返返回decode(age_range, null, 4, 3)否則返回decode(age_range, null, 2, 1)
cust_income_level==null and age_range==null,query_tag=4
cust_income_level==null and age_range!=null,query_tag=3
cust_income_level!=null and age_range==null,query_tag=2
cust_income_level!=null and age_range!=null,query_tag=1

用grouping()函數(shù)排除空值

上面的SQL有個問題烟瞧,盡管總行數(shù)與之前使用union all運算符所得到的相一致诗鸭,一些數(shù)據(jù)行中的cust_income_level和age_range具有空值,并且有一行的這兩列都為空值参滴。當cube的參數(shù)中包含生成列的所有可能組合時强岸,每一列都有會產(chǎn)生n-1個空值,n為列表中的數(shù)目砾赔。在查詢的例子中有兩個例暴心,因此對于每個唯一的age_range值都會在cust_income_level列上產(chǎn)生空值专普。對于age_range列來說也適用同樣的規(guī)則檀夹。如果這兩列中的數(shù)據(jù)在某些行上原本就有空值炸渡,這些空值就可能出問題买决。如何辨別數(shù)據(jù)中原有的空值和cube擴展所插入的值呢?在oracle 8i中引入了grouping()函數(shù)嘁灯,可以用來識別這些超聚合行旁仿。被用來作為grouping()函數(shù)參數(shù)的表達式必須與出現(xiàn)在group by子句中的表達式相匹配。例如
decode(grouping(age_range),1,'ALL AGE',age_range) age_range
age_range檢測age_range是否有一行由cube產(chǎn)生的空值办悟,或者是否其在數(shù)據(jù)庫中本身就是空值病蛉。如果當前行是由cube生成的超聚合行則返回值為1铺然,對于其它所有情況返回值都為0魄健。
當與case()表達式或decode()函數(shù)組合時沽瘦,超聚合行中的空值可以用一個報告中有用的值替換析恋。這種情況下助隧,decode()看上去是更好的選擇并村,因為它更簡便并且grouping()函數(shù)僅有兩種可能的返回值橘霎。

--grouping()函數(shù)
--無grouping
with tsales as
 (select /* lst7-11 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || decode(cust_income_level,
                     null,
                     decode(age_range, null, 4, 3),
                     decode(age_range, null, 2, 1)) query_tag,
       prod_category,
       cust_income_level,
       age_range,
       sum(profit) profit
  from tsales
 group by prod_category, cube(cust_income_level, age_range)
 order by prod_category, profit;

--有g(shù)rouping
--case和decode都可以工作,我更喜歡用decode
with tsales as
 (select /* lst7-12 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || decode(cust_income_level,
                     null,
                     decode(age_range, null, 4, 3),
                     decode(age_range, null, 2, 1)) query_tag,
       prod_category,
       case grouping(cust_income_level)
         when 1 then
          'ALL INCOME'
         else
          cust_income_level
       end cust_income_level,
       decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range,
       sum(profit) profit
  from tsales
 group by prod_category, cube(cust_income_level, age_range)
 order by prod_category, profit;

用grouping()擴展報告

另一種使用grouping()的方法是放在having子句中原环,用來控制在輸出中顯示哪個層級的聚合嘱吗。
使用grouping()函數(shù)可以被濃縮為對cube擴展中的各行或所有行進行滾動小計谒麦。

--在having子句中進行g(shù)rouping()
with tsales as
 (select /* lst7-13 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || decode(cust_income_level,
                     null,
                     decode(age_range, null, 4, 3),
                     decode(age_range, null, 2, 1)) query_tag,
       prod_category,
       case grouping(cust_income_level)
         when 1 then
          'ALL INCOME'
         else
          cust_income_level
       end cust_income_level,
       decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range,
       sum(profit) profit
  from tsales
 group by prod_category, cube(cust_income_level, age_range)
 --having grouping(cust_income_level)=1
 --having grouping(age_range)=1
 having grouping(cust_income_level)=1 and grouping(age_range)=1
 order by prod_category, profit;

上面sql的數(shù)據(jù)可以看到將grouping()應(yīng)用到cust_income_level列對所有age_range值跨各個收入層次創(chuàng)建聚合。對age_range列進行這樣的操作會得到類似的效果耻蛇,對所有cust_income_level值進行聚合而不考慮age_range的值臣咖。將cube擴展中的所有列作為grouping()函數(shù)的參數(shù)將會導(dǎo)致聚合被濃縮為一行類似sum(profit)和group by prod_category所實現(xiàn)的功能夺蛇。但是蚊惯,使用cube擴展簡單修改having子句就可以創(chuàng)建幾份不同的報告截型。

用grouping_id()擴展報告

grouping_id()函數(shù)相對grouping()函數(shù)來說是相對較新的宦焦,在oracle 9i中引入波闹,與grouping()函數(shù)在某種程度上是類似的精堕。不同的是grouping()計算一個表達式并返回0或1歹篓,而grouping_id()計算一個表達式庄撮,確定其參數(shù)中的哪一行(如果有的話)用來生成超聚合行洞斯,然后創(chuàng)建一個位矢量烙如,并將該值作為整形值返回厅翔。

--group_id()位矢量
with rowgen as (
     select 1 bit_1,0 bit_0
     from dual
), cubed as (
 select 
 grouping_id(bit_1,bit_0) gid,
 to_char(grouping(bit_1)) bv_1,
 to_char(grouping(bit_0)) bv_0,
 decode(grouping(bit_1),1,'GRP BIT 1') gb_1,
 decode(grouping(bit_0),1,'GRP BIT 0') gb_0
 from rowgen
 group by cube(bit_1,bit_0)
)
select gid,bv_1 || bv_0 bit_vector,
gb_1,
gb_0
from cubed
order by gid;
group_id()位矢量運行結(jié)果

我們己經(jīng)知道如何使用grouping()通過having子句來控制輸出刀闷,但考慮數(shù)據(jù)庫效率時甸昏,單獨的grouping_id()調(diào)用可以用來取代所有不同的having grouping()子句施蜜。grouping()函數(shù)的功能僅僅用來辨別數(shù)據(jù)行翻默,因為它僅能返回0或1。由于grouping_id()函數(shù)返回一個基于位矢量的數(shù)值检盼,它可以輕易被用來進行各種不同的比較而不用修改sql語句吨枉。
為什么要關(guān)注不改變SQL語句就能改變比較呢柬唯?如上面基于銷售歷史的例子中圃庭,用戶可能會被給出4個輸出選項,任意一個或多個可能會被選中师坎。用戶的選擇可以用來作為使用having grouping_id()函數(shù)的一個單獨的sql語句胯陋,而不是基于having grouping()的不同組全的多個sql語句的輸入遏乔,因此需要數(shù)據(jù)庫解析sql語句的次數(shù)也比就較少盟萨。同時這也會使得需要執(zhí)行的sql語句更少,使用更小的IO,以及更少的內(nèi)存前计。
正如使用cube來避免通過union all將多個sql語句結(jié)合起來一樣丈屹,grouping_id()能夠避免在應(yīng)用中使用多個sql語句伶棒。

--顯示所有收入層次和年齡段的聚合
variable N_ALL_DATA number 
--顯示所有年齡段的聚合
variable N_AGE_RANGE number 
--顯示所有收入層次的聚合
variable N_INCOME_LEVEL number 
--只給出匯總
variable N_SUMMAY number 
begin
 :N_ALL_DATA      := 0;      -- 1 生效
 :N_AGE_RANGE     := 2;      -- 2 生效
 :N_INCOME_LEVEL  := 0;      -- 3 生效
 :N_SUMMAY        := 4;      -- 4 生效
end;
/
with tsales as
 (select /* lst7-15 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
prod_category,
decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
decode(grouping(age_range),1,'ALL AGE',age_range) age_range, 
sum(profit) profit
from tsales
group by prod_category,cube(cust_income_level,age_range)
having grouping_id(cust_income_level,age_range)+1 in(:N_ALL_DATA,:N_AGE_RANGE,:N_INCOME_LEVEL,:N_SUMMAY)
order by prod_category,profit;

使用grouping函數(shù)也可以實現(xiàn)同的結(jié)果,但需要在having子句中進行一些測試舅锄。示例銷售歷史數(shù)據(jù)查詢在cube參數(shù)中只包含兩列皇忿。在having子句中總共需要進行4次測試鳍烁,因為grouping子句將會返回1或者0幔荒,每一列有兩個可能的值。從而需要4次測試提澎。如果3列盼忌,則需要8次谦纱,所需的測試次數(shù)將會是2的n次方跨嘉,其中n為cube中參數(shù)列或表達式的個數(shù)。

用grouping()代替grouping_id()的having子句的例子

--顯示所有收入層次和年齡段的聚合
variable N_ALL_DATA number 
--顯示所有年齡段的聚合
variable N_AGE_RANGE number 
--顯示所有收入層次的聚合
variable N_INCOME_LEVEL number 
--只給出匯總
variable N_SUMMAY number 
begin
 :N_ALL_DATA      := 0;      -- 1 生效
 :N_AGE_RANGE     := 2;      -- 2 生效
 :N_INCOME_LEVEL  := 0;      -- 3 生效
 :N_SUMMAY        := 4;      -- 4 生效
end;
/
with tsales as
 (select /* lst7-16 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
prod_category,
decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
decode(grouping(age_range),1,'ALL AGE',age_range) age_range, 
sum(profit) profit
from tsales
group by prod_category,cube(cust_income_level,age_range)
having 
 (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_ALL_DATA)
 or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_AGE_RANGE)
 or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_INCOME_LEVEL)
 or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_SUMMAY)
order by prod_category,profit;

1. 使用grouping可以判斷該行是數(shù)據(jù)庫中本來的行,還是有統(tǒng)計產(chǎn)生的行
grouping值為0時說明這個值是數(shù)據(jù)庫中本來的值,為1說明是統(tǒng)計的結(jié)果(也可以說該列為空時是1力崇,不為空時是0)
2. GROUPING_ID()函數(shù)可以接受一列或多列亮靴,返回GROUPING位向量的十進制值茧吊。GROUPING位向量的計算方法是將按照順序?qū)γ恳涣姓{(diào)用GROUPING函數(shù)的結(jié)果組合起來,所以說01和10的值不一樣的
3. group_id的使用 當group by子句中重復(fù)使用一個列時,通過group_id來去除重復(fù)值

grouping sets與rollup()

group by的grouping sets()擴展在oracle 9i中初次登場话速,前面的例子中的整個group by...having子句可以用group by grouping sets()替換乳讥。


with tsales as
 (select /* lst7-17 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
prod_category,
decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
decode(grouping(age_range),1,'ALL AGE',age_range) age_range, 
sum(profit) profit
from tsales
group by prod_category,grouping sets(
rollup(prod_category), --產(chǎn)品分類小計
(cust_income_level),--產(chǎn)品分類和收入層次
(age_range), --產(chǎn)品分類和年齡范圍
(cust_income_level,age_range) --產(chǎn)品分類云石,年齡范圍和收入層次
)
--having group_id() < 1
order by prod_category,profit;

group by cube having grouping_id()與group by grouping sets一個主要的區(qū)別是汹忠,前者能將變量設(shè)定為正確的值來簡便修改輸出错维,而后者的輸出不能修改参歹,除非修改或動態(tài)生成sql語句犬庇。修改sql語句意味著需要維護更多的代碼并且占用更多的數(shù)據(jù)庫資源臭挽。最好盡量避免使用動態(tài)生成sql語句,因為它會消耗的數(shù)據(jù)庫資源更多涨共,并且在出現(xiàn)問題時難以檢修纽帖。
某些時候grouping_sets()擴展會導(dǎo)致輸出中出現(xiàn)重復(fù)。重復(fù)是由rollup(prod_category)產(chǎn)生的举反“弥保可以通過去掉rollup()然后重新運行得到驗證,重復(fù)的行將不復(fù)存在火鼻。但是室囊,每種產(chǎn)品種類的總計也不存在了。解決的辦法就是使用group_id()函數(shù)標記重復(fù)的行魁索,并將其插入到having子句中。
在上面的sql中將--having group_id() < 1 改成 having group_id() < 1
這樣,輸出結(jié)果就如預(yù)期的那樣不包含重復(fù)的行了趁窃。有趣的是如果將rollup(prod_category)改成null,去掉having子句裆针,同時還能得到預(yù)期的輸出耘婚。代碼如下:


with tsales as
 (select /* lst7-17-1 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
prod_category,
decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
decode(grouping(age_range),1,'ALL AGE',age_range) age_range, 
sum(profit) profit
from tsales
group by prod_category,grouping sets(
null,
--rollup(prod_category), --產(chǎn)品分類小計
(cust_income_level),--產(chǎn)品分類和收入層次
(age_range), --產(chǎn)品分類和年齡范圍
(cust_income_level,age_range) --產(chǎn)品分類胞锰,年齡范圍和收入層次
)
--having group_id() < 1
order by prod_category,profit

group by的rollup()擴展也可以單獨用來計算否則將會需要由union all結(jié)合起來的多個查詢完成的小計誊册。
例如:創(chuàng)建顯示器所有名字以Sul開頭的消費者各自的購買總額報告,并且要求對每個消費者分別按年,產(chǎn)品分類進行小計,還要有所有消費的總計料祠。這種類型的任務(wù)可以使用rollup()完成顺呕。

--rollup()小計
with mysales as (
    select c.cust_last_name || ',' || c.cust_first_name cust_name,
    p.prod_category,
    to_char(trunc(time_id,'YYYY'),'YYYY') sale_year,
    p.prod_name,
    s.amount_sold
    from sh.sales s
    join sh.products p on p.prod_id=s.prod_id
    join sh.customers c on c.cust_id=s.cust_id
    where c.cust_last_name like 'Sul%'
)
select 
    decode(grouping(m.cust_name),1,'GRAND TOTAL',m.cust_name) cust_name,
    decode(grouping(m.sale_year),1,'TOTAL BY YEAR',m.sale_year) sale_year,
    decode(grouping(m.prod_category),1,'TOTAL BY CATEGORY',m.prod_category) prod_category,
    sum(m.amount_sold) amount_sold
from mysales m
group by rollup(m.cust_name,m.prod_category,m.sale_year)
order by grouping(m.cust_name), 1,2,3;

注意decode()和grouping()函數(shù)再一次被用來表示小計行启盛。使用grouping(m.cust_name)將總計顯示在報告的最后棍厂。由于這個值>0的唯一情況就是當計算所有消費者總計時张漂,這個總計值只會出現(xiàn)在報告的最后漠畜。

group by的局限性

  • LOB列瘾敢,嵌套表或數(shù)組不能用做group by表達式的一部分
SQL> with lobtest as (
  2   select to_clob(d.dname) dname
  3   from scott.emp e
  4   join scott.dept d on d.deptno=e.deptno
  5  )
  6  select l.dname
  7  from lobtest l
  8  group by l.dname;
group by l.dname
         *
第 8 行出現(xiàn)錯誤:
ORA-00932: 數(shù)據(jù)類型不一致: 應(yīng)為 -, 但卻獲得 CLOB
  • 不允許使用標量子查詢表達式

SQL> select d.dname,count(empno) empcount
  2  from scott.emp e
  3  join scott.dept d on d.deptno=e.deptno
  4  group by (select dname from scott.dept d2 where d2.dname = d.dname )
  5  order by d.dname;
group by (select dname from scott.dept d2 where d2.dname = d.dname )
          *
第 4 行出現(xiàn)錯誤:
ORA-22818: 這里不允許出現(xiàn)子查詢表達式
  • 如果group by子句引用任何對象類型的列則查詢不能并行化
create type dept_location_type as object
(
       street_address varchar2(40),
       postal_code varchar2(10),
       city varchar2(30),
       state_province varchar2(10),
       country_id char(2),
       order member function match (e dept_location_type) return integer
);
/
create or replace type body  dept_location_type
as order member function match (e dept_location_type) return integer
is 
   begin
         if city <e.city then
                 return -1;
         elsif city > e.city then
                 return 1;
         else
                 return 0;        
         end if;
   end;
end;
/      

create table deptobj
as 
select d.deptno,d.dname
from scott.dept d;     

alter table  deptobj add (dept_location dept_location_type);   

select * from deptobj;

update deptobj set dept_location=dept_location_type('1234 fenmenao st','453076','ShenZhen','GuangDong','GD') where deptno=1;
update deptobj set dept_location=dept_location_type('345 Leshan st','123456','LeShan','SiCuan','SC') where deptno=2;
update deptobj set dept_location=dept_location_type('345 ChongQing st','123456','ChongQing','ChongQing','CQ') where deptno=3;
update deptobj set dept_location=dept_location_type('345 ChangChun st','123456','ChangChun','GuiYang','GY') where deptno=4;
--對象列的并行g(shù)roup by
select /*+ gather_plan_statictics parallel(e 2)*/
d.dept_location,count(e.ename) ecount
from scott.emp e,deptobj d
where e.deptno=d.deptno
group by d.dept_location
order by d.dept_location;

執(zhí)行結(jié)果

對象列的并行g(shù)roup by

dept_location類型體中的成員函數(shù)匹配用來進行城市值的比較郑临,然后使用group by將雇員按城市分組典奉。最后一個列出的局限性在后期的版本是可以工作的陕靠。

總結(jié)

Oracle以group by子句擴展的形式為SQL開發(fā)者提供了一些極佳的工具税肪,幫助我們不僅能夠減少代碼量,并且能提高數(shù)據(jù)庫效率。大多數(shù)的特性也要與其它不同的功能進行組合滔以。

group by總結(jié)
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末敦迄,一起剝皮案震驚了整個濱河市,隨后出現(xiàn)的幾起案子瞧柔,更是在濱河造成了極大的恐慌,老刑警劉巖,帶你破解...
    沈念sama閱讀 219,427評論 6 508
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件雹拄,死亡現(xiàn)場離奇詭異势篡,居然都是意外死亡绷蹲,警方通過查閱死者的電腦和手機疤估,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,551評論 3 395
  • 文/潘曉璐 我一進店門雕什,熙熙樓的掌柜王于貴愁眉苦臉地迎上來磷雇,“玉大人,你說我怎么就攤上這事抄瑟⊥什猓” “怎么了?”我有些...
    開封第一講書人閱讀 165,747評論 0 356
  • 文/不壞的土叔 我叫張陵汛兜,是天一觀的道長巴粪。 經(jīng)常有香客問我,道長粥谬,這世上最難降的妖魔是什么肛根? 我笑而不...
    開封第一講書人閱讀 58,939評論 1 295
  • 正文 為了忘掉前任,我火速辦了婚禮漏策,結(jié)果婚禮上派哲,老公的妹妹穿的比我還像新娘。我一直安慰自己掺喻,他們只是感情好芭届,可當我...
    茶點故事閱讀 67,955評論 6 392
  • 文/花漫 我一把揭開白布储矩。 她就那樣靜靜地躺著,像睡著了一般褂乍。 火紅的嫁衣襯著肌膚如雪持隧。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 51,737評論 1 305
  • 那天逃片,我揣著相機與錄音屡拨,去河邊找鬼。 笑死褥实,一個胖子當著我的面吹牛呀狼,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播损离,決...
    沈念sama閱讀 40,448評論 3 420
  • 文/蒼蘭香墨 我猛地睜開眼哥艇,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了僻澎?” 一聲冷哼從身側(cè)響起貌踏,我...
    開封第一講書人閱讀 39,352評論 0 276
  • 序言:老撾萬榮一對情侶失蹤,失蹤者是張志新(化名)和其女友劉穎怎棱,沒想到半個月后哩俭,有當?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 45,834評論 1 317
  • 正文 獨居荒郊野嶺守林人離奇死亡拳恋,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 37,992評論 3 338
  • 正文 我和宋清朗相戀三年凡资,在試婚紗的時候發(fā)現(xiàn)自己被綠了。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片谬运。...
    茶點故事閱讀 40,133評論 1 351
  • 序言:一個原本活蹦亂跳的男人離奇死亡隙赁,死狀恐怖,靈堂內(nèi)的尸體忽然破棺而出梆暖,到底是詐尸還是另有隱情伞访,我是刑警寧澤,帶...
    沈念sama閱讀 35,815評論 5 346
  • 正文 年R本政府宣布轰驳,位于F島的核電站厚掷,受9級特大地震影響,放射性物質(zhì)發(fā)生泄漏级解。R本人自食惡果不足惜冒黑,卻給世界環(huán)境...
    茶點故事閱讀 41,477評論 3 331
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望勤哗。 院中可真熱鬧抡爹,春花似錦、人聲如沸芒划。這莊子的主人今日做“春日...
    開封第一講書人閱讀 32,022評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽。三九已至泵殴,卻和暖如春涮帘,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背袋狞。 一陣腳步聲響...
    開封第一講書人閱讀 33,147評論 1 272
  • 我被黑心中介騙來泰國打工焚辅, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留,地道東北人苟鸯。 一個月前我還...
    沈念sama閱讀 48,398評論 3 373
  • 正文 我出身青樓,卻偏偏與公主長得像棚点,于是被迫代替她去往敵國和親早处。 傳聞我的和親對象是個殘疾皇子,可洞房花燭夜當晚...
    茶點故事閱讀 45,077評論 2 355

推薦閱讀更多精彩內(nèi)容