PostgreSQL 源碼解讀（35）- 查詢語句#20（查詢優(yōu)化-簡化Having和GroupBy）

本節(jié)簡單介紹了PG查詢優(yōu)化中對Having和Group By子句的簡化處理矿微。

一、基本概念

簡化Having語句
把Having中的約束條件,如滿足可以提升到Where條件中的,則移動到Where子句中,否則仍保留在Having語句中.這樣做的目的是因為Having過濾在Group by之后執(zhí)行,如能把Having中的過濾提升到Where中,則可以提前執(zhí)行"選擇"運算,減少Group by的開銷.
以下語句,條件dwbh='1002'提升到Where中執(zhí)行:

testdb=# explain verbose select a.dwbh,a.xb,count(*) 
testdb-# from t_grxx a 
testdb-# group by a.dwbh,a.xb
testdb-# having count(*) >= 1 and dwbh = '1002';
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 GroupAggregate  (cost=15.01..15.06 rows=1 width=84)
   Output: dwbh, xb, count(*)
   Group Key: a.dwbh, a.xb
   Filter: (count(*) >= 1) -- count(*) >= 1 仍保留在Having中
   ->  Sort  (cost=15.01..15.02 rows=2 width=76)
         Output: dwbh, xb
         Sort Key: a.xb
         ->  Seq Scan on public.t_grxx a  (cost=0.00..15.00 rows=2 width=76)
               Output: dwbh, xb
               Filter: ((a.dwbh)::text = '1002'::text) -- 提升到Where中,掃描時過濾Tuple
(10 rows)

如存在Group by & Grouping sets則不作處理:

testdb=# explain verbose
testdb-# select a.dwbh,a.xb,count(*) 
testdb-# from t_grxx a 
testdb-# group by 
testdb-# grouping sets ((a.dwbh),(a.xb),())
testdb-# having count(*) >= 1 and dwbh = '1002'
testdb-# order by a.dwbh,a.xb;
                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Sort  (cost=28.04..28.05 rows=3 width=84)
   Output: dwbh, xb, (count(*))
   Sort Key: a.dwbh, a.xb
   ->  MixedAggregate  (cost=0.00..28.02 rows=3 width=84)
         Output: dwbh, xb, count(*)
         Hash Key: a.dwbh
         Hash Key: a.xb
         Group Key: ()
         Filter: ((count(*) >= 1) AND ((a.dwbh)::text = '1002'::text)) -- 掃描數(shù)據(jù)表后再過濾
         ->  Seq Scan on public.t_grxx a  (cost=0.00..14.00 rows=400 width=76)
               Output: dwbh, grbh, xm, xb, nl
(11 rows)

簡化Group by語句
如Group by中的字段列表已包含某個表主鍵的所有列,則該表在Group by語句中的其他列可以刪除,這樣的做法有利于提升在Group by過程中排序或Hash的性能,減少不必要的開銷.

testdb=# explain verbose select a.dwbh,a.dwmc,count(*) 
testdb-# from t_dwxx a 
testdb-# group by a.dwbh,a.dwmc
testdb-# having count(*) >= 1;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 HashAggregate  (cost=13.20..15.20 rows=53 width=264)
   Output: dwbh, dwmc, count(*)
   Group Key: a.dwbh, a.dwmc -- 分組鍵為dwbh & dwmc
   Filter: (count(*) >= 1)
   ->  Seq Scan on public.t_dwxx a  (cost=0.00..11.60 rows=160 width=256)
         Output: dwmc, dwbh, dwdz
(6 rows)

testdb=# alter table t_dwxx add primary key(dwbh); -- 添加主鍵
ALTER TABLE
testdb=# explain verbose select a.dwbh,a.dwmc,count(*) 
from t_dwxx a 
group by a.dwbh,a.dwmc
having count(*) >= 1;
                              QUERY PLAN                               
-----------------------------------------------------------------------
 HashAggregate  (cost=1.05..1.09 rows=1 width=264)
   Output: dwbh, dwmc, count(*)
   Group Key: a.dwbh -- 分組鍵只保留dwbh
   Filter: (count(*) >= 1)
   ->  Seq Scan on public.t_dwxx a  (cost=0.00..1.03 rows=3 width=256)
         Output: dwmc, dwbh, dwdz
(6 rows)

二、源碼解讀

相關(guān)處理的源碼位于文件subquery_planner.c中,主函數(shù)為subquery_planner,代碼片段如下:

     /*
      * In some cases we may want to transfer a HAVING clause into WHERE. We
      * cannot do so if the HAVING clause contains aggregates (obviously) or
      * volatile functions (since a HAVING clause is supposed to be executed
      * only once per group).  We also can't do this if there are any nonempty
      * grouping sets; moving such a clause into WHERE would potentially change
      * the results, if any referenced column isn't present in all the grouping
      * sets.  (If there are only empty grouping sets, then the HAVING clause
      * must be degenerate as discussed below.)
      *
      * Also, it may be that the clause is so expensive to execute that we're
      * better off doing it only once per group, despite the loss of
      * selectivity.  This is hard to estimate short of doing the entire
      * planning process twice, so we use a heuristic: clauses containing
      * subplans are left in HAVING.  Otherwise, we move or copy the HAVING
      * clause into WHERE, in hopes of eliminating tuples before aggregation
      * instead of after.
      *
      * If the query has explicit grouping then we can simply move such a
      * clause into WHERE; any group that fails the clause will not be in the
      * output because none of its tuples will reach the grouping or
      * aggregation stage.  Otherwise we must have a degenerate (variable-free)
      * HAVING clause, which we put in WHERE so that query_planner() can use it
      * in a gating Result node, but also keep in HAVING to ensure that we
      * don't emit a bogus aggregated row. (This could be done better, but it
      * seems not worth optimizing.)
      *
      * Note that both havingQual and parse->jointree->quals are in
      * implicitly-ANDed-list form at this point, even though they are declared
      * as Node *.
      */
     newHaving = NIL;
     foreach(l, (List *) parse->havingQual)//存在Having條件語句
     {
         Node       *havingclause = (Node *) lfirst(l);//獲取謂詞
 
         if ((parse->groupClause && parse->groupingSets) ||
             contain_agg_clause(havingclause) ||
             contain_volatile_functions(havingclause) ||
             contain_subplans(havingclause))
         {
             /* keep it in HAVING */
             //如果有Group&&Group Sets語句
             //保持不變
             newHaving = lappend(newHaving, havingclause);
         }
         else if (parse->groupClause && !parse->groupingSets)
         {
             /* move it to WHERE */
             //只有g(shù)roup語句,可以加入到j(luò)ointree的條件中
             parse->jointree->quals = (Node *)
                 lappend((List *) parse->jointree->quals, havingclause);
         }
         else//既沒有g(shù)roup也沒有g(shù)rouping set,拷貝一份到j(luò)ointree的條件中
         {
             /* put a copy in WHERE, keep it in HAVING */
             parse->jointree->quals = (Node *)
                 lappend((List *) parse->jointree->quals,
                         copyObject(havingclause));
             newHaving = lappend(newHaving, havingclause);
         }
     }
     parse->havingQual = (Node *) newHaving;//調(diào)整having子句
 
     /* Remove any redundant GROUP BY columns */
     remove_useless_groupby_columns(root);//去掉group by中無用的數(shù)據(jù)列

remove_useless_groupby_columns

 /*
  * remove_useless_groupby_columns
  *      Remove any columns in the GROUP BY clause that are redundant due to
  *      being functionally dependent on other GROUP BY columns.
  *
  * Since some other DBMSes do not allow references to ungrouped columns, it's
  * not unusual to find all columns listed in GROUP BY even though listing the
  * primary-key columns would be sufficient.  Deleting such excess columns
  * avoids redundant sorting work, so it's worth doing.  When we do this, we
  * must mark the plan as dependent on the pkey constraint (compare the
  * parser's check_ungrouped_columns() and check_functional_grouping()).
  *
  * In principle, we could treat any NOT-NULL columns appearing in a UNIQUE
  * index as the determining columns.  But as with check_functional_grouping(),
  * there's currently no way to represent dependency on a NOT NULL constraint,
  * so we consider only the pkey for now.
  */
 static void
 remove_useless_groupby_columns(PlannerInfo *root)
 {
     Query      *parse = root->parse;//查詢樹
     Bitmapset **groupbyattnos;//位圖集合
     Bitmapset **surplusvars;//位圖集合
     ListCell   *lc;
     int         relid;
 
     /* No chance to do anything if there are less than two GROUP BY items */
     if (list_length(parse->groupClause) < 2)//如果只有1個ITEMS,無需處理
         return;
 
     /* Don't fiddle with the GROUP BY clause if the query has grouping sets */
     if (parse->groupingSets)//存在Grouping sets,不作處理
         return;
 
     /*
      * Scan the GROUP BY clause to find GROUP BY items that are simple Vars.
      * Fill groupbyattnos[k] with a bitmapset of the column attnos of RTE k
      * that are GROUP BY items.
      */
     //用于分組的屬性 
     groupbyattnos = (Bitmapset **) palloc0(sizeof(Bitmapset *) *
                                            (list_length(parse->rtable) + 1));
     foreach(lc, parse->groupClause)
     {
         SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
         TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
         Var        *var = (Var *) tle->expr;
 
         /*
          * Ignore non-Vars and Vars from other query levels.
          *
          * XXX in principle, stable expressions containing Vars could also be
          * removed, if all the Vars are functionally dependent on other GROUP
          * BY items.  But it's not clear that such cases occur often enough to
          * be worth troubling over.
          */
         if (!IsA(var, Var) ||
             var->varlevelsup > 0)
             continue;
 
         /* OK, remember we have this Var */
         relid = var->varno;
         Assert(relid <= list_length(parse->rtable));
         groupbyattnos[relid] = bms_add_member(groupbyattnos[relid],
                                               var->varattno - FirstLowInvalidHeapAttributeNumber);
     }
 
     /*
      * Consider each relation and see if it is possible to remove some of its
      * Vars from GROUP BY.  For simplicity and speed, we do the actual removal
      * in a separate pass.  Here, we just fill surplusvars[k] with a bitmapset
      * of the column attnos of RTE k that are removable GROUP BY items.
      */
     surplusvars = NULL;         /* don't allocate array unless required */
     relid = 0;
     //如某個Relation的分組鍵中已含主鍵列,去掉其他列
     foreach(lc, parse->rtable)
     {
         RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
         Bitmapset  *relattnos;
         Bitmapset  *pkattnos;
         Oid         constraintOid;
 
         relid++;
 
         /* Only plain relations could have primary-key constraints */
         if (rte->rtekind != RTE_RELATION)
             continue;
 
         /* Nothing to do unless this rel has multiple Vars in GROUP BY */
         relattnos = groupbyattnos[relid];
         if (bms_membership(relattnos) != BMS_MULTIPLE)
             continue;
 
         /*
          * Can't remove any columns for this rel if there is no suitable
          * (i.e., nondeferrable) primary key constraint.
          */
         pkattnos = get_primary_key_attnos(rte->relid, false, &constraintOid);
         if (pkattnos == NULL)
             continue;
 
         /*
          * If the primary key is a proper subset of relattnos then we have
          * some items in the GROUP BY that can be removed.
          */
         if (bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1)
         {
             /*
              * To easily remember whether we've found anything to do, we don't
              * allocate the surplusvars[] array until we find something.
              */
             if (surplusvars == NULL)
                 surplusvars = (Bitmapset **) palloc0(sizeof(Bitmapset *) *
                                                      (list_length(parse->rtable) + 1));
 
             /* Remember the attnos of the removable columns */
             surplusvars[relid] = bms_difference(relattnos, pkattnos);
 
             /* Also, mark the resulting plan as dependent on this constraint */
             parse->constraintDeps = lappend_oid(parse->constraintDeps,
                                                 constraintOid);
         }
     }
 
     /*
      * If we found any surplus Vars, build a new GROUP BY clause without them.
      * (Note: this may leave some TLEs with unreferenced ressortgroupref
      * markings, but that's harmless.)
      */
     if (surplusvars != NULL)
     {
         List       *new_groupby = NIL;
 
         foreach(lc, parse->groupClause)
         {
             SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
             TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
             Var        *var = (Var *) tle->expr;
 
             /*
              * New list must include non-Vars, outer Vars, and anything not
              * marked as surplus.
              */
             if (!IsA(var, Var) ||
                 var->varlevelsup > 0 ||
                 !bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
                                surplusvars[var->varno]))
                 new_groupby = lappend(new_groupby, sgc);
         }
 
         parse->groupClause = new_groupby;
     }
 }

三守伸、參考資料

planner.c

最后編輯于：2018.09.06 14:46:35

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者

人面猴
序言：七十年代末，一起剝皮案震驚了整個濱河市，隨后出現(xiàn)的幾起案子反肋，更是在濱河造成了極大的恐慌，老刑警劉巖踏施，帶你破解...
沈念sama閱讀 207,113評論 6贊 481
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件石蔗，死亡現(xiàn)場離奇詭異，居然都是意外死亡畅形，警方通過查閱死者的電腦和手機养距，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 88,644評論 2贊 381
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進店門，熙熙樓的掌柜王于貴愁眉苦臉地迎上來日熬，“玉大人棍厌，你說我怎么就攤上這事。” “怎么了耘纱？”我有些...
開封第一講書人閱讀 153,340評論 0贊 344
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵敬肚，是天一觀的道長。經(jīng)常有香客問我束析，道長帘皿，這世上最難降的妖魔是什么？我笑而不...
開封第一講書人閱讀 55,449評論 1贊 279
?港島之戀（遺憾婚禮）
正文為了忘掉前任畸陡，我火速辦了婚禮鹰溜，結(jié)果婚禮上，老公的妹妹穿的比我還像新娘丁恭。我一直安慰自己曹动，他們只是感情好，可當我...
茶點故事閱讀 64,445評論 5贊 374
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布牲览。她就那樣靜靜地躺著墓陈，像睡著了一般。火紅的嫁衣襯著肌膚如雪第献。梳的紋絲不亂的頭發(fā)上贡必，一...
開封第一講書人閱讀 49,166評論 1贊 284
城市分裂傳說
那天，我揣著相機與錄音庸毫，去河邊找鬼仔拟。笑死，一個胖子當著我的面吹牛飒赃，可吹牛的內(nèi)容都是我干的利花。我是一名探鬼主播，決...
沈念sama閱讀 38,442評論 3贊 401
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼载佳，長吁一口氣：“原來是場噩夢啊……” “哼炒事！你這毒婦竟也來了？” 一聲冷哼從身側(cè)響起蔫慧，我...
開封第一講書人閱讀 37,105評論 0贊 261
萬榮殺人案實錄
序言：老撾萬榮一對情侶失蹤挠乳，失蹤者是張志新（化名）和其女友劉穎，沒想到半個月后姑躲，有當?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體睡扬，經(jīng)...
沈念sama閱讀 43,601評論 1贊 300
?護林員之死
正文獨居荒郊野嶺守林人離奇死亡，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點故事閱讀 36,066評論 2贊 325
?白月光啟示錄
正文我和宋清朗相戀三年肋联，在試婚紗的時候發(fā)現(xiàn)自己被綠了威蕉。大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
茶點故事閱讀 38,161評論 1贊 334
活死人
序言：一個原本活蹦亂跳的男人離奇死亡橄仍，死狀恐怖，靈堂內(nèi)的尸體忽然破棺而出，到底是詐尸還是另有隱情侮繁，我是刑警寧澤虑粥，帶...
沈念sama閱讀 33,792評論 4贊 323
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布，位于F島的核電站宪哩，受9級特大地震影響娩贷，放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜锁孟，卻給世界環(huán)境...
茶點故事閱讀 39,351評論 3贊 307
男人毒藥：我在死后第九天來索命
文/蒙蒙一彬祖、第九天我趴在偏房一處隱蔽的房頂上張望。院中可真熱鬧品抽，春花似錦储笑、人聲如沸。這莊子的主人今日做“春日...
開封第一講書人閱讀 30,352評論 0贊 19
一樁弒父案突倍，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽。三九已至盆昙，卻和暖如春羽历，著一層夾襖步出監(jiān)牢的瞬間，已是汗流浹背淡喜。一陣腳步聲響...
開封第一講書人閱讀 31,584評論 1贊 261
情欲美人皮
我被黑心中介騙來泰國打工秕磷，沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留，地道東北人炼团。一個月前我還...
沈念sama閱讀 45,618評論 2贊 355
代替公主和親
正文我出身青樓跳夭，卻偏偏與公主長得像，于是被迫代替她去往敵國和親们镜。傳聞我的和親對象是個殘疾皇子币叹，可洞房花燭夜當晚...
茶點故事閱讀 42,916評論 2贊 344

PostgreSQL 源碼解讀（35）- 查詢語句#20（查詢優(yōu)化-簡化Having和GroupBy）

一、基本概念

二、源碼解讀

三守伸、參考資料

推薦閱讀更多精彩內(nèi)容