最近遇到一個(gè)這么一個(gè)問題,有一個(gè)語句,查詢的where條件為佛吓,select a,b,c from table_name where a>xxx and b=xxx order by c克蚂;那這種情況該如何正確建造聯(lián)合索引?有人說建聯(lián)合索引(a,b,c)瞻赶。也有說(a,c,b)赛糟、(b,a,c)、(b,c,a)甚至還有(b,c)砸逊。還各有各的說法璧南,有說等值優(yōu)先,又有說师逸。開銷最小的放在最后面司倚。等等,可是到底應(yīng)該如何建造聯(lián)合索引,才能有最小的開銷动知,最好的效率皿伺,看來只能實(shí)際測試一下了。
準(zhǔn)備環(huán)境
1.先使用sysbench創(chuàng)建一個(gè)測試表盒粮,先寫入100w條數(shù)據(jù)鸵鸥。
root # sysbench /usr/share/sysbench/oltp_update_index.lua --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=test --mysql-password=123456 --mysql-db=sbtest --table-size=1000000 --tables=1 --threads=10 --report-interval=3 --time=20 prepare
2.先看一下表結(jié)構(gòu),和數(shù)據(jù)。
mysql> show create table sbtest1\G
*************************** 1. row ***************************
Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`k` int(11) NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8mb4
mysql> select * from sbtest1 limit 10;
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 1 | 504773 | 46953504341-24827434769-14900357301-22216603386-59015331152-64421594353-91063882286-20886127030-43555962287-78841524137 | 93637428753-19078512484-07499046167-90467648315-23800958676 |
| 2 | 383019 | 95718881762-40242328339-05770398751-81456272523-88750816056-12849082499-99426386575-67936738491-05967978445-48490321969 | 28062089678-53016824652-82043772113-00345737256-89531328849 |
| 3 | 503847 | 01038385872-50361705817-46318905997-76679583539-88685166918-98634648550-40413039627-46247204716-09242720804-66101780709 | 24034894203-99292672185-45043562307-83346840810-30785340141 |
| 4 | 499704 | 93953438013-23388274922-95983341481-57221053014-04363900317-67253124803-28970744159-73839965368-87044223191-51910643768 | 06048066221-15243579179-69120208226-58470567091-17594624310 |
| 5 | 504333 | 71261094016-71756798329-41630209368-07219841001-81466827148-40549506058-53017560040-16501082010-89928032477-46063916453 | 00213142226-22736525923-92307256421-88250677601-39977405598 |
| 6 | 503552 | 32753709874-02382056451-81693154231-27180260479-19599038117-11097832377-93022647238-28037030398-75145265570-28677929900 | 72055222201-67390314106-05261498936-44609701627-90239230652 |
| 7 | 406374 | 64005369708-01328903521-83628938492-58715179738-86878139875-08682365457-20667146951-39733142900-50703403349-70540498661 | 51830182144-85315866878-16093318939-33670981339-30608544097 |
| 8 | 499282 | 49820770300-84763632943-74879578402-29401740516-28978918607-99083669177-42065952807-09235614265-01166968728-26325800927 | 85626280648-53691844105-62670548340-12193503415-17217200408 |
| 9 | 501253 | 97138737797-22390438815-38895931199-24202958945-61107229413-58522735620-64288660669-68685322850-20820693122-52956464894 | 36554471751-65919787106-00809866023-48790163561-33909214861 |
| 10 | 497864 | 09547967219-39906583629-69077413839-69484437464-13418798478-47928808080-54901501970-58233953813-72479608919-63475855320 | 59346922262-65082899599-23058506484-38409416453-58642527076 |
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
根據(jù)上方的條件丹皱,我們創(chuàng)造一條本案例一樣的語句:select k,pad,c from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;也就是說這條語句只有一個(gè)結(jié)果集妒穴,比較直觀一些,k代表a摊崭、pad代表b讼油、c代表c。
沒有索引的情況
從下方可以看出呢簸,沒有索引必然是全表掃描矮台,100w的數(shù)據(jù),掃描大概98w行阔墩。時(shí)間耗時(shí)0.32s嘿架,Extra為Using where; Using filesort
Using where:表示不一定是沒用到索引,而是就算有索引啸箫,但是where條件的需求滿足不了耸彪,需要獲取所需要的數(shù)據(jù)行,這種情況效率是有一些糟糕的忘苛。
Using filesort:這個(gè)表示本次查詢使用到了額外的排序蝉娜,就是當(dāng)索引中無法完成排序,就需要在臨時(shí)文件中進(jìn)行額外的排序扎唾,因?yàn)槲覀冋Z句中有order by召川,如果需要排序的數(shù)據(jù)比較多,這種情況也是比較糟糕的胸遇。
mysql> select k,pad,c from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;
1 row in set (0.32 sec)
mysql> explain select k,c,pad from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
| 1 | SIMPLE | sbtest1 | NULL | ALL | NULL | NULL | NULL | NULL | 986328 | 3.33 | Using where; Using filesort |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)
增加索引(a,b,c)
從下方測試可以看出荧呐,加了索引確實(shí)好了一點(diǎn),執(zhí)行時(shí)間0.06s纸镊,變成了索引范圍掃描(range)倍阐,但是還是用到了額外的排序,而且掃描行數(shù)為20多w逗威。
Using index:表示索引覆蓋峰搪,因?yàn)槲覀兪莝elect a,b,c而不是select * 所以可能會(huì)用到索引覆蓋,不知道索引覆蓋是什么的可以看我另外一篇文章有詳細(xì)介紹凯旭。
mysql> alter table sbtest1 add index id_test(k,pad,c);
mysql> explain select k,pad,c from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;
+----+-------------+---------+------------+-------+---------------+---------+---------+------+--------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+--------+----------+------------------------------------------+
| 1 | SIMPLE | sbtest1 | NULL | range | id_test | id_test | 244 | NULL | 256480 | 10.00 | Using where; Using index; Using filesort |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+--------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> select k,pad,c from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;
1 row in set (0.06 sec)
增加索引(a,c,b)
從下方結(jié)果來看概耻,(a,c,b)和(a,b,c)基本上區(qū)別不大...
mysql> alter table sbtest1 add index id_test(k,c,pad);
mysql> OPTIMIZE TABLE sbtest1;
mysql> select k,pad,c from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;
1 row in set (0.05 sec)
mysql> explain select k,pad,c from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;
+----+-------------+---------+------------+-------+---------------+---------+---------+------+--------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+--------+----------+------------------------------------------+
| 1 | SIMPLE | sbtest1 | NULL | range | id_test | id_test | 4 | NULL | 254162 | 10.00 | Using where; Using index; Using filesort |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+--------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)
增加索引(b,a,c)
從這里看效率已經(jīng)很高了使套,掃描行數(shù)1行,但是索引是range范圍掃描鞠柄,并且依然有臨時(shí)表排序侦高。
mysql> alter table sbtest1 add index id_test(pad,k,c);
mysql> OPTIMIZE TABLE sbtest1;
mysql> select k,pad,c from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;
1 row in set (0.00 sec)
mysql> explain select k,pad,c from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+------------------------------------------+
| 1 | SIMPLE | sbtest1 | NULL | range | id_test | id_test | 244 | NULL | 1 | 100.00 | Using where; Using index; Using filesort |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)
增加索引(b,c,a)
此時(shí)可以看到,索引的效率已經(jīng)非常高了春锋,ref,const矫膨,唯一查詢,并且沒有臨時(shí)排序期奔,因?yàn)閎值相等于c,所以無需排序危尿,因?yàn)樗饕斜緛砭褪怯行驘o需再進(jìn)行排序呐萌。
mysql> alter table sbtest1 add index id_test(pad,c,k);
mysql> explain select k,pad,c from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+--------------------------+
| 1 | SIMPLE | sbtest1 | NULL | ref | id_test | id_test | 240 | const | 1 | 33.33 | Using where; Using index |
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+--------------------------+
1 row in set, 1 warning (0.01 sec)
mysql> select k,pad,c from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;
1 row in set (0.00 sec)
增加索引(b,c)
這里效率整體來看和(b,c,a)一樣,而且同樣沒有額外的排序谊娇,但是因?yàn)槲覀兪且榉喂拢瑂lect a,b,c,所以如果不加a的索引济欢,就用不到覆蓋索引赠堵,相比較(b,c,a)多一次通過主鍵回表找a列數(shù)據(jù)的操作。但是因?yàn)槭侵麈I其實(shí)效率還好法褥。因?yàn)榻?b,c,a)走到最后a的時(shí)候也是需要在輔助索引樹中茫叭,進(jìn)行for循環(huán)判斷a列的。
mysql> drop index id_test on sbtest1;
mysql> alter table sbtest1 add index id_test(pad,c);
mysql> OPTIMIZE TABLE sbtest1;
mysql> explain select k,pad,c from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+------------------------------------+
| 1 | SIMPLE | sbtest1 | NULL | ref | id_test | id_test | 240 | const | 1 | 33.33 | Using where |
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+------------------------------------+
1 row in set, 1 warning (0.01 sec)
mysql> select k,pad,c from sbtest1 where k>383019 and pad='93637428753-19078512484-07499046167-90467648315-23800958676' order by c;
1 row in set (0.00 sec)
B+tree索引概念
在解釋之前半等,我們需要了解一下B+tree索引中比較重要的兩個(gè)概念:
第一個(gè):B+tree在創(chuàng)建索引時(shí)揍愁,是按照最左側(cè)進(jìn)行排序,例如(a,b,c)三列建造聯(lián)合索引杀饵,先按照a排序莽囤,如過a的值一樣,那就按照第二列b進(jìn)行排序切距,b是相等的話朽缎,在按照c排序...
-
第二個(gè):因?yàn)楦鶕?jù)上方的排序方式,所以就導(dǎo)致了谜悟,B+tree的聯(lián)合索引话肖,在select查詢時(shí),如果查詢條件赌躺,是范圍查詢?nèi)鏰>?或者a<?狼牺,就會(huì)導(dǎo)致右側(cè),也就是后面的where條件無法走索引礼患。例如是钥,索引(a,b,c)掠归,查詢條件為a>? and b=? c=?這樣的話,后面的b和c就無法使用到索引了悄泥,因?yàn)锽+tree遵循虏冻,最左原則,先按照索引查a弹囚,a在范圍查找后厨相,b和c在索引樹中就是無序的,無序的自然也就沒辦法使用索引了鸥鹉。
請(qǐng)看下圖蛮穿,a,b兩列建造索引,先按照a排序毁渗,所以a是有序的践磅,b在a相同的情況下,b是有序的灸异。
image.png
結(jié)論
(b,c,a)≈(b,c)>(b,a,c)>(a,c,b)≈(a,b,c)
根據(jù)上面的兩個(gè)核心點(diǎn)府适,我們現(xiàn)在整理理解一下本篇文章的結(jié)論,為什么b,c沒有建造a的索引肺樟,也能和b,c,a效率一樣呢檐春?
- 那是因?yàn)榫退憬ㄔ炝薬,也是用不上的么伯,因?yàn)閎是等值疟暖,c是做order排序,B+tree索引中的聯(lián)合索引蹦狂,是先排序再創(chuàng)建的誓篱,所以結(jié)果是有序的,無需再進(jìn)行排序凯楔,也就沒有了filesort窜骄。但走到最后判斷a的時(shí)候,此時(shí)a是無序的摆屯,所以不滿足使用B+tree索引的條件邻遏。
那在(b,c)索引中,最后是怎么獲取a這列數(shù)據(jù)的虐骑?
- b等值查詢后准验,因?yàn)槭堑戎担詂也是有序的廷没,無需排序糊饱,最后到a的時(shí)候,因?yàn)樗饕龢渲袥]有a這個(gè)列颠黎,所以需要通過主鍵id獲取a列數(shù)據(jù)進(jìn)行比對(duì)另锋。因?yàn)槭峭ㄟ^主鍵滞项,效率其實(shí)還好。但是因?yàn)樗枰樵兊臄?shù)據(jù)是select a,b,c所以在索引樹中無法直接獲取到所有數(shù)據(jù)夭坪,也就沒了using index覆蓋索引文判。
那在(b,c,a)索引中,最后是怎么獲取a這列數(shù)據(jù)的室梅?
- 前面查詢的數(shù)據(jù)和(b,c)索引一樣戏仓,到了最后a的時(shí)候,因?yàn)?b,c,a)索引中也有a這個(gè)列亡鼠,所以可以在這個(gè)索引頁中進(jìn)行for循環(huán)赏殃,加if判斷a的查詢條件,看是否滿足拆宛,判斷完畢后返回?cái)?shù)據(jù)嗓奢,因?yàn)橹苯訌乃饕龢渲蝎@取到的數(shù)據(jù),所以用到了using index覆蓋索引浑厚。
如果不建造a列索引只是建造(b,c)效果也是差不多的,查到最后a的時(shí)候通過主鍵回表效率也不是很低根盒,所以整體來說建造(b,c,a)或者(b,c)都可以钳幅,如果可以確認(rèn)語句永遠(yuǎn)只查a,b,c三列,建造(b,c,a)沒有問題炎滞,因?yàn)榭梢杂玫礁采w索引敢艰,否則為了存儲(chǔ)空間和寫入效率等,建造(b,c)就可以了册赛。
小知識(shí)
前面提到了钠导,當(dāng)聯(lián)合索引where條件,用到范圍查詢后森瘪,右側(cè)也就是后面的條件因?yàn)槭菬o序的牡属,所以無法使用索引,這句話也對(duì)扼睬,但不全對(duì)逮栅,因?yàn)榉秶樵冞€包括>=、<=窗宇、BETWEEN措伐、like。這些條件的后面军俊,有可能也會(huì)用上索引侥加,為什么這么說?
- 比如有(a,b)兩列索引粪躬,a列數(shù)據(jù)為1担败、1昔穴、2、3氢架、4傻咖,那此時(shí)可以發(fā)現(xiàn)a=1的數(shù)據(jù)有兩條,這兩條數(shù)據(jù)的b列一定是局部有序的岖研,如果我的查詢條件是a>=1 and b=?卿操,如果是a>1查詢條件,那肯定不會(huì)看到1這個(gè)結(jié)果孙援,所以無法b無法走索引害淤,但是如果是a>=1,那么結(jié)果中此時(shí)有兩個(gè)1拓售,這時(shí)候優(yōu)化器就會(huì)把他看成a=1 and b=?窥摄,因?yàn)橛邢嗤牡戎担Y(jié)果集a列的兩個(gè)1础淤。那此時(shí)b列就是有序的崭放,就能用到索引先過濾掉這一部分?jǐn)?shù)據(jù)。所以還是有可能會(huì)用到索引的鸽凶。