filtered主要的功能是說明撕氧,引擎層掃描數(shù)據(jù)后根據(jù)額外的where條件過濾后剩下值的一個百分比瘤缩,因為遇到一個問題和其相關(guān),不由得想看看其估算過程伦泥,主要考慮如下
- 主要代碼5.7.22
- 主要考慮等值過濾條件
- 8.0.28帶直方圖測試
一款咖、計算方式
經(jīng)過debug發(fā)現(xiàn)其實filter的計算還是比較簡單的何暮,其主要函數(shù)為calculate_condition_filter
其中定義了幾個宏
/// Filtering effect for equalities: col1 = col2
#define COND_FILTER_EQUALITY 0.1f
/// Filtering effect for inequalities: col1 > col2
#define COND_FILTER_INEQUALITY 0.3333f
/// Filtering effect for between: col1 BETWEEN a AND b
#define COND_FILTER_BETWEEN 0.1111f
對于等值來講我們用到的就是COND_FILTER_EQUALITY宏,其固定為10%铐殃,下列的估算方式這個宏會參與計算。主要的計算流程為:
1. 每個不能使用索引的where條件計算為 max(1/n_rows,0.1)
這個部分來自Item_func_eq::get_filtering_effect跨新,這里的Item_func_eq就是每個where等值條件富腊,可以清晰看到
return fld->get_cond_filter_default_probability(rows_in_table,
COND_FILTER_EQUALITY);//0.1f
Item_field::get_cond_filter_default_probability
return std::max(static_cast<float>(1/max_distinct_values), default_filter);
2. 關(guān)于n_rows到底是什么?
那么這里有一個主要的問題,就是這個n_rows 行數(shù)到底代表的是什么域帐,首先在函數(shù)calculate_condition_filter中可以看到赘被,如下代碼
filtered*=
tab->join()->where_cond->get_filtering_effect(tab->table_ref->map(),
used_tables,
&table->tmp_set,
static_cast<double>(tab->records()));
而這里的where_cond就是我們where條件后面用不到索引的那些條件,其中行數(shù)比較關(guān)鍵為tab->records()肖揣,如果簡單看一下實際上這個值為
QEP_shared中的(執(zhí)行計劃相關(guān))
/**
Either #rows in the table or 1 for const table.
Used in optimization, and also in execution for FOUND_ROWS().
*/
ha_rows m_records;
其設(shè)置為 QEP_shared::set_records民假,
最終可以看到這個值實際上來自
tab->set_records(tab->found_records= tab->table()->file->stats.records);
其中tab->table()返回對象為TABLE結(jié)構(gòu)體,也就是我們常說的table_cache
它就是innodb底層的統(tǒng)計數(shù)據(jù)龙优,代表是表中預(yù)估的行數(shù)羊异,可在函數(shù)ha_innobase::info_low中找到如下:
n_rows = ib_table->stat_n_rows;
....
stats.records = (ha_rows) n_rows;
簡單說就是mysql.innodb_table_stats/information_schema.tables下那個表數(shù)據(jù)的行數(shù)。
3. 如果有多個條件則通過各自filter相乘的方式進(jìn)行計算彤断,得到最終值
這部分來自Item_cond_and::get_filtering_effect野舶,and 也是一個item,其下面包含2個time宰衙,代碼和注釋說明了一切
/*
Calculated as "Conjunction of independent events":
P(A and B ...) = P(A) * P(B) * ...
*/
while ((item= it++))
filtered*= item->get_filtering_effect(filter_for_table,
read_tables,
fields_to_ignore,
rows_in_table);
return filtered;
這個就不需要過多解釋了吧平道。
4. 如果最終計算的filter和1/num_rows之間判斷,取max(1/n_rows,0.1)
注意這里和前面不同供炼,第1點是每個where條件的filter不同一屋,這里是最終計算的filter
filtered 在 calculate_condition_filter 中進(jìn)行計算,如下
filtered= max(filtered, 1.0f / tab->records());
5. 如果預(yù)估掃描的行數(shù)*計算的最終filter 小于 0.05則取0.05
注意這里和前面不同袋哼,第1點是每個where條件的filter不同冀墨,這里是最終計算的filtered 在 calculate_condition_filter 中進(jìn)行計算,如下
if ((filtered * fanout) < 0.05f) //扇出
filtered= 0.05f/static_cast<float>(fanout);
總的來說先嬉,filtered幾乎是一個預(yù)估的計算轧苫,可以作為參考但是一般不會太準(zhǔn),其參與計算的元素主要包含:
- 0.1 固定寫死值
- 0.05 固定寫死的值
- 1/num_rows 也沒有過多的考慮偏移量的問題疫蔓,其中num_rows 為表的總行數(shù)
- 預(yù)估引擎層掃描的行數(shù)
這就可能導(dǎo)致執(zhí)行計劃中join 之類的選擇錯驅(qū)動表含懊,因為這種算法的過濾性完全是估算的,沒有基數(shù)等統(tǒng)計數(shù)據(jù)作為標(biāo)準(zhǔn)(或者索引下探動態(tài)統(tǒng)計dive )衅胀,當(dāng)然到了8.0可以考慮直方圖岔乔,最后我們來進(jìn)行簡單測試。下面來測試一下滚躯,看看是不是這樣計算的雏门。
二嘿歌、測試
先建立2個表如下:
mysql> create table testfil(a int,b int,c int);
Query OK, 0 rows affected (0.01 sec)
mysql> insert into testfil values(1,1,1);
Query OK, 1 row affected (0.00 sec)
mysql> insert into testfil values(2,2,2);
Query OK, 1 row affected (0.01 sec)
mysql> insert into testfil values(3,3,3);
Query OK, 1 row affected (0.01 sec)
mysql> insert into testfil values(4,4,4);
Query OK, 1 row affected (0.00 sec)
mysql> insert into testfil values(4,4,4);
Query OK, 1 row affected (0.00 sec)
mysql> insert into testfil values(4,4,4);
Query OK, 1 row affected (0.00 sec)
mysql> insert into testfil values(4,4,4);
Query OK, 1 row affected (0.01 sec)
mysql> insert into testfil values(3,3,3);
Query OK, 1 row affected (0.00 sec)
mysql> select * from testfil;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 4 | 4 | 4 |
| 4 | 4 | 4 |
| 4 | 4 | 4 |
| 4 | 4 | 4 |
| 3 | 3 | 3 |
+------+------+------+
7 rows in set (0.00 sec)
mysql> alter table testfil add key(a);
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> create table testfil2 like testfil;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into testfil2 select * from testfil;
Query OK, 7 rows affected (0.01 sec)
Records: 7 Duplicates: 0 Warnings: 0
...
mysql> insert into testfil2 select * from testfil2;
Query OK, 14336 rows affected (4.91 sec)
Records: 14336 Duplicates: 0 Warnings: 0
mysql> select table_name,n_rows from mysql.innodb_table_stats where table_name in ('testfil','testfil2');
+------------+--------+
| table_name | n_rows |
+------------+--------+
| testfil | 7 |
| testfil2 | 28755 |
+------------+--------+
2 rows in set (0.09 sec)
mysql> select table_name,TABLE_ROWS from information_schema.tables where table_name in ('testfil','testfil2');
+------------+------------+
| table_name | TABLE_ROWS |
+------------+------------+
| testfil | 7 |
| testfil2 | 28755 |
+------------+------------+
2 rows in set (0.13 sec)
這里主要是建立2個表1個小表1個大表,我們分別來看茁影。
- 測試單個where條件宙帝,小表
mysql> desc select * from testfil where b=1 ;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | testfil | NULL | ALL | NULL | NULL | NULL | NULL | 7 | 14.29 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
mysql> desc select * from testfil where b=4 ;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | testfil | NULL | ALL | NULL | NULL | NULL | NULL | 7 | 14.29 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
這里我們明顯的看到,雖然等于4的行是比較多的募闲,等于1的行只有1行步脓,但是他們計算的filtered都是1/7左右,又因為1/7 > 0.1 因此顯示為14.29%浩螺,很明顯沒有考慮數(shù)據(jù)的傾斜度
- 測試單個where條件靴患,大表
mysql> desc select * from testfil2 where b=1 ;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | testfil2 | NULL | ALL | NULL | NULL | NULL | NULL | 28755 | 10.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> desc select * from testfil2 where b=4 ;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | testfil2 | NULL | ALL | NULL | NULL | NULL | NULL | 28755 | 10.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
這里也顯然,1/28755< 0.1要出,因此都顯示10%鸳君,不管數(shù)據(jù)偏移量。
- 測試兩個個where條件患蹂,小表
mysql> desc select * from testfil where b=1 and c=1;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | testfil | NULL | ALL | NULL | NULL | NULL | NULL | 7 | 14.29 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
mysql> desc select * from testfil where b=4 and c=4;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | testfil | NULL | ALL | NULL | NULL | NULL | NULL | 7 | 14.29 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
這里依舊顯示14.29%或颊,因為filter初始化計算出來為(1/7)*(1/7),根據(jù)第4點小于1/7,依舊算1/7,也就是14.29%
DEBUG如下
1416 filtered= max(filtered, 1.0f / tab->records());
(gdb) p filtered
$1 = 0.0204081647 ((1/7)*(1/7))
(gdb) n
1428 if ((filtered * fanout) < 0.05f)
(gdb) p filtered
$2 = 0.142857149
依舊不考慮數(shù)據(jù)切斜度况脆。
- 測試兩個where條件饭宾,大表
mysql> desc select * from testfil2 where b=1 and c=1;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | testfil2 | NULL | ALL | NULL | NULL | NULL | NULL | 28755 | 1.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
mysql> desc select * from testfil2 where b=4 and c=4;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | testfil2 | NULL | ALL | NULL | NULL | NULL | NULL | 28755 | 1.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
這里很顯然就是兩個0.1相乘,0.1*0.1=0.01 也就是1%格了,而0.01 > 1/28755 看铆,因此直接顯示
- 測試帶索引定位數(shù)據(jù)掃描和兩個where條件
這種情況主要是要測試最后一點就是第5點的0.05測試出來,建立的表如下:
mysql> show create table testauto \G
*************************** 1. row ***************************
Table: testauto
Create Table: CREATE TABLE `testauto` (
`a` int(11) NOT NULL,
`b` int(11) DEFAULT NULL,
`c` int(11) DEFAULT NULL,
KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
mysql> select count(*) from testauto;
+----------+
| count(*) |
+----------+
| 16384 |
+----------+
1 row in set (0.77 sec)
mysql> select table_name,n_rows from mysql.innodb_table_stats where table_name in ('testauto');
+------------+--------+
| table_name | n_rows |
+------------+--------+
| testauto | 16188 |
+------------+--------+
1 row in set (0.10 sec)
這里字段a有索引盛末,且里面的數(shù)字是唯一值弹惦,那么如下語句
mysql> desc select * from testauto where a=100 and b=1 and c=1;
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | testauto | NULL | ref | a | a | 4 | const | 1 | 5.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
我們看到filtered為5%,也就是第5點生效悄但,因為 rows=1棠隐,計算出來的filter應(yīng)該為0.01,而1*0.01< 0.05 所以直接置為0.05檐嚣,也就是5%助泽。看下DEBUG
1416 filtered= max(filtered, 1.0f / tab->records());
(gdb)
1428 if ((filtered * fanout) < 0.05f)
(gdb) p filtered
$3 = 0.0100000007
(gdb) p fanout
$4 = 1
(gdb) n
1429 filtered= 0.05f/static_cast<float>(fanout);
(gdb) n
1433 bitmap_clear_all(&table->tmp_set);
(gdb) p filtered
$5 = 0.0500000007
- join測試
mysql> desc select * from testauto a,testfil2 b where a.a=b.a and a.b=1 and a.c=1;
+----+-------------+-------+------------+------+---------------+------+---------+--------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+--------+-------+----------+-------------+
| 1 | SIMPLE | a | NULL | ALL | a | NULL | NULL | NULL | 16188 | 1.00 | Using where |
| 1 | SIMPLE | b | NULL | ref | a | a | 5 | tp.a.a | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+--------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
mysql> desc select * from testauto a,testfil2 b where a.a=b.a and a.b=1 and a.c=1 and a.a=1;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | a | NULL | ref | a | a | 4 | const | 1 | 5.00 | Using where |
| 1 | SIMPLE | b | NULL | ref | a | a | 5 | const | 4096 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
mysql> desc select * from testauto a,testfil2 b where a.a=b.a and a.b=1 and a.c=1 and a.a=1 and b.c=1;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | a | NULL | ref | a | a | 4 | const | 1 | 5.00 | Using where |
| 1 | SIMPLE | b | NULL | ref | a | a | 5 | const | 4096 | 10.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
2 rows in set, 1 warning (0.01 sec)
mysql> desc select * from testauto a,testfil2 b where a.a=b.a and a.b=1 and a.c=1 and a.a=1 and b.c=1 and b.b=1;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | a | NULL | ref | a | a | 4 | const | 1 | 5.00 | Using where |
| 1 | SIMPLE | b | NULL | ref | a | a | 5 | const | 4096 | 1.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
這里過多分析了嚎京,實際上join中依舊生效嗡贺,只是a.a=b.a關(guān)聯(lián)條件要作為被驅(qū)動表的一個where條件來考慮。
三鞍帝、8.0 簡單測試(8.0.28)
依舊使用上面的表诫睬,進(jìn)行測試
- 不帶直方圖
mysql> desc select * from testfil where b=4 and c=4;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | testfil | NULL | ALL | NULL | NULL | NULL | NULL | 7 | 14.29 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> desc select * from testfil2 where b=4 and c=4;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | testfil2 | NULL | ALL | NULL | NULL | NULL | NULL | 86265 | 1.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
依舊不考慮數(shù)據(jù)的傾斜度。
- 收集直方圖后執(zhí)行
我這里不同的數(shù)據(jù)很少帕涌,使用等寬直方圖就可以了摄凡,如下:
mysql> ANALYZE TABLE testfil UPDATE HISTOGRAM ON b,c WITH 10 BUCKETS;
+----------------+-----------+----------+----------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+----------------+-----------+----------+----------------------------------------------+
| mytest.testfil | histogram | status | Histogram statistics created for column 'b'. |
| mytest.testfil | histogram | status | Histogram statistics created for column 'c'. |
+----------------+-----------+----------+----------------------------------------------+
2 rows in set (0.04 sec)
mysql> ANALYZE TABLE testfil2 UPDATE HISTOGRAM ON b,c WITH 10 BUCKETS;
+-----------------+-----------+----------+----------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------+-----------+----------+----------------------------------------------+
| mytest.testfil2 | histogram | status | Histogram statistics created for column 'b'. |
| mytest.testfil2 | histogram | status | Histogram statistics created for column 'c'. |
+-----------------+-----------+----------+----------------------------------------------+
2 rows in set (5.21 sec)
mysql> desc select * from testfil where b=4 and c=4;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | testfil | NULL | ALL | NULL | NULL | NULL | NULL | 7 | 32.65 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> desc select * from testfil2 where b=4 and c=4;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | testfil2 | NULL | ALL | NULL | NULL | NULL | NULL | 86265 | 32.65 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql>
我們可以看到直方圖確實影響了filtered的計算续徽,也更加準(zhǔn)確了,但是8.0的直方圖默認(rèn)沒有收集亲澡,因此使用不多钦扭,我的了解也比較有限。如果不使用直方圖filter還是5.7的計算方式床绪。
附錄:使用的BUG點
1 breakpoint keep y 0x0000000000eb13ac in main(int, char**) at /opt/percona-server-locks-detail-5.7.22/sql/main.cc:25
breakpoint already hit 1 time
2 breakpoint keep y 0x0000000001727699 in Explain_join::explain_rows_and_filtered() at /opt/percona-server-locks-detail-5.7.22/sql/opt_explain.cc:1480
breakpoint already hit 14 times
3 breakpoint keep y 0x0000000001589411 in calculate_condition_filter(JOIN_TAB const*, Key_use const*, unsigned long long, double, bool)
at /opt/percona-server-locks-detail-5.7.22/sql/sql_planner.cc:1219
breakpoint already hit 26 times
5 breakpoint keep y 0x0000000000fadb41 in Item_equal::get_filtering_effect(unsigned long long, unsigned long long, st_bitmap const*, double)
at /opt/percona-server-locks-detail-5.7.22/sql/item_cmpfunc.cc:7478
breakpoint already hit 4 times
6 breakpoint keep y 0x0000000000f83dc2 in Item_field::get_cond_filter_default_probability(double, float) const at /opt/percona-server-locks-detail-5.7.22/sql/item.cc:7905
breakpoint already hit 3 times