Indexing Basics
索引類型
B-TREE 索引
InnoDB使用的即是B-TREE索引峦嗤。
存儲(chǔ)引擎以不同的方式使用B-TREE索引,性能也各有不同。例如吧秕,MyISAM使用前綴壓縮技術(shù)使得索引更小,但I(xiàn)nnoDB則按照原數(shù)據(jù)格式進(jìn)行存儲(chǔ)迹炼。再如MyISAM索引通過數(shù)據(jù)的物理位置引用被索引的行砸彬,而InnoDB則根據(jù)主鍵引用被索引的行。
B-TREE對(duì)索引列是順序組織存儲(chǔ)的斯入,所以很適合查找范圍數(shù)據(jù)砂碉。下圖為B-TREE索引的抽象表示:
B-TREE索引能夠加快數(shù)據(jù)訪問的速度,因?yàn)榇鎯?chǔ)引擎不再需要進(jìn)行全表掃描獲取需要的數(shù)據(jù)刻两。
假設(shè)有下面這個(gè)數(shù)據(jù)表:
CREATE TABLE People (
last_name varchar(50) ,
first_name varchar(50) dob date,
not null, not null, not null,
gender enum('m', 'f')not null,
key(last_name, first_name, dob) );
The index will contain the values from the last_name, first_name, and dob columns for every row in the table.
!!!索引對(duì)多個(gè)值進(jìn)行排序的依據(jù)是CREATE TABLE語(yǔ)句中定義索引時(shí)的順序增蹭。
Types of queries that can use a B-Tree index. B-Tree indexes work well for lookups by the full key value
, a key range
, or a key prefix
. They are useful only if the lookup uses a leftmost prefix
of the index.
Here are some limitations of B-Tree indexes:
- They are not useful if the lookup
does not start from the leftmost side of the indexed columns
. For example, this index won’t help you find all people named Bill or all people born on a certain date, because those columns are not leftmost in the index. Likewise, you can’t use the index to find people whose last name ends with a par- ticular letter. - You
can’t skip columns in the index
. That is, you won’t be able to find all people whose last name is Smith and who were born on a particular date. If you don’t specify a value for the first_name column, MySQL can use only the first column of the index. - The storage engine can’t optimize accesses with any columns to
the right of the first range condition
. For example, if your query is WHERE last_name="Smith" AND first_name LIKE 'J%' AND dob='1976-12-23', the index access will use only the first two columns in the index, because the LIKE is a range condition (the server can use the rest of the columns for other purposes, though). For a column that has a limited number of values, you can often work around this by specifying equality conditions instead of range conditions. We show detailed examples of this in the indexing case study later in this chapter.
所以索引列的順序是非常的重要,這些限制都和索引列的順序有關(guān)磅摹。在優(yōu)化的時(shí)候可以使用相同的列但順序不同的索引來滿足快速查詢的需求滋迈。
哈希索引
A hash index is built on a hash table and is useful only for exact lookups that use every column in the index
.
In MySQL, only the Memory storage engine
supports explicit hash indexes. They are the default index type for Memory tables, though Memory tables can have B-Tree indexes, too.
Building your own hash indexes. If your storage engine doesn’t support hash indexes, you can emulate them yourself in a manner similar to that InnoDB uses. This will give you access to some of the desirable properties of hash indexes, such as a very small index size for very long keys
.
An example of when this approach works well is for URL lookups. URLs generally cause B-Tree indexes to become huge, because they’re very long. You’d normally query a table of URLs like this:
mysql> SELECT id FROM url WHERE url="http://www.mysql.com";
But if you remove the index on the url column and add an indexed url_crc column to the table, you can use a query like this:
mysql> SELECT id FROM url WHERE url="http://www.mysql.com" -> AND url_crc=CRC32("http://www.mysql.com");
CRC32 is one of hash functions based on on the "polynomial" division idea. The CRC is acronym for Cyclic Redundancy Code (other variants instead "Code" is "Check" and "Checksum") algorithm. The number 32 is specifying the size of resulting hash value (checksum) - 32 bits. The checksum is used to detect errors after transmission or storage of any piece of information.
This works well because the MySQL query optimizer notices there’s a small, highly selective index on the url_crc column and does an index lookup for entries with that value (1560514994, in this case). Even if several rows have the same url_crc value, it’s very easy to find these rows with a fast integer comparison and then examine them to find the one that matches the full URL exactly. The alternative is to index the full URL as a string, which is much slower.
這樣做的一個(gè)缺陷是需要維護(hù)hash值霎奢,可以通過使用觸發(fā)器來較好的解決這個(gè)問題。
If you use this approach, you should not use SHA1()
or MD5()
hash functions. These return very long strings
, which waste a lot of space and result in slower comparisons. They are cryptographically strong functions designed to virtually eliminate collisions, which is not your goal here. Simple hash functions can offer acceptable collision rates with better performance.
If your table has many rows and CRC32() gives too many collisions
, implement your own 64-bit hash function. Make sure you use a function that returns an integer, not a string. One way to implement a 64-bit hash function is to use just part of the value returned by MD5()
.
空間數(shù)據(jù)索引(R-Tree)
全文索引
索引的優(yōu)點(diǎn)
Three main benefits proceed from these properties:
- Indexes reduce the amount of data the server has to examine.
- Indexes help the server
avoid sorting
andtemporary tables
. - Indexes turn
random I/O
intosequential I/O
.
高性能索引策略
獨(dú)立的列
“Isolating” the column means it should not
be part of an expression or be inside a function in the query.如果查詢中得列不是獨(dú)立的饼灿,MySQL就不會(huì)使用索引椰憋。
前綴索引和索引選擇性
Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table (#T), and ranges from 1/#T to 1.
A prefix of the column is often selective enough to give good performance. If you’re indexing BLOB or TEXT columns, or very long VARCHAR columns, you must define prefix indexes, because MySQL disallows indexing their full length.
創(chuàng)建前綴索引
mysql> ALTER TABLE tableName ADD KEY (colName(len));
前綴索引的一個(gè)缺點(diǎn)是MySQL無法用其做orderBy和groupBy。
選擇合適的索引列順序
選擇索引列順序的經(jīng)驗(yàn)法則: 將選擇性最高的列放在前面赔退。
聚族索引
聚族索引并不是一種單獨(dú)的索引類型橙依,而是一種數(shù)據(jù)存儲(chǔ)方式。
InnoDB將通過主鍵聚集數(shù)據(jù)硕旗。
覆蓋索引
An index that contains (or “covers”) all the data needed to satisfy a query is called a covering index
.
使用索引掃描來做排序
壓縮(前綴壓縮)索引
MySQL需要單獨(dú)維護(hù)重復(fù)的索引窗骑,所以有性能上的考慮。重復(fù)索引是指在相同的列上
按照相同的順序
創(chuàng)建的相同類型
的索引漆枚。