前一篇文章中秘血,效率成了很關(guān)鍵的問(wèn)題奶卓,比較數(shù)據(jù)庫(kù)還是需要能高效查找數(shù)據(jù)才行
那么如何解決查找問(wèn)題呢?一個(gè)很好的辦法是使用B+樹(shù)存炮,關(guān)于B+樹(shù)就不做多的介紹了炬搭,網(wǎng)上有很多
這里只貼出定義
B-樹(shù)
是一種多路搜索樹(shù)(并不是二叉的):
1.定義任意非葉子結(jié)點(diǎn)最多只有M個(gè)兒子;且M>2穆桂;
2.根結(jié)點(diǎn)的兒子數(shù)為[2, M]宫盔;
3.除根結(jié)點(diǎn)以外的非葉子結(jié)點(diǎn)的兒子數(shù)為[M/2, M];
4.每個(gè)結(jié)點(diǎn)存放至少M(fèi)/2-1(取上整)和至多M-1個(gè)關(guān)鍵字享完;(至少2個(gè)關(guān)鍵字)
5.非葉子結(jié)點(diǎn)的關(guān)鍵字個(gè)數(shù)=指向兒子的指針個(gè)數(shù)-1灼芭;
6.非葉子結(jié)點(diǎn)的關(guān)鍵字:K[1], K[2], …, K[M-1];且K[i] < K[i+1]般又;
7.非葉子結(jié)點(diǎn)的指針:P[1], P[2], …, P[M]彼绷;其中P[1]指向關(guān)鍵字小于K[1]的子樹(shù),P[M]指向關(guān)鍵字大于K[M-1]的子樹(shù)茴迁,其它P[i]指向關(guān)鍵字屬于(K[i-1], K[i])的子樹(shù)寄悯;
8.所有葉子結(jié)點(diǎn)位于同一層;
如:(M=3)
B+樹(shù)
B+樹(shù)是B-樹(shù)的變體堕义,也是一種多路搜索樹(shù):
1.其定義基本與B-樹(shù)同猜旬,除了:
2.非葉子結(jié)點(diǎn)的子樹(shù)指針與關(guān)鍵字個(gè)數(shù)相同;
3.非葉子結(jié)點(diǎn)的子樹(shù)指針P[i],指向關(guān)鍵字值屬于[K[i], K[i+1])的子樹(shù)(B-樹(shù)是開(kāi)區(qū)間)洒擦;
5.為所有葉子結(jié)點(diǎn)增加一個(gè)鏈指針椿争;
6.所有關(guān)鍵字都在葉子結(jié)點(diǎn)出現(xiàn);
至于實(shí)現(xiàn)秘遏,給出一個(gè)連接可以看看 https://github.com/begeekmyfriend/bplustree
然后數(shù)據(jù)庫(kù)的鍵是字符串型的丘薛,如果轉(zhuǎn)換出數(shù)字呢嘉竟?答案是用hash算法邦危,網(wǎng)上也有很多經(jīng)典的實(shí)現(xiàn)
這里給出Bizzard公司的經(jīng)典算法(采用了部分,不是全部)
#pragma once
#include <string>
unsigned long cryptTable[0x500];
void prepareCryptTable()
{
unsigned long seed = 0x00100001, index1 = 0, index2 = 0, i;
for (index1 = 0; index1 < 0x100; index1++)
{
for (index2 = index1, i = 0; i < 5; i++, index2 += 0x100)
{
unsigned long temp1, temp2;
seed = (seed * 125 + 3) % 0x2AAAAB;
temp1 = (seed & 0xFFFF) << 0x10;
seed = (seed * 125 + 3) % 0x2AAAAB;
temp2 = (seed & 0xFFFF);
cryptTable[index2] = (temp1 | temp2);
}
}
}
unsigned long HashString(const std::string& key, unsigned long dwHashType)
{
unsigned char *ckey = (unsigned char *)key.c_str();
unsigned long seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
int ch;
while (*ckey != 0)
{
ch = toupper(*ckey++);
seed1 = cryptTable[(dwHashType << 8) + ch] ^ (seed1 + seed2);
seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
}
return seed1;
}
暴雪的源碼中是用了三次hash值來(lái)決定一個(gè)數(shù)據(jù)的舍扰,數(shù)據(jù)沖突的幾率是1: 18889465931478580854784幾乎不可能出現(xiàn)
而我們這里由于純粹的學(xué)習(xí)原理而已倦蚪,所以采用一次就行了
接下來(lái)就是要把這兩個(gè)算法整合進(jìn)數(shù)據(jù)庫(kù)中,用來(lái)代替以前的搜索算法
需要對(duì)算法進(jìn)行一定的修改
在下一張章中實(shí)現(xiàn)