一芍碧、定義
1.1 B樹的概念
在一些大規(guī)模的數(shù)據(jù)存儲中煌珊,如數(shù)據(jù)庫,分布式系統(tǒng)等泌豆,數(shù)據(jù)的訪問經(jīng)常需要進(jìn)行磁盤的讀寫操作定庵,這個時候的瓶頸主要就在于磁盤的I/O上。
如果采用普通的二叉查找樹結(jié)構(gòu)踪危,由于樹的深度過大蔬浙,會造成磁盤I/O的讀寫過于頻繁,進(jìn)而導(dǎo)致訪問效率低下(因為一般樹的一個結(jié)點對應(yīng)一個磁盤的頁贞远,所以深度越大畴博,讀取磁盤頁越頻繁)。
那么蓝仲,如何減少樹的深度俱病?
一個基本的想法就是:采用多叉樹結(jié)構(gòu)蜜唾,在結(jié)點總數(shù)一定的情況下,結(jié)點分支越多庶艾,樹的高度也就越小袁余,從而查詢的效率也就越高。從這個意義上來看咱揍,就有了B樹的結(jié)構(gòu)颖榜。
M階B樹定義:
B樹常用“階”來定義,一棵M階B樹煤裙,要么是一棵空樹或者只有一個根結(jié)點掩完,要么具有如下性質(zhì):
- 根結(jié)點的子結(jié)點數(shù)在[2,M]之間;
- 除根結(jié)點外硼砰,所有非葉子結(jié)點的子結(jié)點數(shù)范圍:
3且蓬、所有葉子結(jié)點的高度都相同。
B樹的檢索過程:
- 在當(dāng)前結(jié)點中對關(guān)鍵碼進(jìn)行二分法檢索:
①如果找到檢索關(guān)鍵碼题翰,就返回這條記錄
②如果當(dāng)前結(jié)點是葉子結(jié)點且未找到關(guān)鍵碼恶阴,則返回檢索失敗 - 否則,沿著正確的分支重復(fù)這一過程豹障。
1.2 B+樹的概念
B樹廣泛用于實現(xiàn)基于磁盤的大型系統(tǒng)中冯事。實際上,B樹幾乎從來沒有被實現(xiàn)過血公,工業(yè)上最常使用的是B樹的一個變種——B+樹昵仅。
B+樹與B樹的最大區(qū)別在于:
- B+樹只在葉子結(jié)點中存儲記錄(或存儲指向記錄的指針),而內(nèi)部結(jié)點僅存儲關(guān)鍵碼(這些關(guān)鍵碼僅僅用于引導(dǎo)檢索)累魔;
- B+樹的葉結(jié)點一般鏈接起來摔笤,形成一個雙向鏈表。
二垦写、實現(xiàn)
本文中的實現(xiàn)是B樹的一種變種吕世,類似B+樹,主要是為了說明B樹這一數(shù)據(jù)結(jié)構(gòu)梯澜。
2.1 數(shù)據(jù)結(jié)構(gòu)定義
樹結(jié)點定義:
樹結(jié)點分為兩類:
-
內(nèi)部結(jié)點
僅存儲關(guān)鍵碼值(在以內(nèi)部結(jié)點為根的子樹中寞冯,所有的鍵都大于等于與此內(nèi)部結(jié)點關(guān)聯(lián)的鍵,但小于此內(nèi)部結(jié)點中次大的鍵) -
外部結(jié)點
即葉子結(jié)點晚伙,存儲記錄(或存儲指向記錄的指針)
注意:上圖中定義了一個哨兵鍵 " * "吮龄,哨兵鍵小于任何鍵(根結(jié)點的第一個鍵是哨兵鍵,哨兵鍵所指向結(jié)點的第一個鍵也是哨兵鍵)
源碼:
// 結(jié)點定義
private static final class Node {
private int m; // 實際子結(jié)點數(shù):m ≤ M
private boolean isLeaf;
// M階B樹的每個結(jié)點含有M個指向子結(jié)點的指針(其中內(nèi)部結(jié)點的children[0]表示哨兵咆疗,小于任何鍵)
private Entry[] children = new Entry[M];
private Node(int k漓帚,boolean isLeaf) {
m = k;
this.isLeaf = isLeaf;
}
}
private static class Entry {
private Comparable key; // 鍵碼值
private final Object val; // 實際的記錄值(或指向記錄的)
private Node next;
public Entry(Comparable key, Object val, Node next) {
this.key = key;
this.val = val;
this.next = next;
}
}
2.2 基本操作的實現(xiàn)
2.2.1 查找
查找時,從根結(jié)點開始午磁,根據(jù)被查找的鍵選擇當(dāng)前結(jié)點中的適當(dāng)區(qū)間尝抖,并根據(jù)適當(dāng)?shù)逆溄訌囊粋€結(jié)點移動到下一個結(jié)點毡们。
最終,查找過程會到達(dá)葉子結(jié)點昧辽。如果被查找的鍵在葉子結(jié)點中衙熔,則命中,否則未命中搅荞。
查找源碼:
public Value get(Key key) {
if (key == null)
throw new IllegalArgumentException("argument to get() is null");
return search(root, key);
}
//在以x為根結(jié)點的樹中查找鍵值key
private Value search(Node x, Key key) {
Entry[] children = x.children;
if (x.isLeaf) { // 外部結(jié)點红氯,直接判斷是否和被查找鍵相等
for (int j = 0; j < x.m; j++) {
if (eq(key, children[j].key))
return (Value) children[j].val;
}
} else { // 內(nèi)部結(jié)點
for (int j = 0; j < x.m; j++) { // children[0]可能表示哨兵
if (j + 1 == x.m || less(key, children[j + 1].key))
return search(children[j].next, key);
}
}
return null;
}
2.2.2 插入
B樹的插入過程與查找類似。
首先進(jìn)行查找咕痛,找到待插入的葉子結(jié)點:
①如果葉子結(jié)點還沒有滿(一般“滿”是指插入后子結(jié)點數(shù)量超過M痢甘,但實際應(yīng)用中一般會定義一個百分比,元素占比超過該百分比則認(rèn)為溢出)茉贡,那么直接插入塞栅;
②如果葉子結(jié)點已經(jīng)滿了,就把它分裂成兩個結(jié)點(平均分配)腔丧,然后將新形成的右邊結(jié)點的最小鍵碼值復(fù)制一份到父結(jié)點放椰。如果父結(jié)點也滿了,則執(zhí)行相同操作悔据,直到到達(dá)根結(jié)點或引起根結(jié)點分裂(樹高增加1).
插入源碼:
public void put(Key key, Value val) {
if (key == null)
throw new IllegalArgumentException("argument key to put() is null");
Node u = insert(root, key, val);
n++;
if (u == null)
return;
// 說明根結(jié)點需要分裂
Node t = new Node(2);
t.children[0] = new Entry(root.children[0].key, null, root); //復(fù)用root
t.children[1] = new Entry(u.children[0].key, null, u);
root = t;
height++;
}
/**
* 在以x為根的B樹中插入鍵值對
* @return 若插入結(jié)點后未滿(子結(jié)點數(shù)小于M)庄敛,則返回null;否則科汗,分裂結(jié)點,返回分裂后新的右邊結(jié)點
*/
private Node insert(Node x, Key key, Value val) {
int j;
Entry t = new Entry(key, val, null);
if (x.isLeaf) { // 葉子結(jié)點绷雏,找到第一個比key大的位置j
for (j = 0; j < x.m; j++) {
if (less(key, x.children[j].key))
break;
}
} else { // 內(nèi)部結(jié)點头滔,找到key所屬的位置j
for (j = 0; j < x.m; j++) {
if ((j + 1 == x.m) || less(key, x.children[j + 1].key)) {
Node u = insert(x.children[j].next, key, val);
if (u == null)
return null;
t.key = u.children[0].key;
t.next = u;
break;
}
}
}
//右移元素,給待插入鍵留出空位
for (int i = x.m; i > j; i--)
x.children[i] = x.children[i - 1];
x.children[j] = t;
x.m++;
if (x.m < M)
return null;
else
return split(x);
}
//分裂結(jié)點涎显,返回分裂后的右邊結(jié)點
private Node split(Node h) {
Node t = new Node(M / 2);
h.m = M / 2;
for (int j = 0; j < M / 2; j++)
t.children[j] = h.children[M / 2 + j];
return t;
}
2.3 完整源碼
public class BTree<Key extends Comparable<Key>, Value> {
// max children per B-tree node = M-1
// (must be even and greater than 2)
private static final int M = 4;
private Node root; // root of the B-tree
private int height; // height of the B-tree
private int n; // number of key-value pairs in the B-tree
// helper B-tree node data type
private static final class Node {
private int m; // number of children
private Entry[] children = new Entry[M]; // the array of children
// create a node with k children
private Node(int k) {
m = k;
}
}
// internal nodes: only use key and next
// external nodes: only use key and value
private static class Entry {
private Comparable key;
private final Object val;
private Node next; // helper field to iterate over array entries
public Entry(Comparable key, Object val, Node next) {
this.key = key;
this.val = val;
this.next = next;
}
}
/**
* Initializes an empty B-tree.
*/
public BTree() {
root = new Node(0);
}
/**
* Returns true if this symbol table is empty.
*
* @return {@code true} if this symbol table is empty; {@code false}
* otherwise
*/
public boolean isEmpty() {
return size() == 0;
}
/**
* Returns the number of key-value pairs in this symbol table.
*
* @return the number of key-value pairs in this symbol table
*/
public int size() {
return n;
}
/**
* Returns the height of this B-tree (for debugging).
*
* @return the height of this B-tree
*/
public int height() {
return height;
}
/**
* Returns the value associated with the given key.
*
* @param key
* the key
* @return the value associated with the given key if the key is in the
* symbol table and {@code null} if the key is not in the symbol
* table
* @throws IllegalArgumentException
* if {@code key} is {@code null}
*/
public Value get(Key key) {
if (key == null)
throw new IllegalArgumentException("argument to get() is null");
return search(root, key, height);
}
private Value search(Node x, Key key, int ht) {
Entry[] children = x.children;
// external node
if (ht == 0) {
for (int j = 0; j < x.m; j++) {
if (eq(key, children[j].key))
return (Value) children[j].val;
}
}
// internal node
else {
for (int j = 0; j < x.m; j++) {
if (j + 1 == x.m || less(key, children[j + 1].key))
return search(children[j].next, key, ht - 1);
}
}
return null;
}
/**
* Inserts the key-value pair into the symbol table, overwriting the old
* value with the new value if the key is already in the symbol table. If
* the value is {@code null}, this effectively deletes the key from the
* symbol table.
*
* @param key
* the key
* @param val
* the value
* @throws IllegalArgumentException
* if {@code key} is {@code null}
*/
public void put(Key key, Value val) {
if (key == null)
throw new IllegalArgumentException("argument key to put() is null");
Node u = insert(root, key, val, height);
n++;
if (u == null)
return;
// need to split root
Node t = new Node(2);
t.children[0] = new Entry(root.children[0].key, null, root);
t.children[1] = new Entry(u.children[0].key, null, u);
root = t;
height++;
}
private Node insert(Node h, Key key, Value val, int ht) {
int j;
Entry t = new Entry(key, val, null);
// external node
if (ht == 0) {
for (j = 0; j < h.m; j++) {
if (less(key, h.children[j].key))
break;
}
}
// internal node
else {
for (j = 0; j < h.m; j++) {
if ((j + 1 == h.m) || less(key, h.children[j + 1].key)) {
Node u = insert(h.children[j++].next, key, val, ht - 1);
if (u == null)
return null;
t.key = u.children[0].key;
t.next = u;
break;
}
}
}
for (int i = h.m; i > j; i--)
h.children[i] = h.children[i - 1];
h.children[j] = t;
h.m++;
if (h.m < M)
return null;
else
return split(h);
}
// split node in half
private Node split(Node h) {
Node t = new Node(M / 2);
h.m = M / 2;
for (int j = 0; j < M / 2; j++)
t.children[j] = h.children[M / 2 + j];
return t;
}
/**
* Returns a string representation of this B-tree (for debugging).
*
* @return a string representation of this B-tree.
*/
public String toString() {
return toString(root, height, "") + "\n";
}
private String toString(Node h, int ht, String indent) {
StringBuilder s = new StringBuilder();
Entry[] children = h.children;
if (ht == 0) {
for (int j = 0; j < h.m; j++) {
s.append(indent + children[j].key + " " + children[j].val + "\n");
}
} else {
for (int j = 0; j < h.m; j++) {
if (j > 0)
s.append(indent + "(" + children[j].key + ")\n");
s.append(toString(children[j].next, ht - 1, indent + " "));
}
}
return s.toString();
}
// comparison functions - make Comparable instead of Key to avoid casts
private boolean less(Comparable k1, Comparable k2) {
return k1.compareTo(k2) < 0;
}
private boolean eq(Comparable k1, Comparable k2) {
return k1.compareTo(k2) == 0;
}
/**
* Unit tests the {@code BTree} data type.
*
* @param args
* the command-line arguments
*/
public static void main(String[] args) {
BTree<String, String> st = new BTree<String, String>();
st.put("www.cs.princeton.edu", "128.112.136.12");
st.put("www.cs.princeton.edu", "128.112.136.11");
st.put("www.princeton.edu", "128.112.128.15");
st.put("www.yale.edu", "130.132.143.21");
st.put("www.simpsons.com", "209.052.165.60");
st.put("www.apple.com", "17.112.152.32");
st.put("www.amazon.com", "207.171.182.16");
st.put("www.ebay.com", "66.135.192.87");
st.put("www.cnn.com", "64.236.16.20");
st.put("www.google.com", "216.239.41.99");
st.put("www.nytimes.com", "199.239.136.200");
st.put("www.microsoft.com", "207.126.99.140");
st.put("www.dell.com", "143.166.224.230");
st.put("www.slashdot.org", "66.35.250.151");
st.put("www.espn.com", "199.181.135.201");
st.put("www.weather.com", "63.111.66.11");
st.put("www.yahoo.com", "216.109.118.65");
StdOut.println("cs.princeton.edu: " + st.get("www.cs.princeton.edu"));
StdOut.println("hardvardsucks.com: " + st.get("www.harvardsucks.com"));
StdOut.println("simpsons.com: " + st.get("www.simpsons.com"));
StdOut.println("apple.com: " + st.get("www.apple.com"));
StdOut.println("ebay.com: " + st.get("www.ebay.com"));
StdOut.println("dell.com: " + st.get("www.dell.com"));
StdOut.println();
StdOut.println("size: " + st.size());
StdOut.println("height: " + st.height());
StdOut.println(st);
StdOut.println();
}
}
三坤检、性能分析
典型的B樹通常只需要訪問2-3次即可確定元素,而且根結(jié)點可以放在內(nèi)存中進(jìn)行緩存期吓,進(jìn)一步提高效率早歇。
含有N個元素的M階B樹,其一次查找或插入需要logMN ~ log(M/2)N 次探查——實際情況下基本是一個很小的常數(shù)讨勤。
對于一顆包含N個關(guān)鍵字的M階樹來說箭跳,其最大高度不會超過log(M/2)(N+1)/2
上述定理證明:
要使樹的高度最大,則每個結(jié)點應(yīng)包含盡量少的子結(jié)點潭千。
假設(shè)M階樹含有n個關(guān)鍵字谱姓,每個結(jié)點的最小子結(jié)點數(shù)為t(除根結(jié)點和葉子結(jié)點外,t=M/2)刨晴,則樹的高度為h有如下關(guān)系(從0計起):
h=0層屉来,結(jié)點數(shù)為1
h=1層路翻,結(jié)點數(shù)為2
h=2層,結(jié)點數(shù)為2t
h=3層茄靠,結(jié)點數(shù)為2t2
由于每個結(jié)點的最小子結(jié)點數(shù)為t茂契,則結(jié)點的鍵數(shù)目為t-1(根結(jié)點為1),那么:
即h ≤ logt(n+1)/2