1. 背景
當(dāng)有大量數(shù)據(jù)儲(chǔ)存在磁盤(pán)時(shí),如數(shù)據(jù)庫(kù)的查找,插入, 刪除等操作的實(shí)現(xiàn), 如果要讀取或者寫(xiě)入, 磁盤(pán)的尋道, 旋轉(zhuǎn)時(shí)間很長(zhǎng), 遠(yuǎn)大于在 內(nèi)存中的讀取,寫(xiě)入時(shí)間.
平時(shí)用的二叉排序樹(shù)搜索元素的時(shí)間復(fù)雜度雖然是 的, 但是底數(shù)還是太小, 樹(shù)高太高.
所以就出現(xiàn)了 B 樹(shù)(英文為B-Tree, 不是B減樹(shù)), 可以理解為多叉排序樹(shù). 一個(gè)結(jié)點(diǎn)可以有多個(gè)孩子, 于是增大了底數(shù), 減小了高度, 雖然比較的次數(shù)多(關(guān)鍵字?jǐn)?shù)多), 但是由于是在內(nèi)存中比較, 相較于磁盤(pán)的讀取還是很快的.
2. 定義
度為 d(degree)的 B 樹(shù)(階(order) 為 2d) 定義如下,
每個(gè)結(jié)點(diǎn)中包含有 n 個(gè)關(guān)鍵字信息: 想许。其中:
a) 為關(guān)鍵字,且關(guān)鍵字按順序升序排序
b) 為指向子樹(shù)根的接點(diǎn),
c) 關(guān)鍵字的數(shù) n 滿(mǎn)足(由此也確定了孩子結(jié)點(diǎn)的個(gè)數(shù)): (根節(jié)點(diǎn)可以少于d-1)樹(shù)中每個(gè)結(jié)點(diǎn)最多含有 2d個(gè)孩子(d>=2);
除根結(jié)點(diǎn)和葉子結(jié)點(diǎn)外,其它每個(gè)結(jié)點(diǎn)至少有 d個(gè)孩子错妖;
若根結(jié)點(diǎn)不是葉子結(jié)點(diǎn),則至少有 2 個(gè)孩子(特殊情況:沒(méi)有孩子的根結(jié)點(diǎn),即根結(jié)點(diǎn)為葉子結(jié)點(diǎn),整棵樹(shù)只有一個(gè)根節(jié)點(diǎn))八拱;
所有葉子結(jié)點(diǎn)都出現(xiàn)在同一層,葉子節(jié)點(diǎn)沒(méi)有孩子和指向孩子的指針
性質(zhì):
如下是 度為2的 B 樹(shù), 每個(gè)結(jié)點(diǎn)可能有2,3或4 個(gè)孩子, 所以也叫 2,3,4樹(shù), 等價(jià)于紅黑樹(shù)
3. 查找操作
可以看成二叉排序樹(shù)的擴(kuò)展,二叉排序樹(shù)是二路查找,B - 樹(shù)是多路查找。
節(jié)點(diǎn)內(nèi)進(jìn)行查找的時(shí)候除了順序查找之外,還可以用二分查找來(lái)提高效率骇塘。
下面是順序查找的 python 代碼
def search(self,key,withpath=False):
nd = self.root
fathers = []
while True:
i = nd.findKey(key)
if i==len(nd): fathers.append((nd,i-1,i))
else: fathers.append((nd,i,i))
if i<len(nd) and nd[i]==key:
if withpath:return nd,i,fathers
else:return nd,i
if nd.isLeafNode():
if withpath:return None,None,None
else:return None,None
nd = nd.getChd(i)
我實(shí)現(xiàn)時(shí)讓 fathers 記錄查找的路徑, 方便在實(shí)現(xiàn) delete 操作時(shí)使用(雖然有種 delete 方法可以不需要, 直接 from up to down with no pass by),
4. 插入操作
自頂向下地進(jìn)行插入操作, 最終插入在葉子結(jié)點(diǎn),
考慮到葉子結(jié)點(diǎn)如果有 2t-1 個(gè) 關(guān)鍵字, 則需要進(jìn)行分裂,
一個(gè)有 2t-1個(gè)關(guān)鍵字 結(jié)點(diǎn)分裂是這樣進(jìn)行的: 此結(jié)點(diǎn)分裂為 兩個(gè)關(guān)鍵字為 t-1個(gè)的結(jié)點(diǎn), 分別為 , , 然后再插入一個(gè)關(guān)鍵字到父親結(jié)點(diǎn).
注意同時(shí)要將孩子指針移動(dòng)正確.
所以自頂向下地查找到葉子結(jié)點(diǎn), 中間遇到 2t-1個(gè)關(guān)鍵字的結(jié)點(diǎn)就進(jìn)行分裂, 這樣如果其子結(jié)點(diǎn)進(jìn)行分裂, 上升來(lái)的一個(gè)關(guān)鍵字可以插入到父結(jié)點(diǎn)而不會(huì)超過(guò)2t-1
代碼如下
def insert(self,key):
if len(self.root)== self.degree*2-1:
self.root = self.root.split(node(isLeaf=False),self.degree)
self.nodeNum +=2
nd = self.root
while True:
idx = nd.findKey(key)
if idx<len(nd) and nd[idx] == key:return
if nd.isLeafNode():
nd.insert(idx,key)
self.keyNum+=1
return
else:
chd = nd.getChd(idx)
if len(chd)== self.degree*2-1: #ensure its keys won't excess when its chd split and u
nd = chd.split(nd,self.degree)
self.nodeNum +=1
else:
nd = chd
5. 刪除操作
刪除操作是有點(diǎn)麻煩的, 有兩種方法[1]
- Locate and delete the item, then restructure the tree to retain its invariants, OR
- Do a single pass down the tree, but before entering (visiting) a node, restructure the tree so that once the key to be deleted is encountered, it can be deleted without triggering the need for any further restructuring
5.1. 第一種方法
有如下情況
- 刪除結(jié)點(diǎn)在葉子結(jié)點(diǎn)上
結(jié)點(diǎn)內(nèi)的關(guān)鍵字個(gè)數(shù)大于d-1,可以直接刪除(大于關(guān)鍵字個(gè)數(shù)下限,刪除不影響 B - 樹(shù)特性)
-
結(jié)點(diǎn)內(nèi)的關(guān)鍵字個(gè)數(shù)等于d-1(等于關(guān)鍵字個(gè)數(shù)下限,刪除后將破壞 特性),此時(shí)需觀(guān)察該節(jié)點(diǎn)左右兄弟結(jié)點(diǎn)的關(guān)鍵字個(gè)數(shù):
a. 旋轉(zhuǎn): 如果其左右兄弟結(jié)點(diǎn)中存在關(guān)鍵字個(gè)數(shù)大于d-1 的結(jié)點(diǎn),則從關(guān)鍵字個(gè)數(shù)大于 d-1 的兄弟結(jié)點(diǎn)中借關(guān)鍵字:(這里看了網(wǎng)上的很多說(shuō)法, 都是在介紹關(guān)鍵字的操作,而沒(méi)有提到孩子結(jié)點(diǎn). 我實(shí)現(xiàn)的時(shí)候想了很久才想出來(lái): 借關(guān)鍵字時(shí), 比如從右兄弟借一個(gè)關(guān)鍵字(第一個(gè)), 此時(shí)即為左旋, 將父親結(jié)點(diǎn)對(duì)應(yīng)關(guān)鍵字移到當(dāng)前結(jié)點(diǎn), 再將右兄弟的移動(dòng)父親結(jié)點(diǎn)(因?yàn)橐獫M(mǎn)足排序性質(zhì), 類(lèi)似二叉樹(shù)的選擇) 然后進(jìn)行孩子操作, 將右兄弟的 插入到 當(dāng)前結(jié)點(diǎn)的孩子指針末尾) 左兄弟類(lèi)似, <mark>而且要注意到邊界條件, 比如當(dāng)前結(jié)點(diǎn)是第0個(gè)/最后一個(gè)孩子, 則沒(méi)有 左兄弟/右兄弟</mark>)b. 合并: 如果其左右兄弟結(jié)點(diǎn)中不存在關(guān)鍵字個(gè)數(shù)大于 t-1 的結(jié)點(diǎn),進(jìn)行結(jié)點(diǎn)合并:將其父結(jié)點(diǎn)中的關(guān)鍵字拿到下一層,與該節(jié)點(diǎn)的左右兄弟結(jié)點(diǎn)的所有關(guān)鍵字合并
<mark>同樣要注意到邊界條件, 比如當(dāng)前結(jié)點(diǎn)是第0個(gè)/最后一個(gè)孩子, 則沒(méi)有 左兄弟/右兄弟</mark> 自底向上地檢查來(lái)到這個(gè)葉子結(jié)點(diǎn)的路徑上的結(jié)點(diǎn)是否滿(mǎn)足關(guān)鍵字?jǐn)?shù)目的要求, 只要關(guān)鍵字少于d-1,則進(jìn)行旋轉(zhuǎn)(2a)或者合并(2b)操作
- 刪除結(jié)點(diǎn)在非葉子結(jié)點(diǎn)上
- 查到到該結(jié)點(diǎn), 然后轉(zhuǎn)化成 上述 葉子結(jié)點(diǎn)中情況
- 轉(zhuǎn)化過(guò)程:
a. 找到相鄰關(guān)鍵字:即需刪除關(guān)鍵字的左子樹(shù)中的最大關(guān)鍵字或右子樹(shù)中的最小關(guān)鍵字
b. 用相鄰關(guān)鍵字來(lái)覆蓋需刪除的非葉子節(jié)點(diǎn)關(guān)鍵字,再刪除原相鄰關(guān)鍵字(在;葉子上,這即為上述情況)遥缕。
python 代碼如下, delete
函數(shù)中, 查找到結(jié)點(diǎn), 用 fathers::[(父節(jié)點(diǎn), 關(guān)鍵字指針, 孩子指針)]
記錄路徑, 如果不是葉子結(jié)點(diǎn), 就再進(jìn)行查找, 并記錄結(jié)點(diǎn), 轉(zhuǎn)換關(guān)鍵字.
rebalance 就是從葉子結(jié)點(diǎn)自底向上到根結(jié)點(diǎn), 只要遇到關(guān)鍵字?jǐn)?shù)少于 2d-1 的,就進(jìn)行平衡操作(旋轉(zhuǎn), 合并)
實(shí)現(xiàn)時(shí)要很仔細(xì), 考慮邊界條件, 還有當(dāng)是左孩子的時(shí)候操作的是父結(jié)點(diǎn)的 chdIdx 的前一個(gè), 是右孩子的時(shí)候是 chdIdx 的關(guān)鍵字. 具體實(shí)現(xiàn)完整代碼見(jiàn)文末.
def delete(self,key):#to do
'''search the key, delete it , and form down to up to rebalance it '''
nd,idx ,fathers= self.search(key,withpath=True)
if nd is None : return
del nd[idx]
self.keyNum-=1
if not nd.isLeafNode():
chd = nd.getChd(idx) # find the predecessor key
while not chd.isLeafNode():
fathers.append((chd,len(chd)-1,len(chd)))
chd = chd.getChd(-1)
fathers.append((chd,len(chd)-1,len(chd)))
nd.insert(idx,chd[-1])
del chd[-1]
if len(fathers)>1:self.rebalance(fathers)
def rebalance(self,fathers):
nd,keyIdx,chdIdx = fathers.pop()
while len(nd)<self.degree-1: # rebalance tree from down to up
prt,keyIdx,chdIdx = fathers[-1]
lbro = [] if chdIdx==0 else prt.getChd(chdIdx-1)
rbro = [] if chdIdx==len(prt) else prt.getChd(chdIdx+1)
if len(lbro)<self.degree and len(rbro)<self.degree: # merge two deficient nodes
beforeNode,afterNode = None,None
if lbro ==[]:
keyIdx = chdIdx
beforeNode,afterNode = nd,rbro
else:
beforeNode,afterNode = lbro,nd
keyIdx = chdIdx-1 # important, when choosing
keys = beforeNode[:]+[prt[keyIdx]]+afterNode[:]
children = beforeNode.getChildren() + afterNode.getChildren()
isLeaf = beforeNode.isLeafNode()
prt.delChd(keyIdx+1)
del prt[keyIdx]
nd.update(keys,isLeaf,children)
prt.children[keyIdx]=nd
self.nodeNum -=1
elif len(lbro)>=self.degree: # rotate when only one sibling is deficient
keyIdx = chdIdx-1
nd.insert(0,prt[keyIdx]) # rotate keys
prt[keyIdx] = lbro[-1]
del lbro[-1]
if not nd.isLeafNode(): # if not leaf, move children
nd.insert(0,nd=lbro.getChd(-1))
lbro.delChd(-1)
else:
keyIdx = chdIdx
nd.insert(len(nd),prt[keyIdx]) # rotate keys
prt[keyIdx] = rbro[0]
del rbro[0]
if not nd.isLeafNode(): # if not leaf, move children
#note that insert(-1,ele) will make the ele be the last second one
nd.insert(len(nd),nd=rbro.getChd(0))
rbro.delChd(0)
if len(fathers)==1:
if len(self.root)==0:
self.root = nd
self.nodeNum -=1
break
nd,i,j = fathers.pop()
5.2. 第二種方法
這是算法導(dǎo)論[2]上的
例如
B-TREE-DELETE(T,k)
1 r ← root[T]
2 if n[r] = 1
3 then DISK_READ(c1[r])
4 DISK_READ(c2[r])
5 y ←c1[r]
6 z ←c2[r]
7 if n[y] = n[z] = t-1 ? Cases 2c or 3b
8 then B-TREE-MERGE-CHILD(r, 1, y, z)
9 root[T] ← y
10 FREE-NODE(r)
11 B-TREE-DELETE-NONONE(y, k)
12 else B-TREE-DELETE-NONONE (r, k)
13 else B-TREE-DELETE-NONONE (r, k)
考慮到根結(jié)點(diǎn)的特殊性,對(duì)根結(jié)點(diǎn)為1,并且兩個(gè)子結(jié)點(diǎn)都是t-1的情況進(jìn)行了特殊的處理:
先對(duì)兩個(gè)子結(jié)點(diǎn)進(jìn)行合并,然后把原來(lái)的根刪除,把樹(shù)根指向合并后的子結(jié)點(diǎn)y蓖宦。
這樣B樹(shù)的高度就減少了1。這也是B樹(shù)高度唯一會(huì)減少的情況茄厘。
除了這種情況以外,就直接調(diào)用子過(guò)程 B-TREE-DELETE-NONONE (x, k)矮冬。
B-TREE-DELETE-NONONE (x, k)
1 i ← 1
2 if leaf[x] ? Cases 1
3 then while i <= n[x] and k > keyi[x]
4 do i ← i + 1
5 if k = keyi[x]
6 then for j ← i+1 to n[x]
7 do keyj-1[x] ←keyj[x]
8 n[x] ← n[x] - 1
9 DISK-WRITE(x)
10 else error:”the key does not exist”
11 else while i <= n[x] and k > keyi[x]
12 do i ← i + 1
13 DISK-READ(ci[x])
14 y ←ci[x]
15 if i <= n[x]
16 then DISK-READ(ci+1[x])
17 z ←ci+1[x]
18 if k = keyi[x] ? Cases 2
19 then if n[y] > t-1 ? Cases 2a
20 then k′←B-TREE-SEARCH-PREDECESSOR(y)
21 B-TREE-DELETE-NONONE (y, k′)
22 keyi[x] ←k′
23 else if n[z] > t-1 ? Cases 2b
24 then k′←B-TREE-SEARCH-SUCCESSOR (z)
25 B-TREE-DELETE-NONONE (z, k′)
26 keyi[x] ←k′
27 else B-TREE-MERGE-CHILD(x, i, y, z)? Cases 2c
28 B-TREE-DELETE-NONONE (y, k)
29 else ? Cases 3
30 if i >1
31 then DISK-READ(ci-1[x])
32 p ←ci-1[x]
33 if n[y] = t-1
34 then if i>1 and n[p] >t-1 ? Cases 3a
35 then B-TREE-SHIFT-TO-RIGHT-CHILD(x,i,p,y)
36 else if i <= n[x] and n[z] > t-1 ? Cases 3a
37 then B-TREE-SHIFT-TO-LEFT-CHILD(x,i,y,z)
38 else if i>1 ? Cases 3b
39 then B-TREE-MERGE-CHILD(x, i, p, y)
40 y ← p
41 else B-TREE-MERGE-CHILD(x, i, y, z)? Cases 3b
42 B-TREE-DELETE-NONONE (y, k)
轉(zhuǎn)移到右邊的子結(jié)點(diǎn)
B-TREE-SHIFT-TO-RIGHT-CHILD(x,i,y,z)
1 n[z] ← n[z] +1
2 j ← n[z]
3 while j > 1
4 do keyj[z] ←keyj-1[z]
5 j ← j -1
6 key1[z] ←keyi[x]
7 keyi[x] ←keyn[y][y]
8 if not leaf[z]
9 then j ← n[z]
10 while j > 0
11 do cj+1[z] ←cj[z]
12 j ← j -1
13 c1[z] ←cn[y]+1[y]
14 n[y] ← n[y] -1
15 DISK-WRITE(y)
16 DISK-WRITE(z)
17 DISK-WRITE(x)
轉(zhuǎn)移到左邊的子結(jié)點(diǎn)
B-TREE-SHIFT-TO-LEFT-CHILD(x,i,y,z)
1 n[y] ← n[y] +1
2 keyn[y][y] ← keyi[x]
3 keyi[x] ←key1[z]
4 n[z] ← n[z] -1
5 j ← 1
6 while j <= n[z]
7 do keyj[z] ←keyj+1[z]
8 j ← j +1
9 if not leaf[z]
10 then cn[y]+1[y] ←c1[z]
11 j ← 1
12 while j <= n[z]+1
13 do cj[z] ←cj+1[z]
14 j ← j + 1
15 DISK-WRITE(y)
16 DISK-WRITE(z)
17 DISK-WRITE(x)
6. B+樹(shù)
B+ 樹(shù)[3]是 B- 樹(shù)的變體,與B樹(shù)不同的地方在于:
- 非葉子結(jié)點(diǎn)的子樹(shù)指針與關(guān)鍵字個(gè)數(shù)相同;
- 非葉子結(jié)點(diǎn)的子樹(shù)指針 指向關(guān)鍵字值屬于 的子樹(shù)(B- 樹(shù)是開(kāi)區(qū)間)次哈;
- 為所有葉子結(jié)點(diǎn)增加一個(gè)鏈指針胎署;
- 所有關(guān)鍵字都在葉子結(jié)點(diǎn)出現(xiàn)
B+ 的搜索與 B- 樹(shù)也基本相同,區(qū)別是 B+ 樹(shù)只有達(dá)到葉子結(jié)點(diǎn)才命中(B- 樹(shù)可以在非葉子結(jié)點(diǎn)命中),其性能也等價(jià)于在關(guān)鍵字全集做一次二分查找;
下面摘自 wiki[4]
查找
查找以典型的方式進(jìn)行,類(lèi)似于二叉查找樹(shù)窑滞。起始于根節(jié)點(diǎn),自頂向下遍歷樹(shù),選擇其分離值在要查找值的任意一邊的子指針琼牧。在節(jié)點(diǎn)內(nèi)部典型的使用是二分查找來(lái)確定這個(gè)位置恢筝。
插入
節(jié)點(diǎn)要處于違規(guī)狀態(tài),它必須包含在可接受范圍之外數(shù)目的元素。
- 首先,查找要插入其中的節(jié)點(diǎn)的位置巨坊。接著把值插入這個(gè)節(jié)點(diǎn)中撬槽。
- 如果沒(méi)有節(jié)點(diǎn)處于違規(guī)狀態(tài)則處理結(jié)束。
- 如果某個(gè)節(jié)點(diǎn)有過(guò)多元素,則把它分裂為兩個(gè)節(jié)點(diǎn),每個(gè)都有最小數(shù)目的元素趾撵。在樹(shù)上遞歸向上繼續(xù)這個(gè)處理直到到達(dá)根節(jié)點(diǎn),如果根節(jié)點(diǎn)被分裂,則創(chuàng)建一個(gè)新根節(jié)點(diǎn)恢氯。為了使它工作,元素的最小和最大數(shù)目典型的必須選擇為使最小數(shù)不小于最大數(shù)的一半。
刪除
- 首先,查找要?jiǎng)h除的值鼓寺。接著從包含它的節(jié)點(diǎn)中刪除這個(gè)值妈候。
- 如果沒(méi)有節(jié)點(diǎn)處于違規(guī)狀態(tài)則處理結(jié)束苦银。
- 如果節(jié)點(diǎn)處于違規(guī)狀態(tài)則有兩種可能情況:
- 它的兄弟節(jié)點(diǎn),就是同一個(gè)父節(jié)點(diǎn)的子節(jié)點(diǎn),可以把一個(gè)或多個(gè)它的子節(jié)點(diǎn)轉(zhuǎn)移到當(dāng)前節(jié)點(diǎn),而把它返回為合法狀態(tài)纺念。如果是這樣,在更改父節(jié)點(diǎn)和兩個(gè)兄弟節(jié)點(diǎn)的分離值之后處理結(jié)束。
- 它的兄弟節(jié)點(diǎn)由于處在低邊界上而沒(méi)有額外的子節(jié)點(diǎn)烟逊。在這種情況下把兩個(gè)兄弟節(jié)點(diǎn)合并到一個(gè)單一的節(jié)點(diǎn)中,而且我們遞歸到父節(jié)點(diǎn)上,因?yàn)樗粍h除了一個(gè)子節(jié)點(diǎn)。持續(xù)這個(gè)處理直到當(dāng)前節(jié)點(diǎn)是合法狀態(tài)或者到達(dá)根節(jié)點(diǎn),在其上根節(jié)點(diǎn)的子節(jié)點(diǎn)被合并而且合并后的節(jié)點(diǎn)成為新的根節(jié)點(diǎn)访雪。
由于葉子結(jié)點(diǎn)間有指向下一個(gè)葉子的指針, 便于遍歷, 以及區(qū)間查找, 所以數(shù)據(jù)庫(kù)的以及操作系統(tǒng)文件系統(tǒng)的實(shí)現(xiàn)常用 B+樹(shù),
7. B*樹(shù)
B-tree [5] 是 B+-tree 的變體,在 B+ 樹(shù)的基礎(chǔ)上 (所有的葉子結(jié)點(diǎn)中包含了全部關(guān)鍵字的信息,及指向含有這些關(guān)鍵字記錄的指針),B * 樹(shù)中非根和非葉子結(jié)點(diǎn)再增加指向兄弟的指針;B 樹(shù)定義了非葉子結(jié)點(diǎn)關(guān)鍵字個(gè)數(shù)至少為 (2/3)*M,即塊的最低使用率為 2/3(代替 B+ 樹(shù)的 1/2)
8. 代碼實(shí)現(xiàn)與測(cè)試
8.1. 測(cè)試
if __name__ =='__main__':
bt = bTree()
from random import shuffle,sample
n = 20
lst = [i for i in range(n)]
shuffle(lst)
test= sample(lst,len(lst)//4)
print(f'building b-tree with {lst}')
for i in lst:
bt.insert(i)
#print(f'inserting {i})
#print(bt)
print(bt)
print(f'serching {test}')
for i in test:
nd,idx = bt.search(i)
print(f'node: {repr(nd)}[{idx}]== {i}')
for i in test:
print(f'deleting {i}')
bt.delete(i)
print(bt)
8.2. python 實(shí)現(xiàn)
class node:
def __init__(self,keys=None,isLeaf = True,children=None):
if keys is None:keys=[]
if children is None: children =[]
self.keys = keys
self.isLeaf = isLeaf
self.children = []
def __getitem__(self,i):
return self.keys[i]
def __delitem__(self,i):
del self.keys[i]
def __setitem__(self,i,k):
self.keys[i] = k
def __len__(self):
return len(self.keys)
def __repr__(self):
return str(self.keys)
def __str__(self):
children = ','.join([str(nd.keys) for nd in self.children])
return f'keys: {self.keys}\nchildren: {children}\nisLeaf: {self.isLeaf}'
def getChd(self,i):
return self.children[i]
def delChd(self,i):
del self.children[i]
def setChd(self,i,chd):
self.children[i] = chd
def getChildren(self,begin=0,end=None):
if end is None:return self.children[begin:]
return self.children[begin:end]
def findKey(self,key):
for i,k in enumerate(self.keys):
if k>=key:
return i
return len(self)
def update(self,keys=None,isLeaf=None,children=None):
if keys is not None:self.keys = keys
if children is not None:self.children = children
if isLeaf is not None: self.isLeaf = isLeaf
def insert(self,i,key=None,nd=None):
if key is not None:self.keys.insert(i,key)
if not self.isLeaf and nd is not None: self.children.insert(i,nd)
def isLeafNode(self):return self.isLeaf
def split(self,prt,t):
# form new two nodes
k = self[t-1]
nd1 = node()
nd2 = node()
nd1.keys,nd2.keys = self[:t-1], self[t:] # note that t is 1 bigger than key index
nd1.isLeaf = nd2.isLeaf = self.isLeaf
if not self.isLeaf:
# note that children index is one bigger than key index, and all children included
nd1.children, nd2.children = self.children[0:t], self.children[t:]
# connect them to parent
idx = prt.findKey(k)
if prt.children !=[]: prt.children.remove(self) # remove the original node
prt.insert(idx,k,nd2)
prt.insert(idx,nd = nd1)
return prt
class bTree:
def __init__(self,degree=2):
self.root = node()
self.degree=degree
self.nodeNum = 1
self.keyNum = 0
def search(self,key,withpath=False):
nd = self.root
fathers = []
while True:
i = nd.findKey(key)
if i==len(nd): fathers.append((nd,i-1,i))
else: fathers.append((nd,i,i))
if i<len(nd) and nd[i]==key:
if withpath:return nd,i,fathers
else:return nd,i
if nd.isLeafNode():
if withpath:return None,None,None
else:return None,None
nd = nd.getChd(i)
def insert(self,key):
if len(self.root)== self.degree*2-1:
self.root = self.root.split(node(isLeaf=False),self.degree)
self.nodeNum +=2
nd = self.root
while True:
idx = nd.findKey(key)
if idx<len(nd) and nd[idx] == key:return
if nd.isLeafNode():
nd.insert(idx,key)
self.keyNum+=1
return
else:
chd = nd.getChd(idx)
if len(chd)== self.degree*2-1: #ensure its keys won't excess when its chd split and u
nd = chd.split(nd,self.degree)
self.nodeNum +=1
else:
nd = chd
def delete(self,key):#to do
'''search the key, delete it , and form down to up to rebalance it '''
nd,idx ,fathers= self.search(key,withpath=True)
if nd is None : return
del nd[idx]
self.keyNum-=1
if not nd.isLeafNode():
chd = nd.getChd(idx) # find the predecessor key
while not chd.isLeafNode():
fathers.append((chd,len(chd)-1,len(chd)))
chd = chd.getChd(-1)
fathers.append((chd,len(chd)-1,len(chd)))
nd.insert(idx,chd[-1])
del chd[-1]
if len(fathers)>1:self.rebalance(fathers)
def rebalance(self,fathers):
nd,keyIdx,chdIdx = fathers.pop()
while len(nd)<self.degree-1: # rebalance tree from down to up
prt,keyIdx,chdIdx = fathers[-1]
lbro = [] if chdIdx==0 else prt.getChd(chdIdx-1)
rbro = [] if chdIdx==len(prt) else prt.getChd(chdIdx+1)
if len(lbro)<self.degree and len(rbro)<self.degree: # merge two deficient nodes
beforeNode,afterNode = None,None
if lbro ==[]:
keyIdx = chdIdx
beforeNode,afterNode = nd,rbro
else:
beforeNode,afterNode = lbro,nd
keyIdx = chdIdx-1 # important, when choosing
keys = beforeNode[:]+[prt[keyIdx]]+afterNode[:]
children = beforeNode.getChildren() + afterNode.getChildren()
isLeaf = beforeNode.isLeafNode()
prt.delChd(keyIdx+1)
del prt[keyIdx]
nd.update(keys,isLeaf,children)
prt.children[keyIdx]=nd
self.nodeNum -=1
elif len(lbro)>=self.degree: # rotate when only one sibling is deficient
keyIdx = chdIdx-1
nd.insert(0,prt[keyIdx]) # rotate keys
prt[keyIdx] = lbro[-1]
del lbro[-1]
if not nd.isLeafNode(): # if not leaf, move children
nd.insert(0,nd=lbro.getChd(-1))
lbro.delChd(-1)
else:
keyIdx = chdIdx
nd.insert(len(nd),prt[keyIdx]) # rotate keys
prt[keyIdx] = rbro[0]
del rbro[0]
if not nd.isLeafNode(): # if not leaf, move children
#note that insert(-1,ele) will make the ele be the last second one
nd.insert(len(nd),nd=rbro.getChd(0))
rbro.delChd(0)
if len(fathers)==1:
if len(self.root)==0:
self.root = nd
self.nodeNum -=1
break
nd,i,j = fathers.pop()
def __str__(self):
head= '\n'+'-'*30+'B Tree'+'-'*30
tail= '-'*30+'the end'+'-'*30+'\n'
lst = [[head],[f'node num: {self.nodeNum}, key num: {self.keyNum}']]
cur = []
ndNum =0
ndTotal= 1
que = [self.root]
while que!=[]:
nd = que.pop(0)
cur.append(repr(nd))
ndNum+=1
que+=nd.getChildren()
if ndNum==ndTotal:
lst.append(cur)
cur = []
ndNum = 0
ndTotal =len(que)
lst.append([tail])
lst = [','.join(li) for li in lst]
return '\n'.join(lst)
def __iter__(self,nd = None):
if nd is None: nd = self.root
que = [nd]
while que !=[]:
nd = que.pop(0)
yield nd
if nd.isLeafNode():continue
for i in range(len(nd)+1):
que.append(nd.getChd(i))