一傲武、導(dǎo)言
生成樹(shù)(spanning tree):在圖論中涵防,無(wú)向圖G=(V,E)的生成樹(shù)(spanning tree)是具有G的全部頂點(diǎn)躯舔,但邊數(shù)最少的聯(lián)通子圖滞谢。假設(shè)G中一共有n個(gè)頂點(diǎn),一顆生成樹(shù)滿足下列條件:
(1)n個(gè)頂點(diǎn)锁保;
(2)n-1條邊薯酝;
(3)n個(gè)頂點(diǎn)聯(lián)通;
(4)一個(gè)圖的生成樹(shù)可能有多個(gè)爽柒。
最小生成樹(shù)(minimum spanning tree吴菠, MST)/最小生成森林:聯(lián)通加權(quán)無(wú)向圖中邊緣權(quán)重加和最小的生成樹(shù)。給定無(wú)向圖G=(V,E)浩村,(u,v)代表頂點(diǎn)u與頂點(diǎn)v的邊做葵,w(u,v)代表此邊的權(quán)重,若存在生成樹(shù)T使得:
最小心墅,則T為G的最小生成樹(shù)蜂挪。對(duì)于非連通無(wú)向圖來(lái)說(shuō)重挑,它的每一連通分量同樣有最小生成樹(shù),它們的并被稱為最小生成森林棠涮。最小生成樹(shù)除了繼承生成樹(shù)的性質(zhì)之外谬哀,還存在下面兩個(gè)特點(diǎn):
- 當(dāng)圖的每一條邊的權(quán)值都相同時(shí),該圖的所有生成樹(shù)都是最小生成樹(shù)严肪;
- 如果圖的每一條邊的權(quán)值都互不相同史煎,那么最小生成樹(shù)將只有一個(gè)。
最小生成樹(shù)示例:
最小生成樹(shù)MST的實(shí)際應(yīng)用:
- 構(gòu)建成本最小的連接網(wǎng)絡(luò):電網(wǎng)驳糯,計(jì)算機(jī)網(wǎng)絡(luò)篇梭、交通網(wǎng)絡(luò)、供應(yīng)鏈網(wǎng)絡(luò)酝枢;
-
最小生成樹(shù)聚類:考慮數(shù)據(jù)集D恬偷,計(jì)算兩點(diǎn)之間的距離當(dāng)作邊緣權(quán)重(一般采用歐式距離)。最小生成樹(shù)聚類思想為帘睦,首先通過(guò)Prim等算法對(duì)數(shù)據(jù)集生成最小生成樹(shù)袍患,然后根據(jù)給定的閾值對(duì)最小生成樹(shù)的邊權(quán)重進(jìn)行掃描,當(dāng)邊緣權(quán)重大于設(shè)定的閾值時(shí)竣付,將對(duì)應(yīng)的邊切斷诡延。對(duì)所有數(shù)據(jù)執(zhí)行上述操作后,剩下的還相互連接的部分即為相同的類或簇古胆。
- 反洗錢:核心交易網(wǎng)絡(luò)發(fā)現(xiàn)+核心交易路線分
二肆良、Prim算法介紹
Prim's Algorithm也翻譯做普里姆算法,是1930年捷克數(shù)學(xué)家算法沃伊捷赫·亞爾尼克發(fā)現(xiàn)逸绎;并在1957年由美國(guó)計(jì)算機(jī)科學(xué)家羅伯特·普里姆獨(dú)立發(fā)現(xiàn)惹恃;1959年,艾茲格·迪科斯徹再次發(fā)現(xiàn)了該算法棺牧。因此巫糙,在某些場(chǎng)合,普里姆算法又被稱為DJP算法陨帆、亞爾尼克算法或普里姆-亞爾尼克算法曲秉。
Prim算法的思想:從任意一個(gè)頂點(diǎn)開(kāi)始采蚀,每次選擇與當(dāng)前頂點(diǎn)最近的一個(gè)頂點(diǎn)疲牵,并將兩點(diǎn)之間的邊加入到樹(shù)中。算法描述如下:
- 輸入:加權(quán)無(wú)向圖G榆鼠, 頂點(diǎn)集合為V纲爸,邊集合為E;
- 初始化: 妆够,s為集合V中選擇的任意起始點(diǎn)识啦,负蚊;
- 重復(fù)下列操作,直到:
3.1 在集合E中選取權(quán)值最小的邊(u, v)颓哮,其中u為集合中的元素家妆,而v則是V中沒(méi)有加入的頂點(diǎn)(如果存在有多條滿足前述條件即具有相同權(quán)值的邊,則可任意選取其中之一)冕茅;
3.2 將v加入集合中伤极,將(u, v)加入集合中;- 輸出:使用集合和來(lái)描述所得到的最小生成樹(shù)姨伤。
下面對(duì)算法的圖例描述:
輸入:
說(shuō)明: 此為原始加權(quán)無(wú)向連通圖哨坪,每條邊的數(shù)字代表其權(quán)重;
第1步:頂點(diǎn)A乍楚、B当编、E和F通過(guò)單條邊與D相連。A是距離D最近的頂點(diǎn)徒溪,因此將A及對(duì)應(yīng)邊AD以高亮表示
第2步:下一個(gè)頂點(diǎn)為距離D或A最近的頂點(diǎn)忿偷。B距D為9,距A為7词渤,E為15牵舱,F(xiàn)為6。因此缺虐,F(xiàn)距D或A最近芜壁,因此將頂點(diǎn)F與相應(yīng)邊DF以高亮表示。
第3步:算法繼續(xù)重復(fù)上面的步驟高氮。距離A為7的頂點(diǎn)B被高亮表示慧妄。
第4步:在當(dāng)前情況下,可以在C剪芍、E與G間進(jìn)行選擇塞淹。C距B為8,E距B為7罪裹,G距F為11饱普。E最近,因此將頂點(diǎn)E與相應(yīng)邊BE高亮表示状共。
第5步:這里套耕,可供選擇的頂點(diǎn)只有C和G。C距E為5峡继,G距E為9冯袍,故選取C,并與邊EC一同高亮表示。
第6步:頂點(diǎn)G是唯一剩下的頂點(diǎn)康愤,它距F為11儡循,距E為9,E最近征冷,故高亮表示G及相應(yīng)邊EG择膝。
輸出:所有頂點(diǎn)均已被選取,圖中綠色部分即為連通圖的最小生成樹(shù)检激。在此例中调榄,最小生成樹(shù)的權(quán)值之和為39。
三呵扛、Kruskal算法介紹
Kruskal是另一個(gè)計(jì)算最小生成樹(shù)的算法每庆,由Joseph Kruskal在1956年發(fā)表,屬于貪婪算法今穿。
算法原理:將每個(gè)頂點(diǎn)放入其自身的數(shù)據(jù)集中缤灵,然后按照權(quán)重的升序來(lái)選擇邊。當(dāng)選擇每條邊時(shí)蓝晒,判斷定義邊的頂點(diǎn)是否在不同的數(shù)據(jù)集中腮出。如果是,將此邊插入最小生成樹(shù)的集合中芝薇,將集合中包含每個(gè)頂點(diǎn)的聯(lián)合體取出胚嘲。算法描述如下:
- 新建圖G,G中擁有原圖中相同的節(jié)點(diǎn)洛二,但不包含邊馋劈;
- 將原圖中所有的邊按權(quán)值從小到大排訊;
- 從權(quán)值最小的邊開(kāi)始晾嘶,如果這條邊連接的兩個(gè)節(jié)點(diǎn)在G中不在同一個(gè)分量妓雾,則添加這條邊到圖G中;
重復(fù)3垒迂,直到圖G中所有的節(jié)點(diǎn)在同一個(gè)聯(lián)通分量中
Kruskal算法的演示如下:
假設(shè)有一張圖G械姻,包含若干點(diǎn)和邊。
第1步:將所有的邊的長(zhǎng)度排序机断,用排序的結(jié)果作為選擇邊的依據(jù)楷拳。資源排序,對(duì)局部最優(yōu)的資源進(jìn)行選擇吏奸,排序完成后欢揖,率先選擇了邊AD。
第2步:在剩下的邊中尋找苦丁,找到了CE浸颓。這里邊的權(quán)重也是5
第3步:依次類推找到了DF物臂,AB旺拉,BE产上。
第4步:下面繼續(xù)選擇, BC或者EF盡管現(xiàn)在長(zhǎng)度為8的邊是最小的未選擇的邊蛾狗。但是現(xiàn)在他們已經(jīng)連通了(對(duì)于BC可以通過(guò)CE,EB來(lái)連接晋涣,類似的EF可以通過(guò)EB,BA,AD,DF來(lái)接連)。所以不需要選擇他們沉桌。類似的BD也已經(jīng)連通了(這里上圖的連通線用紅色表示了)谢鹊。最后就剩下EG和FG了。當(dāng)然我們選擇了EG留凭。(等判斷是否聯(lián)通)
四佃扼、Prim和Kruskal算法的R及python實(shí)現(xiàn)
1. R 實(shí)現(xiàn) Prim機(jī)Kruskal算法:igraph包mst函數(shù), RBGL包的mstree.prim/mstree.kruskal
RBGL代碼示例:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Rgraphviz")
BiocManager::install("RGBL")
require(RGBL)
require(XML)
require(Rgraphviz)
require(ape) # DNA sequences
# 1. simple DNA example
cat(">No305",
"NTTCGAAAAACACACCCACTACTAAAANTTATCAGTCACT",
">No304",
"ATTCGAAAAACACACCCACTACTAAAAATTATCAACCACT",
">No306",
"ATTCGAAAAACACACCCACTACTAAAAATTATCAATCACT",
">No307",
"ATTCGAATAACACAGCCACTTCTAAAAATTATCAATCACT",
file = "exdna.fas", sep = "\n")
data <- read.dna("exdna.fas", format = "fasta")
#generate a distance matrix
dist <- dist.dna(data,model="raw", as.matrix=TRUE)
#creates an undirected graph
dist.g<-as(dist,Class="graphNEL")
#generates the minimum spanning tree using the prim algorithm
ms<-mstree.prim(dist.g)
fromto<-cbind(ms$edgeList[1,],ms$edgeList[2,],ms$weight[1,])
adjMST<-ftM2adjM(as.matrix(fromto[,1:2]),W=fromto[,3],edgemode="undirected")
am.graph<-new("graphAM", adjMat=adjMST )
plot(am.graph, attrs = list(node = list(fillcolor = "lightblue"),edge = list(arrowsize=0.5)),"neato")
#2. more complicated DNA example
a <- "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/sequence_read/"
b <- "SRR062641.filt.fastq.gz"
URL <- paste0(a, b)
download.file(URL, b)
X <- read.fastq(b)
names(X)<-paste0("DNA_1",1:length(X))
dist <- dist.dna(X[1:30],model="raw", as.matrix=TRUE)
#creates an undirected graph
dist.g<-as(dist,Class="graphNEL")
#generates the minimum spanning tree using kruskal algorithm
ms<-mstree.kruskal(dist.g)
fromto<-cbind(ms$edgeList[1,],ms$edgeList[2,],ms$weight[1,])
adjMST<-ftM2adjM(as.matrix(fromto[,1:2]),W=fromto[,3],edgemode="undirected")
am.graph<-new("graphAM", adjMat=adjMST )
plot(am.graph, attrs = list(node = list(fillcolor = "lightblue"),edge = list(arrowsize=0.5)),"neato")
結(jié)果示例:
> ms
$edgeList
[,1] [,2] [,3] [,4]
from "No305" "No306" "No305" "No306"
to "No305" "No304" "No306" "No307"
$weights
[,1] [,2] [,3] [,4]
weight 0 0.02631579 0.02631579 0.07894737
$nodes
[1] "No305" "No304" "No306" "No307"
> ms
$edgeList
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
from "DNA_114" "DNA_118" "DNA_119" "DNA_18" "DNA_11" "DNA_15" "DNA_110"
to "DNA_128" "DNA_124" "DNA_118" "DNA_14" "DNA_129" "DNA_117" "DNA_127"
[,8] [,9] [,10] [,11] [,12] [,13] [,14]
from "DNA_120" "DNA_121" "DNA_122" "DNA_114" "DNA_119" "DNA_19" "DNA_120"
to "DNA_122" "DNA_14" "DNA_124" "DNA_11" "DNA_121" "DNA_126" "DNA_128"
[,15] [,16] [,17] [,18] [,19] [,20] [,21]
from "DNA_17" "DNA_111" "DNA_116" "DNA_12" "DNA_15" "DNA_110" "DNA_110"
to "DNA_129" "DNA_129" "DNA_118" "DNA_17" "DNA_115" "DNA_11" "DNA_16"
[,22] [,23] [,24] [,25] [,26] [,27] [,28]
from "DNA_110" "DNA_111" "DNA_112" "DNA_125" "DNA_125" "DNA_121" "DNA_113"
to "DNA_126" "DNA_112" "DNA_15" "DNA_130" "DNA_127" "DNA_123" "DNA_19"
[,29]
from "DNA_11"
to "DNA_13"
$weights
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
weight 0.5833333 0.59375 0.59375 0.625 0.6354167 0.6458333 0.6458333 0.6458333
[,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
weight 0.6458333 0.6458333 0.6458333 0.65625 0.65625 0.65625 0.65625 0.6666667
[,17] [,18] [,19] [,20] [,21] [,22] [,23]
weight 0.6666667 0.6770833 0.6770833 0.6770833 0.6770833 0.6770833 0.6770833
[,24] [,25] [,26] [,27] [,28] [,29]
weight 0.6770833 0.6770833 0.6770833 0.6875 0.6875 0.6979167
$nodes
[1] "DNA_11" "DNA_12" "DNA_13" "DNA_14" "DNA_15" "DNA_16" "DNA_17"
[8] "DNA_18" "DNA_19" "DNA_110" "DNA_111" "DNA_112" "DNA_113" "DNA_114"
[15] "DNA_115" "DNA_116" "DNA_117" "DNA_118" "DNA_119" "DNA_120" "DNA_121"
[22] "DNA_122" "DNA_123" "DNA_124" "DNA_125" "DNA_126" "DNA_127" "DNA_128"
[29] "DNA_129" "DNA_130"
2. Python實(shí)現(xiàn)Prim和Kruskal算法
Prim算法示例:
import random
import time
def random_matrix_genetor(vex_num=10):
'''
隨機(jī)圖頂點(diǎn)矩陣生成器
輸入:頂點(diǎn)個(gè)數(shù),即矩陣維數(shù)
'''
data_matrix=[]
for i in range(vex_num):
one_list=[]
for j in range(vex_num):
one_list.append(random.randint(1, 100))
data_matrix.append(one_list)
return data_matrix
def prim(data_matrix):
'''
prim 算法
'''
vex_num=len(data_matrix)
prims=[]
weights=[]
flag_list=[False]*vex_num
node=0
for i in range(vex_num):
prims.append(0)
weights.append(0)
flag_list[node]=True
for i in range(vex_num):
weights[i]=data_matrix[node][i]
for i in range(vex_num-1):
min_value=float('inf')
for j in range(vex_num):
if weights[j]!=float('inf') and weights[j]<min_value and not flag_list[j]:
min_value=weights[j]
node=j
if node==0:
return
flag_list[node]=True
for m in range(vex_num):
if weights[m]>data_matrix[node][m] and not flag_list[m]:
weights[m]=data_matrix[node][m]
prims[m]=node
return weights, prims
def main_test_func(vex_num=10):
'''
主測(cè)試函數(shù)
'''
start_time=time.time()
data_matrix=random_matrix_genetor(vex_num)
weights, prims=prim(data_matrix)
print(weights)
print(prims)
end_time=time.time()
return end_time-start_time
if __name__=='__main__':
data_matrix=random_matrix_genetor(10)
weights, prims=prim(data_matrix)
print(weights)
print(prims)
time_list=[]
print('----------------------------10頂點(diǎn)測(cè)試-------------------------------------')
time10=main_test_func(10)
time_list.append(time10)
print('----------------------------50頂點(diǎn)測(cè)試-------------------------------------')
time50=main_test_func(50)
time_list.append(time50)
print('----------------------------100頂點(diǎn)測(cè)試-------------------------------------')
time100=main_test_func(100)
time_list.append(time100)
print('---------------------------------時(shí)間消耗對(duì)比--------------------------------')
for one_time in time_list:
print(one_time)
輸出:
[26, 11, 12, 36, 13, 1, 4, 4, 6, 4]
[0, 7, 3, 0, 5, 7, 5, 2, 1, 7]
----------------------------10頂點(diǎn)測(cè)試-------------------------------------
[71, 4, 39, 8, 20, 17, 4, 13, 10, 1]
[0, 3, 5, 4, 0, 6, 4, 4, 0, 8]
----------------------------50頂點(diǎn)測(cè)試-------------------------------------
[60, 4, 4, 3, 4, 3, 9, 7, 1, 1, 3, 4, 2, 6, 5, 4, 5, 1, 2, 4, 3, 3, 2, 5, 4, 4, 4, 1, 4, 3, 5, 6, 2, 3, 5, 5, 3, 4, 2, 3, 4, 6, 3, 2, 2, 3, 4, 3, 3, 3]
[0, 27, 21, 37, 29, 1, 17, 16, 27, 27, 9, 21, 37, 23, 13, 37, 49, 34, 33, 26, 2, 42, 27, 40, 47, 16, 33, 15, 35, 26, 11, 40, 24, 24, 12, 13, 49, 11, 29, 19, 48, 7, 4, 7, 49, 5, 15, 0, 10, 45]
----------------------------100頂點(diǎn)測(cè)試-------------------------------------
[68, 1, 2, 2, 1, 3, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 1, 3, 3, 1, 2, 3, 1, 2, 1, 1, 2, 1, 2, 3, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 3, 2, 2, 2, 3, 1, 1, 1, 2, 1, 3, 1, 2, 2, 1, 2, 2, 4, 2, 2, 1, 1, 1, 1, 3, 1, 2, 1, 3, 2, 2, 2, 1, 1, 1, 8, 5, 1, 2, 1, 2, 3, 3, 1, 2, 3, 6, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2]
[0, 4, 7, 1, 42, 99, 97, 11, 68, 21, 28, 27, 50, 26, 49, 6, 7, 22, 44, 80, 95, 3, 34, 30, 41, 49, 35, 52, 31, 12, 0, 63, 22, 90, 59, 15, 86, 74, 36, 82, 27, 32, 91, 45, 2, 82, 91, 55, 66, 34, 59, 86, 63, 52, 82, 56, 0, 82, 98, 0, 7, 38, 59, 26, 39, 31, 37, 56, 65, 87, 57, 2, 62, 56, 94, 28, 46, 38, 37, 29, 68, 14, 38, 54, 54, 84, 14, 6, 42, 27, 57, 12, 51, 4, 29, 12, 95, 30, 47, 35]
---------------------------------時(shí)間消耗對(duì)比--------------------------------
0.0002651214599609375
0.004545927047729492
0.020392179489135742
Kruskal算法示例:
node = dict()
rank = dict()
def make_set(point):
node[point] = point
rank[point] = 0
def find(point):
if node[point] != point:
node[point] = find(node[point])
return node[point]
def merge(point1, point2):
root1 = find(point1)
root2 = find(point2)
if root1 != root2:
if rank[root1] > rank[root2]:
node[root2] = root1
else:
node[root1] = root2
if rank[root1] == rank[root2] : rank[root2] += 1
def Kruskal(graph):
for vertice in graph['vertices']:
make_set(vertice)
mst = set()
edges = list(graph['edges'])
edges.sort()
for edge in edges:
weight, v1, v2 = edge
if find(v1) != find(v2):
merge(v1 , v2)
mst.add(edge)
return mst
graph = {
'vertices': ['A', 'B', 'C', 'D','E'],
'edges': set([
(1, 'A', 'B'),
(5, 'A', 'C'),
(3, 'A', 'D'),
(4, 'B', 'C'),
(2, 'B', 'D'),
(1, 'C', 'D'),
(8, 'B', 'E'),
(3, 'A', 'E'),
])
}
print(Kruskal(graph))
輸出:
{(1, 'C', 'D'), (1, 'A', 'B'), (3, 'A', 'E'), (2, 'B', 'D')}
參考資料
[1] Spanning tree, wiki: https://zh.wikipedia.org/wiki/%E6%9C%80%E5%B0%8F%E7%94%9F%E6%88%90%E6%A0%91;
[2]Minimum spanning tree, wiki: https://zh.wikipedia.org/wiki/%E7%94%9F%E6%88%90%E6%A0%91;
[3] 最小生成樹(shù)聚類: http://dataminer.me/2017/10/20/%E6%9C%80%E5%B0%8F%E7%94%9F%E6%88%90%E6%A0%91%E8%81%9A%E7%B1%BB/;
[4] 最小生成樹(shù)-Prim算法和Kruskal算法: https://www.cnblogs.com/biyeymyhjob/archive/2012/07/30/2615542.html;
[5] prim's algorithm, wiki: https://zh.wikipedia.org/wiki/%E6%99%AE%E6%9E%97%E5%A7%86%E7%AE%97%E6%B3%95蔼夜;
[6]圖的基本算法(最小生成樹(shù)): http://www.reibang.com/p/efcd21494dff;
[7]算法導(dǎo)論--最小生成樹(shù)(Kruskal和Prim算法):https://blog.csdn.net/luoshixian099/article/details/51908175;
[8]Kruskal算法兼耀,wiki:https://zh.wikipedia.org/wiki/%E5%85%8B%E9%B2%81%E6%96%AF%E5%85%8B%E5%B0%94%E6%BC%94%E7%AE%97%E6%B3%95;
[9] RGBL installation: http://www.bioconductor.org/packages/release/bioc/html/RBGL.html;
[10]mstree.kruskal: https://www.rdocumentation.org/packages/RBGL/versions/1.48.1/topics/mstree.kruskal;
[11]Draw Minimum Spanning Tree (Mst) With Pie Charts In R: https://www.biostars.org/p/18876/;
[12]graphviz installation:https://www.bioconductor.org/packages/release/bioc/html/Rgraphviz.html;
[13]ape, read.dna 函數(shù):https://www.rdocumentation.org/packages/ape/versions/5.3/topics/read.dna;
[14]Minimum spanning tree in igraph: https://igraph.org/r/doc/mst.html;
[15]python實(shí)現(xiàn)Prim算法求解加權(quán)連通圖的最小生成樹(shù)問(wèn)題:https://blog.csdn.net/Together_CZ/article/details/74783631;
[16] Github: https://github.com/qiwsir/algorithm/blob/master/kruskal_algorithm.md;
[17]知乎:https://zhuanlan.zhihu.com/p/61628249