在講Class Extension的時(shí)候:有一個(gè)十分重要的定義绩脆。
Let P be a prefix class with encoding P,and let (x,i) and (y,j) denote any two elements in the class.And let denote the class representing extensions of element(x,i).
Tree Mining Problem
Let D be a database of trees (ie:a forest).
and let subtree for some
.
Each occurence of S can be identified by its match label,which is given as the set of matching positions (in T) for nodes in S.
what's the match label?
let be the nodes in T,so
let be the nodes in S,so
then S has a match label
1: for all k = 1,...,m
2:branch in S iff
is an ancestor of
in T.
這里有兩個(gè)Condition.不是很懂惕味。
注意:是指
這個(gè)node的label. t 和 s 如同n一樣的作用。沒有其他的意思名挥。
個(gè)人認(rèn)為:
then S has a match label
應(yīng)該改成:
then S has a match label 使得
第4節(jié):使用Scope-List 來加快子樹的支持度的計(jì)算禀倔。
Scope-List Representation:
概念:
X is a k-subtree of a tree T.
refer to the last node of X.
We use the notation to refer to the scope-list of X.
Each element of the scope-list is a triple(t,m,s).
where t is a tree id (tid) means X.
m is a match label of the (k-1) length prefix of X (base T) (個(gè)人添加)。
(recall that the prefix match label gives the positions of nodes in T that match the prefix愧杯。)
(Since a given prefix can occur multiple times in a tree ,X can be associated with multiple match label as well as multiple scopes.)
s is the scope of the last item
有了上述的概念之后捎谨。
4.1:Frequent Subtree Enumeration
Computing and
:
Suppose that the initial database is in the horizontal string encode format.
所以D里面的T是一條條串。
看懂代碼里面的描述形式:
TreeMiner(D,minsup):
= { classes [] frequent 1-subtrees };
= { classes [P]1 of frequent 2-subtrees };
for alldo Enumerate-Frequent-Subtrees(
);
//注意
Enumerate-Frequent-Subtrees():
for each elementdo
for each elementdo
R = {(x,i)+(y,j)};
if for any,r is frequent then
Enumerate-Frequent-Subtrees()
里面好這個(gè)函數(shù),每次傳入的是一個(gè) 前綴最后加入的是x元素的集合检吆。
其實(shí)是 前綴集合的子集。就是最后加入的