Abstract: graph neural networks, GraphSAGE
GraphSAGE quick start
I found the top-ranked GraphSAGE project on GitHub and cloned it:
git clone https://github.com/williamleif/GraphSAGE.git
Then download the required protein dataset ppi.zip
from the link at http://snap.stanford.edu/graphsage/
and run example_supervised.sh in the project root:
./example_supervised.sh
...
Iter: 0083 train_loss= 0.47060 train_f1_mic= 0.55874 train_f1_mac= 0.38880 val_loss= 0.45000 val_f1_mic= 0.56650 val_f1_mac= 0.38983 time= 0.06750
Optimization Finished!
Full validation stats: loss= 0.45913 f1_micro= 0.57371 f1_macro= 0.40486 time= 0.95981
Writing test set stats to file (don't peak!)
Dataset overview
PPI (protein-protein interaction) refers to two or more proteins binding to each other; if two proteins jointly take part in a biological process or cooperate to perform some function, they are considered to interact. The complex interactions among many proteins can be described as a PPI network.
Now let's look at the data source starting from the author's code. In main the data is read via load_data, where FLAGS.train_prefix is ./example_data/ppi:
train_data = load_data(FLAGS.train_prefix)
def load_data(prefix, normalize=True, load_walks=False):
G_data = json.load(open(prefix + "-G.json"))
G = json_graph.node_link_graph(G_data)
G_data.keys()
Out[11]: dict_keys(['directed', 'graph', 'nodes', 'links', 'multigraph'])
Take a look at the nodes attribute of G_data:
G_data['nodes']
{'test': False, 'id': 998, 'val': False},
{'test': False, 'id': 999, 'val': False},
...]
G_data['nodes'][0]
Out[20]: {'test': False, 'id': 0, 'val': False}
G_data['nodes'][-1]
Out[19]: {'test': True, 'id': 56943, 'val': False}
The graph contains 56944 nodes in total. Each node carries the attributes id, test, and val, the latter two indicating whether the node belongs to the test or validation set. Now look at the links attribute of G_data:
len(G_data['links'])
Out[22]: 818716
G_data['links'][-1]
Out[23]: {'source': 56939, 'target': 56939}
The graph has 818716 edges in total. Each edge carries a source (start node) and a target (end node). Calling networkx's json_graph turns this into a graph object whose node and edge counts match the data source:
len(G.edges())
Out[28]: 818716
len(G.nodes())
Out[29]: 56944
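For reference, here is a tiny toy example (made-up nodes and edges, not the real ppi data) of the node-link JSON structure that json_graph.node_link_graph parses; the ids here equal their list positions, just like in the ppi file:
from networkx.readwrite import json_graph

toy = {
    "directed": False,
    "multigraph": False,
    "graph": {},
    "nodes": [
        {"id": 0, "test": False, "val": False},
        {"id": 1, "test": False, "val": True},
        {"id": 2, "test": True, "val": False},
    ],
    "links": [
        {"source": 0, "target": 1},
        {"source": 1, "target": 2},
    ],
}
G = json_graph.node_link_graph(toy)
print(len(G.nodes()), len(G.edges()))  # 3 2
print(G.node[1])                       # node 1's attributes (1.x-style G.node accessor, as in the repo)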
Next, read the node feature vectors:
if os.path.exists(prefix + "-feats.npy"):
feats = np.load(prefix + "-feats.npy")
else:
print("No features present.. Only identity features will be used.")
feats = None
The feature matrix has 56944 rows (nodes) and 50 columns (features); the features are sparse 0/1 values:
feats.shape
Out[33]: (56944, 50)
The next step reads the id mapping, which is in fact an identity mapping:
id_map = json.load(open(prefix + "-id_map.json"))
id_map = {conversion(k): int(v) for k, v in id_map.items()}
id_map[56943]
Out[45]: 56943
Next, read the labels (y values):
class_map = json.load(open(prefix + "-class_map.json"))
if isinstance(list(class_map.values())[0], list):
lab_conversion = lambda n: n
else:
lab_conversion = lambda n: int(n)
class_map = {conversion(k): lab_conversion(v) for k, v in class_map.items()}
The labels are a dict whose keys are node ids and whose values are label vectors of length 121 containing several 1s and 0s elsewhere, i.e. a multi-hot encoding:
len(class_map)
Out[55]: 56944
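To picture what one value in class_map holds, a quick toy sketch (made-up class indices, not taken from the real file): a multi-hot label is simply a 121-length 0/1 vector with a 1 for every class the node belongs to:
import numpy as np

num_classes = 121
label = np.zeros(num_classes, dtype=int)
label[[3, 17, 42]] = 1          # this node belongs to classes 3, 17 and 42
print(label.sum())              # 3
print(label[:5])                # [0 0 0 1 0]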
The next step is data cleaning: remove dirty nodes that lack the test and val attributes:
broken_count = 0
for node in G.nodes():
if not 'val' in G.node[node] or not 'test' in G.node[node]:
G.remove_node(node)
broken_count += 1
print("Removed {:d} nodes that lacked proper annotations due to networkx versioning issues".format(broken_count))
Next, every edge gets a train_removed attribute: if either endpoint of an edge has test or val set to True, the edge's train_removed is set to True:
for edge in G.edges():
if (G.node[edge[0]]['val'] or G.node[edge[1]]['val'] or
G.node[edge[0]]['test'] or G.node[edge[1]]['test']):
G[edge[0]][edge[1]]['train_removed'] = True
else:
G[edge[0]][edge[1]]['train_removed'] = False
Next, standardize the features using the training nodes, i.e. zero mean and unit standard deviation per column:
if normalize and not feats is None:
from sklearn.preprocessing import StandardScaler
train_ids = np.array([id_map[n] for n in G.nodes() if not G.node[n]['val'] and not G.node[n]['test']])
train_feats = feats[train_ids]
scaler = StandardScaler()
scaler.fit(train_feats)
feats = scaler.transform(feats)
Finally, load_data returns G, feats, id_map, walks, class_map: the graph object, the node feature matrix, the id map, an empty walks list, and the label dict.
Model training
Data preparation
Go straight to the train function, which receives the data just returned by load_data:
G = train_data[0]
features = train_data[1]
id_map = train_data[2]
class_map = train_data[4]
if isinstance(list(class_map.values())[0], list):
num_classes = len(list(class_map.values())[0])
else:
num_classes = len(set(class_map.values()))
First, each element of the tuple is unpacked and num_classes=121 is obtained. The features are then padded by appending one extra row of 50 zeros at the bottom; the reason is not obvious at this point:
if not features is None:
# pad with dummy zero vector
features = np.vstack([features, np.zeros((features.shape[1],))])
Next, context_pairs is defined, which by default is the same empty list as walks:
context_pairs = train_data[3] if FLAGS.random_context else None
Then the placeholders are defined:
placeholders = construct_placeholders(num_classes)
Step into construct_placeholders:
def construct_placeholders(num_classes):
# Define placeholders
placeholders = {
'labels': tf.placeholder(tf.float32, shape=(None, num_classes), name='labels'),
'batch': tf.placeholder(tf.int32, shape=(None), name='batch1'),
'dropout': tf.placeholder_with_default(0., shape=(), name='dropout'),
'batch_size': tf.placeholder(tf.int32, name='batch_size'),
}
return placeholders
Placeholders are defined for labels, batch, dropout, and batch_size, each given a tensor name. Now look at the following class; its name suggests it generates mini-batches of training samples:
minibatch = NodeMinibatchIterator(G,
id_map,
placeholders,
class_map,
num_classes,
batch_size=FLAGS.batch_size,
max_degree=FLAGS.max_degree,
context_pairs=context_pairs)
Step into this class:
def __init__(self, G, id2idx,
placeholders, label_map, num_classes,
batch_size=100, max_degree=25,
**kwargs):
self.G = G
self.nodes = G.nodes()
self.id2idx = id2idx # id_map
self.placeholders = placeholders
self.batch_size = batch_size # 512
self.max_degree = max_degree # 128
self.batch_num = 0
self.label_map = label_map # class_map
self.num_classes = num_classes # 121
These are just initial values; the important one is max_degree=128, the cap used when downsampling the adjacency structure; in plain terms, at most 128 neighbors are kept per node. The following line builds the adjacency list and the degree list:
self.adj, self.deg = self.construct_adj()
Step into construct_adj. In short, the author creates two arrays, adj with shape (56945, 128) and deg with shape (56944,):
def construct_adj(self):
adj = len(self.id2idx) * np.ones((len(self.id2idx) + 1, self.max_degree))
deg = np.zeros((len(self.id2idx),))
for nodeid in self.G.nodes():
if self.G.node[nodeid]['test'] or self.G.node[nodeid]['val']:
continue
# training nodes only
# get the ids of nodeid's neighbors whose edges are not train_removed
neighbors = np.array([self.id2idx[neighbor]
for neighbor in self.G.neighbors(nodeid)
if (not self.G[nodeid][neighbor]['train_removed'])])
# degree of nodeid
deg[self.id2idx[nodeid]] = len(neighbors)
if len(neighbors) == 0:
continue
if len(neighbors) > self.max_degree:
# more than max_degree neighbors: sample max_degree of them without replacement
neighbors = np.random.choice(neighbors, self.max_degree, replace=False)
elif len(neighbors) < self.max_degree:
# fewer than 128 neighbors: sample with replacement up to 128
neighbors = np.random.choice(neighbors, self.max_degree, replace=True)
adj[self.id2idx[nodeid], :] = neighbors
return adj, deg
So, only for training nodes, the degree and neighbors of each node are computed; if a node has more than max_degree=128 neighbors they are truncated, and if it has fewer they are resampled with replacement up to 128. In the end each row of adj holds one node's neighbor list, and each element of deg holds the corresponding node's degree. Look at the result:
minibatch.adj[0]
Out[115]:
array([ 766., 1101., 766., 1101., 766., 1101., 372., 1101., 766.,
1101., 372., 766., 372., 1101., 766., 1101., 372., 372.,
372., 372., 1101., 766., 372., 766., 372., 372., 1101.,
372., 1101., 372., 372., 372., 766., 1101., 1101., 766.,
766., 766., 1101., 372., 372., 766., 1101., 372., 766.,
766., 766., 766., 1101., 766., 372., 1101., 372., 766.,
372., 766., 1101., 766., 372., 766., 766., 372., 766.,
1101., 766., 1101., 1101., 766., 372., 372., 1101., 372.,
766., 1101., 1101., 372., 1101., 1101., 372., 372., 1101.,
766., 1101., 1101., 1101., 766., 372., 766., 766., 1101.,
766., 766., 1101., 766., 372., 766., 766., 1101., 1101.,
766., 766., 372., 1101., 1101., 372., 1101., 372., 1101.,
766., 766., 372., 1101., 1101., 1101., 372., 1101., 1101.,
372., 766., 1101., 766., 372., 1101., 372., 766., 372.,
372., 766.])
minibatch.deg[0]
Out[116]: 3.0
set(minibatch.adj[0])
Out[117]: {372.0, 766.0, 1101.0}
Node 0 has only three distinct neighbors, which adj upsampled to 128 entries. Next, the test-data counterpart construct_test_adj:
def construct_test_adj(self):
adj = len(self.id2idx) * np.ones((len(self.id2idx) + 1, self.max_degree))
for nodeid in self.G.nodes():
neighbors = np.array([self.id2idx[neighbor]
for neighbor in self.G.neighbors(nodeid)])
if len(neighbors) == 0:
continue
if len(neighbors) > self.max_degree:
neighbors = np.random.choice(neighbors, self.max_degree, replace=False)
elif len(neighbors) < self.max_degree:
neighbors = np.random.choice(neighbors, self.max_degree, replace=True)
adj[self.id2idx[nodeid], :] = neighbors
return adj
The only differences are that the train/test/val attributes of nodes are no longer checked and only adj is returned; the sampling to 128 entries is still done. Now the tail end of the initializer:
self.val_nodes = [n for n in self.G.nodes() if self.G.node[n]['val']]
self.test_nodes = [n for n in self.G.nodes() if self.G.node[n]['test']]
self.no_train_nodes_set = set(self.val_nodes + self.test_nodes)
self.train_nodes = set(G.nodes()).difference(self.no_train_nodes_set)
# don't train on nodes that only have edges to test set
self.train_nodes = [n for n in self.train_nodes if self.deg[id2idx[n]] > 0]
This collects the val_nodes, test_nodes, and train_nodes sets, and removes training nodes that have no edges within the training set.
Back in the train function, the author defines a placeholder with the same shape as minibatch.adj, (56945, 128), and assigns it to a non-trainable tensor Variable named adj_info, which does not change as the model iterates:
adj_info_ph = tf.placeholder(tf.int32, shape=minibatch.adj.shape)
adj_info = tf.Variable(adj_info_ph, trainable=False, name="adj_info")
GraphSAGE sampling
Now comes the sampling part; the default mode is graphsage_mean:
if FLAGS.model == 'graphsage_mean':
# Create model
sampler = UniformNeighborSampler(adj_info)
if FLAGS.samples_3 != 0:
layer_infos = [SAGEInfo("node", sampler, FLAGS.samples_1, FLAGS.dim_1),
SAGEInfo("node", sampler, FLAGS.samples_2, FLAGS.dim_2),
SAGEInfo("node", sampler, FLAGS.samples_3, FLAGS.dim_2)]
elif FLAGS.samples_2 != 0:
layer_infos = [SAGEInfo("node", sampler, FLAGS.samples_1, FLAGS.dim_1),
SAGEInfo("node", sampler, FLAGS.samples_2, FLAGS.dim_2)]
else:
layer_infos = [SAGEInfo("node", sampler, FLAGS.samples_1, FLAGS.dim_1)]
Step into UniformNeighborSampler:
class UniformNeighborSampler(Layer):
"""
Uniformly samples neighbors.
Assumes that adj lists are padded with random re-sampling
"""
def __init__(self, adj_info, **kwargs):
super(UniformNeighborSampler, self).__init__(**kwargs)
self.adj_info = adj_info
def _call(self, inputs):
ids, num_samples = inputs
adj_lists = tf.nn.embedding_lookup(self.adj_info, ids)
adj_lists = tf.transpose(tf.random_shuffle(tf.transpose(adj_lists)))
adj_lists = tf.slice(adj_lists, [0, 0], [-1, num_samples])
return adj_lists
The initializer receives adj_info, and _call overrides the parent class Layer; it has not been invoked yet, so keep reading:
if FLAGS.samples_3 != 0:
layer_infos = [SAGEInfo("node", sampler, FLAGS.samples_1, FLAGS.dim_1),
SAGEInfo("node", sampler, FLAGS.samples_2, FLAGS.dim_2),
SAGEInfo("node", sampler, FLAGS.samples_3, FLAGS.dim_2)]
elif FLAGS.samples_2 != 0:
layer_infos = [SAGEInfo("node", sampler, FLAGS.samples_1, FLAGS.dim_1),
SAGEInfo("node", sampler, FLAGS.samples_2, FLAGS.dim_2)]
else:
layer_infos = [SAGEInfo("node", sampler, FLAGS.samples_1, FLAGS.dim_1)]
Here samples_3 defaults to 0, samples_2=10, and samples_1=25. Continue with SAGEInfo:
SAGEInfo = namedtuple("SAGEInfo",
['layer_name', # name of the layer (to get feature embedding etc.)
'neigh_sampler', # callable neigh_sampler constructor
'num_samples',
'output_dim' # the output (i.e., hidden) dimension
])
The author defines a namedtuple to store per-layer model information; a namedtuple, much like a dict, bundles a set of immutable fields for unified storage and access. For example:
a = SAGEInfo(layer_name=1, neigh_sampler=2, num_samples=3, output_dim=4)
Values can be read either with the dot notation or with the usual tuple index:
a[0]
Out[156]: 1
a.neigh_sampler
Out[157]: 2
So the following code defines two layers: the first has layer_name=node, sampler as its sampler, 25 neighbors, and output dimension 128; the second has layer_name=node, sampler as its sampler, 10 neighbors, and output dimension 128.
layer_infos = [SAGEInfo("node", sampler, FLAGS.samples_1, FLAGS.dim_1), # 128
SAGEInfo("node", sampler, FLAGS.samples_2, FLAGS.dim_2)] # 128
Model construction
Now look at the SupervisedGraphsage model:
model = SupervisedGraphsage(num_classes, placeholders,
features,
adj_info,
minibatch.deg,
layer_infos,
model_size=FLAGS.model_size,
sigmoid_loss=FLAGS.sigmoid,
identity_dim=FLAGS.identity_dim,
logging=True)
Step inside:
if aggregator_type == "mean":
self.aggregator_cls = MeanAggregator
MeanAggregator is a Layer subclass that performs mean aggregation; set it aside for now. What follows is a batch of member initializations:
# get info from placeholders...
self.inputs1 = placeholders["batch"] # tf.placeholder(tf.int32, shape=(None), name='batch1')
self.model_size = model_size # small
self.adj_info = adj # tf.Variable(adj_info_ph, trainable=False, name="adj_info")
if identity_dim > 0: # 0
self.embeds = tf.get_variable("node_embeddings", [adj.get_shape().as_list()[0], identity_dim])
else:
self.embeds = None
if features is None:
if identity_dim == 0:
raise Exception("Must have a positive value for identity feature dimension if no input features given.")
self.features = self.embeds
else:
self.features = tf.Variable(tf.constant(features, dtype=tf.float32), trainable=False) # np.vstack([features, np.zeros((features.shape[1],))])
if not self.embeds is None:
# is None
self.features = tf.concat([self.embeds, self.features], axis=1)
self.degrees = degrees
self.concat = concat # True
self.num_classes = num_classes # 121
self.sigmoid_loss = sigmoid_loss # False
self.dims = [(0 if features is None else features.shape[1]) + identity_dim] # 50
self.dims.extend([layer_infos[i].output_dim for i in range(len(layer_infos))]) # range(2),[50,128,128]
self.batch_size = placeholders["batch_size"] # tf.placeholder(tf.int32, name='batch_size')
self.placeholders = placeholders
self.layer_infos = layer_infos
self.optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate) # 0.01
self.build()
Now look at build; the first line alone gave me a start:
samples1, support_sizes1 = self.sample(self.inputs1, self.layer_infos)
采樣模塊
The sample method is inherited from the parent class models.SampleAndAggregate and takes the batch inputs and the layer information:
def sample(self, inputs, layer_infos, batch_size=None):
""" Sample neighbors to be the supportive fields for multi-layer convolutions.
Args:
inputs: batch inputs
batch_size: the number of inputs (different for batch inputs and negative samples).
"""
if batch_size is None:
batch_size = self.batch_size
samples = [inputs] # [tf.placeholder(tf.int32, shape=(None), name='batch1')]
# size of convolution support at each layer per node
support_size = 1
support_sizes = [support_size]
for k in range(len(layer_infos)): # range(2)
t = len(layer_infos) - k - 1 # 1, 0
support_size *= layer_infos[t].num_samples # 10, 25
sampler = layer_infos[t].neigh_sampler # the neighbor sampler
# this triggers the sampler's __call__
# [tf.placeholder(tf.int32, shape=(None), name='batch1')]
# 10
node = sampler((samples[k], layer_infos[t].num_samples))
# 采樣的節(jié)點納入輸入數(shù)據(jù)
# support_size * batch_size將所有節(jié)點全部鋪開
samples.append(tf.reshape(node, [support_size * batch_size, ]))
support_sizes.append(support_size)
return samples, support_sizes
Inside, neighbors are sampled for the layer-2 and layer-1 convolutions starting from the input nodes: the neighbors sampled for the second layer are added to the data, and then, taking those nodes as centers, another 25 neighbors each are sampled for the first layer. This lines up with the neighborhood-sampling step of the algorithm in the paper.
In the author's setup the final model has two layers, and each center node's representation is the result of two aggregation steps. To get a node's aggregated representation after 2 hops at layer k, we first need the layer k-1 representations (after the first hop) of its neighbors. The second layer samples 10 neighbors, so 10 neighbors are drawn for each input node, which gives the layer k-1 nodes. To compute those layer k-1 representations we in turn need their layer k-2 neighbors (before the first hop) for the first aggregation; the first layer samples 25, so 25 neighbors are drawn around each layer k-1 node.
Now look at how the sampling is implemented, i.e. the sampler's _call function:
def _call(self, inputs):
# input node ids, number of samples to take
ids, num_samples = inputs
# look up each node's padded list of 128 neighbors
adj_lists = tf.nn.embedding_lookup(self.adj_info, ids)
# transpose + random_shuffle: shuffle the neighbor dimension (one permutation shared by the whole batch)
adj_lists = tf.transpose(tf.random_shuffle(tf.transpose(adj_lists)))
# slice: keep all rows and the first num_samples columns
adj_lists = tf.slice(adj_lists, [0, 0], [-1, num_samples]) # 25
return adj_lists
The input inputs is a tuple. First tf.nn.embedding_lookup fetches the neighbor lists of the batch's ids; the lookup table is the previously built per-node neighbor list minibatch.adj (through its placeholder), which also explains why minibatch.adj was built in the first place: if a node has fewer neighbors than the sample size Sk, neighbors are simply repeated. Then tf.transpose + tf.random_shuffle shuffle each node's neighbor list, and finally tf.slice keeps the first num_samples of the 128 neighbors (25 for the first layer, 10 for the second); the result adj_lists is returned.
Back in the sample function: each pass adds support_size * batch_size new node indices, so when appending to samples the tensor is reshaped to [support_size * batch_size, ] to flatten it; the next layer then runs embedding_lookup on top of the previous one, which keeps tf.slice always working on a 2-D matrix
samples.append(tf.reshape(node, [support_size * batch_size, ]))
After the second layer's nodes are likewise reshaped and appended to samples, the function finally returns samples and support_sizes, which here are [inputs, node_layer_2, node_layer_1] and [1, 10, 250].
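To make the bookkeeping concrete, here is a small sketch that mirrors the loop in sample and recomputes these numbers under the assumed defaults batch_size=512, samples_1=25, samples_2=10:
batch_size = 512
num_samples = [25, 10]            # [samples_1, samples_2]

support_size = 1
support_sizes = [support_size]
flat_sizes = [batch_size]
for k in range(len(num_samples)):
    t = len(num_samples) - k - 1  # walk from the outermost layer inwards: 1, then 0
    support_size *= num_samples[t]
    support_sizes.append(support_size)
    flat_sizes.append(support_size * batch_size)

print(support_sizes)  # [1, 10, 250]
print(flat_sizes)     # [512, 5120, 128000] sampled node indices per hop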
Aggregation module
Continue with build to see how the model is constructed:
# [25, 10]
num_samples = [layer_info.num_samples for layer_info in self.layer_infos]
self.outputs1, self.aggregators = self.aggregate(samples1, [self.features], self.dims, num_samples,
support_sizes1, concat=self.concat, model_size=self.model_size)
Judging from the output variable names, this returns the aggregator objects and the aggregated node representations. Step into aggregate:
def aggregate(self, samples, input_features, dims, num_samples, support_sizes, batch_size=None,
aggregators=None, name=None, concat=False, model_size="small"):
""" At each layer, aggregate hidden representations of neighbors to compute the hidden representations
at next layer.
Args:
samples: a list of samples of variable hops away for convolving at each layer of the
network. Length is the number of layers + 1. Each is a vector of node indices.
input_features: the input features for each sample of various hops away.
dims: a list of dimensions of the hidden representations from the input layer to the
final layer. Length is the number of layers + 1.
num_samples: list of number of samples for each layer.
support_sizes: the number of nodes to gather information from for each layer.
batch_size: the number of inputs (different for batch inputs and negative samples).
Returns:
The hidden representation at the final layer for all nodes in batch
"""
First read the docstring: this function returns the final-layer representations of all nodes in the batch. The arguments are:
- samples: a list recording the nodes at each hop; its length is the number of layers + 1 and each element is a vector of node indices; in this example it is [inputs, node_layer_2, node_layer_1]
- input_features: the input features used to look up the feature vectors of the sampled nodes at each hop
- dims: a list of the representation dimensions from the input layer to each convolution layer; here the default is [50, 128, 128], length = number of layers + 1
- num_samples: a list of the number of neighbors sampled at each layer; here [25, 10]
- support_sizes: a list of the number of nodes information is gathered from at each layer; here [1, 10, 250]
Now the implementation. First, embedding_lookup is applied to every hop in samples to get the per-hop features: the features of the batch's center nodes, of the neighbors used for the 2-layer convolution, and of the neighbors used for the 1-layer convolution:
# length: number of layers + 1
hidden = [tf.nn.embedding_lookup(input_features, node_samples) for node_samples in samples]
Next, check whether aggregator objects were passed in; if not, create a new aggregators list:
new_agg = aggregators is None
if new_agg:
aggregators = []
Now the iteration: there are two layers, so the loop runs twice:
for layer in range(len(num_samples)): # [25, 10], 2
if new_agg:
# concat=True, layer=1, dim_mult = 1
dim_mult = 2 if concat and (layer != 0) else 1
# aggregator at current layer
if layer == len(num_samples) - 1: # 1
aggregator = self.aggregator_cls(dim_mult * dims[layer], dims[layer + 1], act=lambda x: x,
dropout=self.placeholders['dropout'],
name=name, concat=concat, model_size=model_size)
else:
# layer = 0,dim_mult=1,dims[layer]=50,dims[layer + 1]=128
aggregator = self.aggregator_cls(dim_mult * dims[layer], dims[layer + 1],
dropout=self.placeholders['dropout'], # tf.placeholder_with_default(0., shape=(), name='dropout')
name=name, concat=concat, model_size=model_size)
aggregators.append(aggregator)
else:
aggregator = aggregators[layer]
Through these branches an aggregator object is instantiated; since SupervisedGraphsage was created with the default aggregator_type="mean", the aggregator here is MeanAggregator:
if aggregator_type == "mean":
self.aggregator_cls = MeanAggregator
Now the key part, the update computation:
next_hidden = []
# as layer increases, the number of support nodes needed decreases
for hop in range(len(num_samples) - layer): # range(2)
# dim_mult = 1
dim_mult = 2 if concat and (layer != 0) else 1
neigh_dims = [batch_size * support_sizes[hop], # [1, 10, 250][0]
num_samples[len(num_samples) - hop - 1], # num_samples[1] = 10
dim_mult * dims[layer]] # 50
# hidden[0] = features of the inputs, hidden[1] = features of node_layer_2; reshape to (batch_size, 10, 50)
h = aggregator((hidden[hop],
tf.reshape(hidden[hop + 1], neigh_dims)))
next_hidden.append(h)
hidden = next_hidden
It starts to get confusing here, so step into MeanAggregator's _call:
def _call(self, inputs):
# hidden[0] = features of the inputs, reshape(hidden[1] = features of node_layer_2, (batch_size, 10, 50))
# i.e. the node's own feature vectors and its neighbors' feature vectors
self_vecs, neigh_vecs = inputs
neigh_vecs = tf.nn.dropout(neigh_vecs, 1 - self.dropout)
self_vecs = tf.nn.dropout(self_vecs, 1 - self.dropout)
# aggregate the neighbors' feature vectors by taking the mean
neigh_means = tf.reduce_mean(neigh_vecs, axis=1)
# [nodes] x [out_dim]
# W applied to the aggregated neighbor mean
from_neighs = tf.matmul(neigh_means, self.vars['neigh_weights'])
# W applied to the node's own vector
from_self = tf.matmul(self_vecs, self.vars["self_weights"])
if not self.concat: # concat=True
output = tf.add_n([from_self, from_neighs])
else:  # this branch is taken (concat=True)
output = tf.concat([from_self, from_neighs], axis=1)
# bias
if self.bias:
output += self.vars['bias']
return self.act(output)
After reading this _call it is roughly clear what it does: aggregate the neighbors' vectors by taking the mean => W × neighbors => W × self => concat the two vectors => output through the activation function, which corresponds to the aggregation step in the paper's formula.
Now look at the two W weight matrices separately:
# self.name=layer_1
with tf.variable_scope(self.name + name + '_vars'):
# neigh_input_dim=50, output_dim=128
self.vars['neigh_weights'] = glorot([neigh_input_dim, output_dim],
name='neigh_weights')
# neigh_input_dim=50, output_dim=128
self.vars['self_weights'] = glorot([input_dim, output_dim],
name='self_weights')
if self.bias: # False
self.vars['bias'] = zeros([self.output_dim], name='bias')
The author's implementation does not quite match the paper: here the neighbor mean and the node's own vector are each multiplied by a separate W and then concatenated, whereas in the paper they are concatenated first and multiplied by a single W; the paper also applies L2 normalization after the activation, which this code does not do here.
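A numpy sketch (toy shapes and random weights, purely illustrative rather than the repo's or the paper's exact code) of the two variants side by side:
import numpy as np

d_in, d_out = 50, 128
h_self = np.random.randn(4, d_in)          # 4 center nodes
h_neigh = np.random.randn(4, 10, d_in)     # 10 sampled neighbors each

# repo-style MeanAggregator: separate weights, then concat -> (4, 256)
W_self = np.random.randn(d_in, d_out)
W_neigh = np.random.randn(d_in, d_out)
out_repo = np.concatenate([h_self @ W_self, h_neigh.mean(axis=1) @ W_neigh], axis=1)

# paper-style mean aggregator: concat first, one weight matrix -> (4, 128),
# followed by an L2 normalization step
W = np.random.randn(2 * d_in, d_out)
concat = np.concatenate([h_self, h_neigh.mean(axis=1)], axis=1)
out_paper = concat @ W
out_paper /= np.linalg.norm(out_paper, axis=1, keepdims=True)

print(out_repo.shape, out_paper.shape)     # (4, 256) (4, 128)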
Back to the loop, which is the heart of the whole project. Fix layer=0: then dim_mult * dims[layer] and dims[layer + 1] are 50 and 128, i.e. the input is 50-dimensional and the output is 128-dimensional. Now look at hop; when hop=0:
h = aggregator((hidden[hop],
tf.reshape(hidden[hop + 1], neigh_dims)))
next_hidden.append(h)
The aggregation layer convolves the original batch nodes with their sampled neighbors (10 each), producing h0, which is stored in next_hidden. When hop=1, it convolves the second-hop nodes with their own neighbors (25 each); this is the same aggregator object, i.e. the W weights are shared within the layer. The result h1 is also stored in next_hidden, which then replaces hidden for the next layer. In the next layer, layer=1 and dim_mult=2, so the input dim_mult * dims[layer] becomes 128*2=256 because of the concat in the previous step, while the output dimension stays 128; the second convolution layer has no activation (act=lambda x: x), so it outputs directly. With layer=1, hop can only be 0:
h = aggregator((hidden[hop],
tf.reshape(hidden[hop + 1], neigh_dims)))
next_hidden.append(h)
This code now means: take the previous layer's aggregated center-node vectors as self and the previous layer's aggregated neighbor vectors as neighbors, and aggregate once more; the output is the final representation of the center nodes, stored in next_hidden and copied back to hidden. The final hidden[0] is the final feature vector of all the center nodes, of length 128 * 2 = 256, and aggregators holds the MeanAggregator objects of the first and second convolution layers. A shape trace of the whole process follows.
The rest of build is easier:
self.outputs1 = tf.nn.l2_normalize(self.outputs1, 1)
dim_mult = 2 if self.concat else 1 # 2
# input 128 * 2, output 121, no activation function
self.node_pred = layers.Dense(dim_mult * self.dims[-1], self.num_classes,
dropout=self.placeholders['dropout'],
act=lambda x: x)
# TF graph management
self.node_preds = self.node_pred(self.outputs1)
After the second layer the author applies L2 normalization, followed by a fully connected layer with input 128 * 2 and output 121 and no activation, effectively a linear prediction head; this again goes through the _call of Dense in layers, and Dense applies dropout internally.
Loss module
The model itself is done, so take a breath; half the work is behind us. Next comes the loss module, which is just as important.
# self._loss()
def _loss(self):
# Weight decay loss
for aggregator in self.aggregators:
for var in aggregator.vars.values():
self.loss += FLAGS.weight_decay * tf.nn.l2_loss(var)
for var in self.node_pred.vars.values():
self.loss += FLAGS.weight_decay * tf.nn.l2_loss(var)
# classification loss
if self.sigmoid_loss:
self.loss += tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
logits=self.node_preds,
labels=self.placeholders['labels']))
else:
self.loss += tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
logits=self.node_preds,
labels=self.placeholders['labels']))
tf.summary.scalar('loss', self.loss)
The first part simply adds L2 weight-decay losses for the W of the convolution layers and the fully connected layer. Then comes the classification loss, a softmax cross entropy; since the labels here can contain several 1s, let's check whether softmax_cross_entropy_with_logits can handle that:
import numpy as np
A = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]])
with tf.Session() as sess:
print(sess.run(tf.nn.softmax(A)))
print(sess.run(tf.nn.softmax_cross_entropy_with_logits(
labels=[[0, 0, 1, 0, 0, 1],
[0, 0, 0, 1, 0, 1]],
logits=A)))
# [3.91238663 2.91238663]
It runs fine and computes as expected. Now look at the vars of the model layers: the convolution weights are neigh_weights, self_weights, and bias (absent by default), which again shows that aggregators within the same layer share weights. The fully connected layer's vars are weights and bias (present by default). The final loss is the L2 loss of all the W plus the softmax cross-entropy loss.
Optimizer module
The optimizer module does the following:
grads_and_vars = self.optimizer.compute_gradients(self.loss)
clipped_grads_and_vars = [(tf.clip_by_value(grad, -5.0, 5.0) if grad is not None else None, var)
for grad, var in grads_and_vars]
self.grad, _ = clipped_grads_and_vars[0]
self.opt_op = self.optimizer.apply_gradients(clipped_grads_and_vars)
Here the gradients are computed explicitly and then clipped, limiting their minimum and maximum values; the training op is optimizer.apply_gradients(clipped_grads_and_vars).
Prediction module
The prediction module is simple: apply softmax to the output of the fully connected layer:
def predict(self):
if self.sigmoid_loss:
return tf.nn.sigmoid(self.node_preds)
else:
return tf.nn.softmax(self.node_preds)
OK, at this point the whole model network has been built.
Training the model
It starts by setting a few TensorFlow configuration options:
# log device placement
config = tf.ConfigProto(log_device_placement=FLAGS.log_device_placement) # False
# let GPU memory grow on demand
config.gpu_options.allow_growth = True
# config.gpu_options.per_process_gpu_memory_fraction = GPU_MEM_FRACTION
# if the specified device does not exist, allow TF to assign one automatically
config.allow_soft_placement = True
Next, create the Session and set up the summary writer and its output path:
# Initialize session
sess = tf.Session(config=config)
merged = tf.summary.merge_all()
summary_writer = tf.summary.FileWriter(log_dir(), sess.graph)
The next step is a neat trick:
# Init variables
sess.run(tf.global_variables_initializer(), feed_dict={adj_info_ph: minibatch.adj})
I had already wondered about this code earlier: the author creates a placeholder and assigns it to a tf.Variable:
adj_info_ph = tf.placeholder(tf.int32, shape=minibatch.adj.shape)
adj_info = tf.Variable(adj_info_ph, trainable=False, name="adj_info")
Running adj_info in a Session while feeding adj_info_ph directly would raise an error, so the author's trick is the answer: feed the placeholder through feed_dict when running tf.global_variables_initializer(). A quick test:
tf.reset_default_graph()
adj_info_ph = tf.placeholder(tf.int32, shape=(2, 2))
adj_info = tf.Variable(adj_info_ph, trainable=False)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer(), feed_dict={adj_info_ph: np.array([[2, 2], [1, 3]])})
print(sess.run(adj_info))
[[2 2]
[1 3]]
Very clever. After global_variables_initializer you can run the corresponding tf.Variable directly without feeding it again; the value behaves like a global constant that stays the same across epochs. The following variant, by contrast, raises an error:
tf.reset_default_graph()
adj_info_ph = tf.placeholder(tf.int32, shape=(2, 2))
adj_info = tf.Variable(adj_info_ph, trainable=False)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
print(sess.run(adj_info, feed_dict={adj_info_ph: np.array([[2, 2], [1, 3]])}))
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype int32 and shape [2,2]
So adj_info must be initialized globally through global_variables_initializer.
Next, a few small helpers plus the training adjacency list and the test adjacency list are defined; this essentially re-assigns adj_info to new variables while the original adj_info keeps its value.
Now the actual batched training loop:
for epoch in range(FLAGS.epochs): # 10
minibatch.shuffle()
iter = 0
print('Epoch: %04d' % (epoch + 1))
epoch_val_costs.append(0)
while not minibatch.end():
# Construct feed dictionary
feed_dict, labels = minibatch.next_minibatch_feed_dict()
Now minibatch takes the stage. It first calls shuffle; step in:
def shuffle(self):
""" Re-shuffle the training set.
Also reset the batch number.
"""
self.train_nodes = np.random.permutation(self.train_nodes)
self.batch_num = 0
train_nodes is a list of the training nodes' indices; np.random.permutation shuffles it and batch_num is reset to 0.
Next the loop condition minibatch.end() is checked, which is obviously False at this point; here is the code:
def end(self):
return self.batch_num * self.batch_size >= len(self.train_nodes)
batch_num starts at 0, batch_size was set to 512 when NodeMinibatchIterator was instantiated, and len(train_nodes)=44906; the author simply checks whether batch_num batches would exceed the length of train_nodes, and as long as they do not, another mini-batch can be drawn from the training samples.
Now the key call, minibatch.next_minibatch_feed_dict(), which clearly produces a mini-batch; step in:
def next_minibatch_feed_dict(self):
start_idx = self.batch_num * self.batch_size # 0 * 512 = 0
self.batch_num += 1 # 1
end_idx = min(start_idx + self.batch_size, len(self.train_nodes)) # min(512, 44906)
batch_nodes = self.train_nodes[start_idx: end_idx]
return self.batch_feed_dict(batch_nodes)
This is easy to follow: compute the start and end indices and slide over train_nodes to take a batch; if end_idx would run past the end of train_nodes it is clamped to the last index. Step into batch_feed_dict:
def batch_feed_dict(self, batch_nodes, val=False):
batch1id = batch_nodes
batch1 = [self.id2idx[n] for n in batch1id]
# build each node's label vector
labels = np.vstack([self._make_label_vec(node) for node in batch1id])
feed_dict = dict()
feed_dict.update({self.placeholders['batch_size']: len(batch1)})
feed_dict.update({self.placeholders['batch']: batch1})
feed_dict.update({self.placeholders['labels']: labels})
return feed_dict, labels
Also straightforward: it collects the labels, where _make_label_vec uses label_map to build each node's label as a multi-hot array. Finally the actual batch_size, batch, and labels are assigned to their placeholders, and the function returns feed_dict together with the labels matrix.
To summarize: every call to minibatch.next_minibatch_feed_dict() slides forward by 512 nodes over train_nodes and feeds them into the batch, so the model input is ultimately a list of node indices. Before each call minibatch.end() is checked, and once train_nodes is exhausted the loop breaks and the next epoch starts over on the full training set.
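Stripped of the TensorFlow plumbing, the iteration pattern amounts to the following sketch (toy stand-ins for train_nodes and batch_size, matching the figures quoted above):
import numpy as np

train_nodes = np.arange(44906)                       # number of training nodes quoted above
batch_size = 512

train_nodes = np.random.permutation(train_nodes)     # minibatch.shuffle()
batch_num = 0
while batch_num * batch_size < len(train_nodes):     # not minibatch.end()
    start = batch_num * batch_size
    end = min(start + batch_size, len(train_nodes))
    batch_nodes = train_nodes[start:end]              # next_minibatch_feed_dict()
    batch_num += 1
print(batch_num)                                      # 88 mini-batches per epoch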
Now back to the training loop:
feed_dict.update({placeholders['dropout']: FLAGS.dropout}) # 0
Dropout is 0 for every convolution and the fully connected layer. Now the training step:
# Training step
outs = sess.run([merged, model.opt_op, model.loss, model.preds], feed_dict=feed_dict)
train_cost = outs[2]
This mainly runs model.opt_op on the batch node list and labels in feed_dict and obtains model.loss. Next, validation:
if iter % FLAGS.validate_iter == 0: # 5000
# Validation
sess.run(val_adj_info.op)
if FLAGS.validate_batch_size == -1:
val_cost, val_f1_mic, val_f1_mac, duration = incremental_evaluate(sess, model, minibatch,
FLAGS.batch_size)
else:
val_cost, val_f1_mic, val_f1_mac, duration = evaluate(sess, model, minibatch,
FLAGS.validate_batch_size)
sess.run(train_adj_info.op)
epoch_val_costs[-1] += val_cost
if total_steps % FLAGS.print_every == 0:
summary_writer.add_summary(outs[0], total_steps)
Every 5000 iterations it validates; sess.run(val_adj_info.op) switches adj_info over to the evaluation adjacency list defined earlier, and sess.run(train_adj_info.op) below switches it back. The default validate_batch_size is 256, so look straight at:
val_cost, val_f1_mic, val_f1_mac, duration = evaluate(sess, model, minibatch, FLAGS.validate_batch_size)
Step into evaluate:
def evaluate(sess, model, minibatch_iter, size=None):
t_test = time.time()
feed_dict_val, labels = minibatch_iter.node_val_feed_dict(size)
node_outs_val = sess.run([model.preds, model.loss],
feed_dict=feed_dict_val)
mic, mac = calc_f1(labels, node_outs_val[0])
return node_outs_val[1], mic, mac, (time.time() - t_test)
Here minibatch_iter is just minibatch; look at minibatch_iter.node_val_feed_dict(256):
def node_val_feed_dict(self, size=None, test=False):
if test:
val_nodes = self.test_nodes
else:
val_nodes = self.val_nodes
if not size is None:
val_nodes = np.random.choice(val_nodes, size, replace=True)
# add a dummy neighbor
ret_val = self.batch_feed_dict(val_nodes)
return ret_val[0], ret_val[1]
Easy enough: sample 256 node indices from val_nodes with replacement and call batch_feed_dict to get the feed_dict and labels. The validation step then simply predicts, taking the softmax output and the loss:
node_outs_val = sess.run([model.preds, model.loss],
feed_dict=feed_dict_val)
Next, micro_f1 and macro_f1 are computed:
mic, mac = calc_f1(labels, node_outs_val[0])
The calc_f1 function is worth a closer look:
def calc_f1(y_true, y_pred):
if not FLAGS.sigmoid:
y_true = np.argmax(y_true, axis=1)
y_pred = np.argmax(y_pred, axis=1)
else:
y_pred[y_pred > 0.5] = 1
y_pred[y_pred <= 0.5] = 0
return metrics.f1_score(y_true, y_pred, average="micro"), metrics.f1_score(y_true, y_pred, average="macro")
These two lines threshold each of the 121 positions to 0 or 1, so that when metrics.f1_score is called,
y_pred[y_pred > 0.5] = 1
y_pred[y_pred <= 0.5] = 0
the author is effectively using the F1 over the per-position binary predictions as the evaluation metric. A quick test of sklearn.metrics' f1_score:
metrics.f1_score([[1, 0, 0], [1, 0, 1]], [[0, 0, 0], [1, 1, 1]], average="micro")
Out[210]: 0.6666666666666666
For precision, look at the prediction on the right: it predicts three 1s, of which the first and third are correct, so precision is 2/3. For recall, there are three true 1s, of which the second and third are hit, so recall is 2/3 and f1 = 0.6666. micro aggregates the counts over all label positions globally, while macro computes the F1 per label and averages them. Clearly, the closer the predicted multi-hot vector is to the true one, the higher the F1. Overall, evaluate returns the validation loss, the two F1 scores, and the elapsed time.
When validation finishes, the validation loss is accumulated onto the 0 appended for this epoch:
sess.run(train_adj_info.op)
epoch_val_costs[-1] += val_cost
然后每隔5個total_steps
喊儡,記錄一次summary
if total_steps % FLAGS.print_every == 0: # 5
summary_writer.add_summary(outs[0], total_steps)
然后直接看格式化輸出部分
if total_steps % FLAGS.print_every == 0:
train_f1_mic, train_f1_mac = calc_f1(labels, outs[-1])
print("Iter:", '%04d' % iter,
"train_loss=", "{:.5f}".format(train_cost),
"train_f1_mic=", "{:.5f}".format(train_f1_mic),
"train_f1_mac=", "{:.5f}".format(train_f1_mac),
"val_loss=", "{:.5f}".format(val_cost),
"val_f1_mic=", "{:.5f}".format(val_f1_mic),
"val_f1_mac=", "{:.5f}".format(val_f1_mac),
"time=", "{:.5f}".format(avg_time))
iter += 1
total_steps += 1
if total_steps > FLAGS.max_total_steps: # 10000000000
break
if total_steps > FLAGS.max_total_steps: # 10000000000
break
The author keeps two counters, iter and total_steps. iter is per epoch; its maximum works out to about 44906 / 512 ≈ 88 batches per epoch. total_steps is global, and both counters increase by 1 per training batch. Since FLAGS.validate_iter is 5000, the condition iter % FLAGS.validate_iter == 0 only ever holds at iter=0, so validation runs once at the start of each epoch and not again until the next epoch. As a result, within an epoch the train metrics are printed every 5 iterations while the val metrics stay the same throughout.
Iter: 0066 train_loss= 0.46198 train_f1_mic= 0.56420 train_f1_mac= 0.38442 val_loss= 0.47892 val_f1_mic= 0.58327 val_f1_mac= 0.42079 time= 0.07254
Iter: 0071 train_loss= 0.46786 train_f1_mic= 0.56934 train_f1_mac= 0.41866 val_loss= 0.47892 val_f1_mic= 0.58327 val_f1_mac= 0.42079 time= 0.07247
Iter: 0076 train_loss= 0.45099 train_f1_mic= 0.58254 train_f1_mac= 0.41838 val_loss= 0.47892 val_f1_mic= 0.58327 val_f1_mac= 0.42079 time= 0.07238
Iter: 0081 train_loss= 0.46226 train_f1_mic= 0.57681 train_f1_mac= 0.41835 val_loss= 0.47892 val_f1_mic= 0.58327 val_f1_mac= 0.42079 time= 0.07240
Iter: 0086 train_loss= 0.45582 train_f1_mic= 0.60693 train_f1_mac= 0.45930 val_loss= 0.47892 val_f1_mic= 0.58327 val_f1_mac= 0.42079 time= 0.07235
Epoch: 0010
Iter: 0003 train_loss= 0.45136 train_f1_mic= 0.57961 train_f1_mac= 0.42155 val_loss= 0.45077 val_f1_mic= 0.56654 val_f1_mac= 0.38990 time= 0.07236
Iter: 0008 train_loss= 0.46137 train_f1_mic= 0.59752 train_f1_mac= 0.45520 val_loss= 0.45077 val_f1_mic= 0.56654 val_f1_mac= 0.38990 time= 0.07231
Iter: 0013 train_loss= 0.45797 train_f1_mic= 0.57825 train_f1_mac= 0.42601 val_loss= 0.45077 val_f1_mic= 0.56654 val_f1_mac= 0.38990 time= 0.07225
Iter: 0018 train_loss= 0.45655 train_f1_mic= 0.58694 train_f1_mac= 0.42978 val_loss= 0.45077 val_f1_mic= 0.56654 val_f1_mac= 0.38990 time= 0.07218
Iter: 0023 train_loss= 0.46944 train_f1_mic= 0.59518 train_f1_mac= 0.44362 val_loss= 0.45077 val_f1_mic= 0.56654 val_f1_mac= 0.38990 time= 0.07215
Now the end of training:
print("Optimization Finished!")
sess.run(val_adj_info.op)
val_cost, val_f1_mic, val_f1_mac, duration = incremental_evaluate(sess, model, minibatch, FLAGS.batch_size)
print("Full validation stats:",
"loss=", "{:.5f}".format(val_cost),
"f1_micro=", "{:.5f}".format(val_f1_mic),
"f1_macro=", "{:.5f}".format(val_f1_mac),
"time=", "{:.5f}".format(duration))
The model stops once the configured 10 epochs are reached, and a final pass over the validation set is run; step into incremental_evaluate:
def incremental_evaluate(sess, model, minibatch_iter, size, test=False):
t_test = time.time()
finished = False
val_losses = []
val_preds = []
labels = []
iter_num = 0
finished = False
while not finished:
feed_dict_val, batch_labels, finished, _ = minibatch_iter.incremental_node_val_feed_dict(size, iter_num,
test=test) # False
node_outs_val = sess.run([model.preds, model.loss],
feed_dict=feed_dict_val)
val_preds.append(node_outs_val[0])
labels.append(batch_labels)
val_losses.append(node_outs_val[1])
iter_num += 1
val_preds = np.vstack(val_preds)
labels = np.vstack(labels)
f1_scores = calc_f1(labels, val_preds)
return np.mean(val_losses), f1_scores[0], f1_scores[1], (time.time() - t_test)
Step into incremental_node_val_feed_dict:
def incremental_node_val_feed_dict(self, size, iter_num, test=False):
if test:
val_nodes = self.test_nodes
else:
val_nodes = self.val_nodes
# [0: min(512,6514)]
val_node_subset = val_nodes[iter_num * size:min((iter_num + 1) * size,
len(val_nodes))]
# add a dummy neighbor
ret_val = self.batch_feed_dict(val_node_subset)
return ret_val[0], ret_val[1], (iter_num + 1) * size >= len(val_nodes), val_node_subset # False, val_node_subset
This is also simple: slide over the validation nodes 512 at a time, predict each batch, then stack everything to compute the F1 scores and the mean per-batch loss. Finally the validation-set and test-set F1 and loss figures are written to local files:
with open(log_dir() + "val_stats.txt", "w") as fp:
fp.write("loss={:.5f} f1_micro={:.5f} f1_macro={:.5f} time={:.5f}".
format(val_cost, val_f1_mic, val_f1_mac, duration))
print("Writing test set stats to file (don't peak!)")
val_cost, val_f1_mic, val_f1_mac, duration = incremental_evaluate(sess, model, minibatch, FLAGS.batch_size,
test=True)
with open(log_dir() + "test_stats.txt", "w") as fp:
fp.write("loss={:.5f} f1_micro={:.5f} f1_macro={:.5f}".
format(val_cost, val_f1_mic, val_f1_mac))
That completes the entire pipeline.
Reflections on the code
(1) Data flow analysis
Let's retrace the input data and how it flows through the model; the whole flow is a sampling part plus a neural-network computation part. First the sampling part.
Taking a batch of 512 nodes as in the example, the sampling part enumerates all 1-hop and 2-hop neighbors in one go; the implementation stores a fixed-size neighbor list per node and draws the required number of neighbors from it. It then fetches the feature vectors of all involved nodes by flattening the samples and doing an embedding_lookup against the node feature matrix, so the feature matrix of the node set at each hop distance is a 2-D matrix.
Now the neural-network computation part.
Once laid out, it is not complicated: the learnable parameters are just the W of the two layers (shared within each layer) plus one fully connected layer; the other big piece is the initial 1-hop/2-hop sampling, which could be completely decoupled from the model and moved into the data-processing stage.
(2) Using this code for prediction
This code does not seem usable for prediction on new data, because the author passes the adjacency list into a tensor and initializes it once globally
sess.run(tf.global_variables_initializer(), feed_dict={adj_info_ph: minibatch.adj})
so feeding a new adjacency list at prediction time raises an error; the repo's issues mention the same problem.
The fix is to modify the source and simply move the sampling step out of the graph; the model itself is only the two convolution layers plus a fully connected layer, and the feature matrices of all hops can be fully prepared before entering the model, as sketched below.
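A rough sketch of what that refactor could look like (hypothetical helper, not code from the repo): do the multi-hop sampling with numpy outside the graph, given the precomputed adjacency array adj (shape [num_nodes + 1, max_degree], padded as above) and the feature matrix feats with its extra zero row:
import numpy as np

def sample_hops(adj, batch_nodes, num_samples):
    """Return node index arrays per hop: [batch, layer-2 neighbors, layer-1 neighbors]."""
    hops = [np.asarray(batch_nodes)]
    for n in reversed(num_samples):                   # outermost layer first, as in sample(): 10, then 25
        neigh = adj[hops[-1]]                         # (num_prev_nodes, max_degree)
        cols = np.random.choice(neigh.shape[1], n, replace=False)
        flat = neigh[:, cols].reshape(-1)
        hops.append(flat.astype(np.int64))            # cast in case adj is stored as float, as in the repo
    return hops

# usage sketch: the per-hop feature matrices can then be fed through plain placeholders
# hops = sample_hops(adj, batch_nodes, num_samples=[25, 10])
# hop_feats = [feats[h] for h in hops]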
(3) Why pad the features with a zero vector
I am not sure; it feels as if the zero row at the bottom of the feature matrix never comes into play. One plausible explanation, judging from construct_adj: adj is initialized with the fill value len(id2idx) = 56944, so nodes that are skipped (test/val) or have no usable neighbors keep that dummy index in their rows, and the extra zero row gives embedding_lookup on index 56944 an all-zero feature vector to return instead of going out of bounds.