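Graph Neural Networks: A Complete Walkthrough of the GCN Source Code (TensorFlow)

Keywords: graph neural networks, GCN, scipy

This post analyzes the implementation of the GCN project that ranks first on GitHub when searching for "gcn".

Quick start

After cloning the code with git clone and making a few small tweaks for debugging, run train.py: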

root@ubuntu:/home/git/gcn/gcn# python train.py
Epoch: 0001 train_loss= 1.95334 train_acc= 0.10000 val_loss= 1.95048 val_acc= 0.16400 time= 0.68464
Epoch: 0002 train_loss= 1.94804 train_acc= 0.27857 val_loss= 1.94673 val_acc= 0.37000 time= 0.01166
...
Epoch: 0200 train_loss= 0.56547 train_acc= 0.97857 val_loss= 1.04744 val_acc= 0.78600 time= 0.01720
Optimization Finished!
Test set results: cost= 1.00715 accuracy= 0.81600 time= 0.00546

It runs. The local TensorFlow version is 1.14.0.

Data source analysis

train.py defines the configurable parameters at the top and then loads the data:

adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)

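Looking at load_data: the project ships three datasets, each with 8 data files distinguished by file suffix. Taking the default cora dataset as an example, the files include:

x: a scipy sparse matrix, size=(140, 1433), holding the feature vectors of the 140 training nodes. A sparse matrix is used because each feature vector is a binary word-indicator (bag-of-words) vector
y: a numpy array, size=(140, 7), holding the labels of the 140 training nodes, one-hot encoded over 7 classes
tx: a scipy sparse matrix, size=(1000, 1433), holding the feature vectors of the 1000 test nodes
ty: a numpy array, size=(1000, 7), holding the labels of the 1000 test nodes
graph: the graph structure as a dictionary, where each key is a node and each value is its list of neighbors

The Cora dataset consists of machine learning papers. Each paper belongs to one of seven categories: Case Based, Genetic Algorithms, Neural Networks, Probabilistic Methods, Reinforcement Learning, Rule Learning, Theory. Only papers that cite or are cited by at least one other paper are kept (so every paper has a link), which leaves 2708 papers in the corpus. After stemming and stop-word removal, and after dropping every word with a document frequency below 10, 1433 unique words remain (the feature dimension). The goal of running GCN on this dataset is to predict each paper's category (node classification) from the citation relations (the graph) and the binary word-occurrence matrix (the feature vectors).
Internally, load_data stacks all the data together and then splits it into training, validation, and test sets. The training indices are 0 to 139, the validation indices 140 to 639, and the test indices 1708 to 2707, as in the following code: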

    # Indices of the three splits within the full feature matrix
    idx_test = test_idx_range.tolist()
    idx_train = range(len(y))
    idx_val = range(len(y), len(y) + 500)

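Finally it returns the adjacency matrix of all nodes (built with nx.adjacency_matrix), the feature vectors of all 2708 nodes (a lil_matrix sparse matrix), the label matrices for training, validation, and test (with the rows outside each split zeroed out), and the training, validation, and test masks.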
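
To make the split concrete, here is a minimal numpy sketch of how the masks and the masked label matrices relate. The helper name sample_mask, the made-up labels, and the contiguous test range are assumptions for illustration; the real load_data reads the test indices from a file.

```python
import numpy as np

def sample_mask(idx, n):
    """Boolean vector of length n that is True at the positions in idx."""
    mask = np.zeros(n, dtype=bool)
    mask[list(idx)] = True
    return mask

n_nodes, n_classes = 2708, 7
# Made-up one-hot labels, only used to show the shapes.
labels = np.eye(n_classes)[np.arange(n_nodes) % n_classes]

idx_train = range(140)          # 0 .. 139
idx_val = range(140, 640)       # 140 .. 639
idx_test = range(1708, 2708)    # 1708 .. 2707

train_mask = sample_mask(idx_train, n_nodes)
val_mask = sample_mask(idx_val, n_nodes)
test_mask = sample_mask(idx_test, n_nodes)

# Every split keeps the full (2708, 7) shape; rows outside the split stay zero.
y_train = np.zeros_like(labels)
y_train[train_mask] = labels[train_mask]

print(train_mask.sum(), val_mask.sum(), test_mask.sum())
print(y_train.shape, int(y_train.sum()))
```

Keeping every label matrix at full size is what lets the model always run on the whole graph while the masks decide which rows count.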

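Node feature preprocessing

Next comes the following code. The default model is gcn, and this block finishes preprocessing the node features before the model is built:

# Row-normalize the sparse feature matrix, then convert it to coo form and return (coords, values, shape)
features = preprocess_features(features)
if FLAGS.model == 'gcn':
    # Symmetric normalization: D^-0.5 * A * D^-0.5
    support = [preprocess_adj(adj)]
    num_supports = 1
    # Use GCN as the model
    model_func = GCN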
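
The symmetric normalization mentioned in the comment can be sketched on a tiny dense matrix. This is a numpy illustration and not the repository's scipy code; as far as I can tell, preprocess_adj adds the identity (self-loops) before normalizing, so the sketch does the same. The function name normalize_adj_dense is made up.

```python
import numpy as np

def normalize_adj_dense(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix."""
    a_tilde = adj + np.eye(adj.shape[0])   # add self-loops (assumed, see lead-in)
    deg = a_tilde.sum(axis=1)              # degree of each node, self-loop included
    d_inv_sqrt = np.power(deg, -0.5)
    # Scale row i by d_i^-1/2 and column j by d_j^-1/2.
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# A path graph 0 - 1 - 2.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
a_hat = normalize_adj_dense(adj)
print(np.round(a_hat, 3))
```

The result stays symmetric, and entry (i, j) equals 1 / sqrt(d_i * d_j) wherever the graph (with self-loops) has an edge.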

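First look at preprocess_features. Its purpose is L1 row normalization of the node feature vectors so that every row sums to 1. It builds a diagonal matrix from the reciprocals of the row sums and left-multiplies the feature matrix by it (the same idea as multiplying X by D^-1 to average over neighbors).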

def preprocess_features(features):
    """Row-normalize feature matrix and convert to tuple representation"""
    rowsum = np.array(features.sum(1))
    # Reciprocal of each row sum
    r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0.
    r_mat_inv = sp.diags(r_inv)  # diagonal matrix, (2708, 2708)
    features = r_mat_inv.dot(features)  # matrix product that scales each row to sum to 1
    return sparse_to_tuple(features)

After normalization, sparse_to_tuple converts the features into a tuple. Following that function:

def sparse_to_tuple(sparse_mx):
    """Convert sparse matrix to tuple representation."""
    def to_tuple(mx):
        if not sp.isspmatrix_coo(mx):
            # Convert to a coo-format sparse matrix
            # (row/column coordinates plus values)
            mx = mx.tocoo()
        coords = np.vstack((mx.row, mx.col)).transpose()
        values = mx.data
        shape = mx.shape
        return coords, values, shape

    if isinstance(sparse_mx, list):
        for i in range(len(sparse_mx)):
            sparse_mx[i] = to_tuple(sparse_mx[i])
    else:
        sparse_mx = to_tuple(sparse_mx)

    return sparse_mx

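Jump straight to the line sparse_mx = to_tuple(sparse_mx) and then read to_tuple. It converts the features from csr_matrix to coo_matrix and returns the three components coords, values, shape (the coordinates of the non-zero entries, their values, and the matrix shape) as a tuple. Two matrices are involved here, the adjacency matrix and the node feature matrix. Both are sparse, so both use scipy sparse formats: the adjacency matrix starts as a csr_matrix, which is convenient for the arithmetic of symmetric normalization, while the feature matrix starts as a lil_matrix, which is convenient for row slicing and assignment. The features end up as a coo_matrix triple because they are fed into the model through tf.sparse_placeholder, which takes exactly (row/column indices, values, shape), and that maps directly onto coo_matrix. Note that the adjacency matrix is not a fixed global either: in the placeholders dictionary shown later, 'support' is also a tf.sparse_placeholder, so it is fed in the same triple form.
With the features processed, two more lines remain: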

num_supports = 1
model_func = GCN

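The second clearly selects the GCN class as the model. The first is simply hard-coded to 1 in GCN mode, so there is nothing more to dig into.

Differences between the scipy.sparse matrix formats

This section looks at the three sparse matrix representations used in the code: csr_matrix, lil_matrix, and coo_matrix.

csr_matrix: compressed sparse row matrix. This format is commonly used for sparse matrix arithmetic and supports efficient row slicing
lil_matrix: row-based list-of-lists sparse matrix. This format supports efficient incremental insertion, deletion, and lookup of elements, as well as efficient row slicing
coo_matrix: coordinate-format matrix. Conversion to and from other sparse formats is efficient, but coo_matrix does not support element access, insertion, or deletion by index. Once created, it is normally converted to another format before slicing or heavy arithmetic

Hands-on, start with coo_matrix. Three components, the values, the coordinates, and the shape, fully determine a sparse matrix. Storing the matrix in separate pieces like this clearly makes conversion to other formats easy, but it does not allow slicing or item assignment.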

import scipy.sparse as sp
data = [1, 1, 2]
row = [0, 1, 1]
col = [0, 1, 2]
matrix = sp.coo_matrix((data, (row, col)), shape=(3, 3))
matrix.todense()
# Output
matrix([[1, 0, 0],
        [0, 1, 2],
        [0, 0, 0]])

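The second is csr_matrix. data holds the non-zero values, indices holds the column index of each non-zero value (aligned one-to-one with data), and indptr holds cumulative non-zero counts: its first element is 0, and element i+1 is the total number of non-zero values in rows 0 through i, so the entries of row i are data[indptr[i]:indptr[i+1]]. Together with data and indices this fully determines the sparse matrix.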

import numpy as np
indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
matrix = sp.csr_matrix((data, indices, indptr), shape=(3, 3))
matrix.todense()
# Output
matrix([[1, 0, 2],
        [0, 0, 3],
        [4, 5, 6]])
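
To see what indptr really encodes, the same three arrays can be decoded by hand. This is a plain numpy sketch; csr_to_dense is a made-up helper, not a scipy function.

```python
import numpy as np

def csr_to_dense(data, indices, indptr, shape):
    """Rebuild a dense matrix from CSR arrays, one row at a time."""
    dense = np.zeros(shape, dtype=data.dtype)
    for row in range(shape[0]):
        # The entries of this row live in data[start:end] / indices[start:end].
        start, end = indptr[row], indptr[row + 1]
        for k in range(start, end):
            dense[row, indices[k]] = data[k]
    return dense

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

dense = csr_to_dense(data, indices, indptr, (3, 3))
print(dense)
```

Row 0 owns positions 0 to 1, row 1 owns position 2, and row 2 owns positions 3 to 5, which reproduces the matrix printed above.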

The third is lil_matrix. I did not find an example of constructing one directly, so let's go straight to what it is good at: updating and adding data after slicing.

import scipy.sparse as sp
data = [1, 1, 2]
row = [0, 1, 1]
col = [0, 1, 2]
matrix = sp.coo_matrix((data, (row, col)), shape=(3, 3))
matrix.todense()
# Output
        [[1, 0, 0],
        [0, 1, 2],
        [0, 0, 0]]
# Convert to lil_matrix
matrix = matrix.tolil()
# Update a single element: copy the value at row 1, column 1 into row 0, column 2
matrix[0, 2] = matrix[1, 1]
matrix.todense()
# Output
        [[1, 0, 1],
        [0, 1, 2],
        [0, 0, 0]]
# Update several rows at once
matrix[[0, 2]] = matrix[1]
matrix.todense()
# Output
        [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]

Let's check whether the other sparse formats can perform the same update:

# Comment out the conversion to lil_matrix
# matrix = matrix.tolil()
matrix[0, 2] = matrix[1, 1]
TypeError: 'coo_matrix' object is not subscriptable

coo_matrix cannot do it because it does not support subscripting. csr_matrix can perform the same task, so let's compare its efficiency with lil_matrix:

import time
t1 = time.time()
for i in range(1000):
    data = [1, 1, 2, 10, 100]
    row = [0, 1, 1, 4, 4]
    col = [0, 1, 2, 3, 4]
    matrix = sp.coo_matrix((data, (row, col)), shape=(5, 5))
    # matrix = matrix.tolil()
    matrix = matrix.tocsr()
    matrix[0, 2] = matrix[1, 1]
    matrix[[0, 2]] = matrix[1]
t2 = time.time()
print(t2 - t1)
SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  self._set_intXint(row, col, x.flat[0])

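This takes longer than with lil_matrix, and scipy already warns that changing the sparsity structure of a csr_matrix is expensive and that lil_matrix is more efficient. This is why the author uses the lil_matrix format when rearranging rows of the feature matrix.

The tf.sparse_placeholder sparse placeholder

Next, let's look at tf.sparse_placeholder and how it works together with coo_matrix.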

import numpy as np
import scipy.sparse as sp
import tensorflow as tf  # TensorFlow 1.x API

row = np.array([0, 0, 1, 3])  # row indices
col = np.array([0, 2, 1, 3])  # column indices
data = np.array([4, 9, 7, 5])  # values
tmp = sp.coo_matrix((data, (row, col)), shape=(4, 4))

x = tf.sparse_placeholder(tf.float32)  # dtype of the fed values
with tf.Session() as sess:
    indices = np.mat([tmp.tocoo().row, tmp.tocoo().col]).transpose()
    values = tmp.tocoo().data
    shape = tmp.tocoo().shape
    # feed_dict takes a triple (coordinates, non-zero values, shape)
    sp_ten = sess.run(x, feed_dict={x: (indices, values, shape)})
    print("-----------tf.sparse_placeholder result")
    print(sp_ten)
    dense_tensor = tf.sparse_tensor_to_dense(sp_ten)
    print("-----------tf.sparse_placeholder converted to a dense matrix")
    print(sess.run(dense_tensor))

-----------tf.sparse_placeholder result
SparseTensorValue(indices=array([[0, 0],
       [0, 2],
       [1, 1],
       [3, 3]]), values=array([4., 9., 7., 5.], dtype=float32), dense_shape=array([4, 4]))
-----------tf.sparse_placeholder converted to a dense matrix
[[4. 0. 9. 0.]
 [0. 7. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 5.]]

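The conclusion is that a coo_matrix converted to the triple format can be fed directly into tf.sparse_placeholder, and that is how the author's code does it. The feed dictionary is built on this line: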

feed_dict_val = construct_feed_dict(features, support, labels, mask, placeholders)

Following construct_feed_dict:

    feed_dict = dict()
    feed_dict.update({placeholders['labels']: labels})
    feed_dict.update({placeholders['labels_mask']: labels_mask})
    feed_dict.update({placeholders['features']: features})

Then look at placeholders['features'], which is defined at module level in train.py:

'features': tf.sparse_placeholder(tf.float32, shape=tf.constant(features[2], dtype=tf.int64))

This connects tf.sparse_placeholder with coo_matrix.

Model construction

Once the data has been split and converted, model training begins. The first step is defining the placeholders:

placeholders = {
    'support': [tf.sparse_placeholder(tf.float32) for _ in range(num_supports)],
    'features': tf.sparse_placeholder(tf.float32, shape=tf.constant(features[2], dtype=tf.int64)),
    'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),
    'labels_mask': tf.placeholder(tf.int32),
    'dropout': tf.placeholder_with_default(0., shape=()),
    'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout
}

The author defines placeholders as a key/value dictionary. First look at how it is used further down to feed values:

feed_dict = construct_feed_dict(features, support, y_train, train_mask, placeholders)
outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)

These two lines build feed_dict and run one training step. Looking at construct_feed_dict:

feed_dict = dict()
feed_dict.update({placeholders['labels']: labels})
feed_dict.update({placeholders['labels_mask']: labels_mask})

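construct_feed_dict receives the placeholders dictionary defined in train.py. For each name it looks up the corresponding value in placeholders (a TensorFlow tensor object), uses that tensor as the key and the concrete data as the value, and stores the pair in feed_dict. One key/value pair in feed_dict looks like this: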

{<tf.Tensor 'Placeholder_5:0' shape=(?, 7) dtype=float32>: array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])}

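The more common pattern is to assign each placeholder to its own variable and use that variable as the feed_dict key. Here the author instead fetches the tensor object from the shared dictionary wherever it is needed, which guarantees that the model and the feeding code refer to the same tensor object. A newly created placeholder would not match, even if it were created by an identical statement, because feed_dict keys are matched by tensor object. Taking dropout as an example, here is how the model uses the placeholder internally and how the data is fed from outside: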

class GraphConvolution(Layer):
    """Graph convolution layer."""
    def __init__(self, input_dim, output_dim, placeholders, dropout=0.,
                 sparse_inputs=False, act=tf.nn.relu, bias=False,
                 featureless=False, **kwargs):
        super(GraphConvolution, self).__init__(**kwargs)

        if dropout:
            self.dropout = placeholders['dropout']
        else:
            self.dropout = 0.

The GCN convolution layer above sets its dropout attribute to the tf.placeholder_with_default(0., shape=()) placeholder. The corresponding key/value pair in feed_dict is:

{..., <tf.Tensor 'PlaceholderWithDefault:0' shape=() dtype=float32>: 0.5}

The tensor tf.Tensor 'PlaceholderWithDefault:0' shape=() is obtained through placeholders['dropout'], as on this line:

feed_dict.update({placeholders['dropout']: FLAGS.dropout})

So placeholders['dropout'] is the same tensor object in both places, and that shared reference is what serves as the feed_dict key.
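
The point about object identity can be shown without TensorFlow. In this sketch FakePlaceholder is a hypothetical stand-in class, not a TensorFlow API; it only mimics the fact that a dictionary keyed by objects matches on the object itself.

```python
class FakePlaceholder:
    """Hypothetical stand-in for a placeholder tensor (not a real TF class)."""
    def __init__(self, name):
        self.name = name

# One shared dictionary, as in train.py.
placeholders = {'dropout': FakePlaceholder('dropout')}

feed_dict = {}
feed_dict[placeholders['dropout']] = 0.5

same_object = placeholders['dropout']        # fetched from the shared dictionary
look_alike = FakePlaceholder('dropout')      # created by an identical statement

print(same_object in feed_dict)   # the shared object is found
print(look_alike in feed_dict)    # the look-alike is a different key
```

This is why both the model and construct_feed_dict go through the same placeholders dictionary rather than creating placeholders of their own.
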
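Now let's go through the purpose of each placeholder. features and labels are easy to understand, so let's look at what the others do:

support: the symmetrically normalized adjacency matrix, given as a list of sparse matrices. The list may hold several matrices, with the count controlled by num_supports; in GCN num_supports is 1. In the model, support is multiplied with XW
labels_mask: a mask over the labels that removes the influence of labels outside the current split on the loss and accuracy. During training, train_mask is fed in: 2708 booleans of which the first 140 are True, which masks out every label that is not in the training set. It is used when computing the loss and accuracy, analyzed in detail below.
dropout: dropout is not discussed separately in the GCN theory. In the model layers, dropout is applied to the node feature matrix X, that is, to the H of every layer
num_features_nonzero: the number of non-zero values in the node feature matrix. The value fed in is features[1].shape from the features triple, i.e. 49216. It is a helper variable used to generate a mask that lines up with the non-zero positions of the sparse matrix, in combination with tf.sparse_retain; details follow below

Next the model is built: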

# Create model
model = model_func(placeholders, input_dim=features[2][1], logging=True)

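The model is instantiated with placeholders, input_dim, and logging:

placeholders: the collection of placeholders, passed in so that the model layers can pick up the relevant placeholder and bind it to an internal attribute
input_dim: the dimension of the node feature vectors, 1433 in this example. It sets the input dimension of the W matrix created in the first layer
logging: a boolean switch for whether to record tf.summary.histogram summaries during training

The build module

Next look at the model_func class itself. In the code above model_func is set to GCN. GCN inherits from Model and overrides its _loss, _accuracy, predict, and _build methods. First, GCN's initializer: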

    def __init__(self, placeholders, input_dim, **kwargs):
        super(GCN, self).__init__(**kwargs)

        self.inputs = placeholders['features']
        self.input_dim = input_dim
        # self.input_dim = self.inputs.get_shape().as_list()[1]  # To be supported in future Tensorflow versions
        self.output_dim = placeholders['labels'].get_shape().as_list()[1]
        self.placeholders = placeholders

        self.optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)

        self.build()

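This code stores all the placeholders inside the model, binds inputs to the node feature matrix, sets the feature dimension input_dim and the output dimension output_dim, keeps a reference to placeholders (which holds every placeholder, including the adjacency matrix), defines the optimizer, and finally calls the base class build method to construct all of GCN's internal graph nodes. Here is the base class build: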

    def _build(self):
        # Not implemented in the base class; a subclass must override it or NotImplementedError is raised
        raise NotImplementedError

    def build(self):
        """ Wrapper for _build() """
        with tf.variable_scope(self.name):
            # Layers are defined by the subclass
            self._build()

        # Build sequential layer model
        self.activations.append(self.inputs)  # placeholders['features']
        # Iterate over the layers defined by the subclass
        for layer in self.layers:
            hidden = layer(self.activations[-1])  # GraphConvolution takes the previous layer's output as input
            self.activations.append(hidden)  # output of this layer, computed by _call
        self.outputs = self.activations[-1]  # output of the last layer

        # Store model variables for easy access
        variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name)
        self.vars = {var.name: var for var in variables}

        # Build metrics
        # Loss is defined by the subclass
        self._loss()
        # Accuracy is defined by the subclass
        self._accuracy()

        self.opt_op = self.optimizer.minimize(self.loss)

The base class build first calls _build, which is overridden in the subclass. Here is the subclass _build:

    def _build(self):

        self.layers.append(GraphConvolution(input_dim=self.input_dim,  # 1433
                                            output_dim=FLAGS.hidden1,  # 16
                                            placeholders=self.placeholders,
                                            act=tf.nn.relu,
                                            dropout=True,
                                            sparse_inputs=True,
                                            logging=self.logging))

        self.layers.append(GraphConvolution(input_dim=FLAGS.hidden1,  # 16
                                            output_dim=self.output_dim,  # 7
                                            placeholders=self.placeholders,
                                            act=lambda x: x,  # identity, no activation function
                                            dropout=True,
                                            logging=self.logging))

The subclass _build is very direct: it defines two GCN convolution layer objects. Look at self.layers, which is initialized as an empty list in the base class:

self.layers = []

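So _build fills the base class's empty layers list with two convolution layers, which shows that the author's model is a two-layer graph convolution. Continuing with build in the base class: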

        # Build sequential layer model
        self.activations.append(self.inputs)  # placeholders['features']
        # Iterate over the layers defined by the subclass
        for layer in self.layers:
            hidden = layer(self.activations[-1])  # GraphConvolution takes the previous layer's output as input
            self.activations.append(hidden)  # output of this layer, computed by _call
        self.outputs = self.activations[-1]  # output of the last layer

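activations holds the node feature matrix at each layer. The first line appends the original node features to activations as the first entry, i.e. X. Then the code iterates over layers, each of which is a GraphConvolution object. Passing self.activations[-1] (the previous layer's node features) to the object actually executes the GraphConvolution _call method. A quick look at the base class Layer: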

    def __call__(self, inputs):
        with tf.name_scope(self.name):
            if self.logging and not self.sparse_inputs:
                tf.summary.histogram(self.name + '/inputs', inputs)
            outputs = self._call(inputs)
            if self.logging:
                tf.summary.histogram(self.name + '/outputs', outputs)
            return outputs

__call__ makes an instantiated object callable like a function, and here it delegates to _call. So the line hidden = layer(self.activations[-1]) computes the node feature matrix of the current layer, which is appended to activations for the next layer to use. The final node output is the last element of activations, assigned to outputs.
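
The __call__ / _call delegation and the layer loop can be sketched in plain Python. The Scale and Shift layers below are toy classes invented for this illustration; they only stand in for GraphConvolution.

```python
class Layer:
    """Base class: calling the object runs the subclass's _call."""
    def __call__(self, inputs):
        return self._call(inputs)

    def _call(self, inputs):
        raise NotImplementedError

class Scale(Layer):
    """Toy layer (illustrative only): multiply every value by a factor."""
    def __init__(self, factor):
        self.factor = factor

    def _call(self, inputs):
        return [v * self.factor for v in inputs]

class Shift(Layer):
    """Toy layer (illustrative only): add an offset to every value."""
    def __init__(self, offset):
        self.offset = offset

    def _call(self, inputs):
        return [v + self.offset for v in inputs]

layers = [Scale(2), Shift(1)]
activations = [[1, 2, 3]]                # the raw input plays the role of X
for layer in layers:
    hidden = layer(activations[-1])      # __call__ -> _call
    activations.append(hidden)
outputs = activations[-1]

print(activations)
```

Each layer only ever sees the most recent entry of activations, which is the same chaining that build performs over the two GraphConvolution layers.
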
Next, all the model variables are collected. They are needed later when saving and loading the model's ckpt files:

        # Store model variables for easy access
        variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name)
        self.vars = {var.name: var for var in variables}

Let's print self.vars to see which variables the network actually trains:

{'gcn/graphconvolution_1_vars/weights_0:0': 
<tf.Variable 'gcn/graphconvolution_1_vars/weights_0:0' shape=(1433, 16) dtype=float32_ref>, 
'gcn/graphconvolution_2_vars/weights_0:0': 
<tf.Variable 'gcn/graphconvolution_2_vars/weights_0:0' shape=(16, 7) dtype=float32_ref>}

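The parameters consist only of the W matrices of the two convolution layers, with shapes (1433, 16) and (16, 7). There is no fully connected layer: the last convolution layer already has dimension 7, matching the labels, so softmax can be applied directly.

The loss module

Next the loss is defined: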

        self._loss()
        self._accuracy()
        self.opt_op = self.optimizer.minimize(self.loss)

_loss is overridden in the subclass:

    def _loss(self):
        # Weight decay loss
        for var in self.layers[0].vars.values():
            # L2 loss on the weights W
            self.loss += FLAGS.weight_decay * tf.nn.l2_loss(var)

        # Cross entropy error
        self.loss += masked_softmax_cross_entropy(self.outputs, self.placeholders['labels'],
                                                  self.placeholders['labels_mask'])

Look at self.layers[0].vars. layers[0] is a GraphConvolution object, which inherits self.vars = {} from the base class Layer. The dictionary is filled in the GraphConvolution initializer as follows:

        with tf.variable_scope(self.name + '_vars'):
            for i in range(len(self.support)):
                # Create W, 1433 × 16 for the first layer
                self.vars['weights_' + str(i)] = glorot([input_dim, output_dim],
                                                        name='weights_' + str(i))
            if self.bias:
                # Optional bias; the plain D^-0.5 A D^-0.5 X W form has none
                self.vars['bias'] = zeros([output_dim], name='bias')

Since len(support) is 1, vars gets a single weight entry, weights_0, which is the tensor returned by glorot([input_dim, output_dim], name='weights_' + str(i)). Following glorot:

def glorot(shape, name=None):
    """Glorot & Bengio (AISTATS 2010) init."""
    init_range = np.sqrt(6.0/(shape[0]+shape[1]))
    initial = tf.random_uniform(shape, minval=-init_range, maxval=init_range, dtype=tf.float32)
    return tf.Variable(initial, name=name)

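In short this is Glorot initialization with shape=(1433, 16) and (16, 7). If bias is enabled, zero-valued biases of shape [16] and [7] are added as well. Now look at the variable scope: the block above opens with tf.variable_scope(self.name + '_vars'), where self.name is set in the base class Layer initializer: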

        if not name:
            layer = self.__class__.__name__.lower()
            name = layer + '_' + str(get_layer_uid(layer))

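Because self.__class__.__name__.lower() returns the same string (the lowercased class name) for every instance, the author appends an index to the name (graphconvolution). The index comes from a module-level dictionary that counts how many times each name has been used: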

def get_layer_uid(layer_name=''):
    """Helper function, assigns unique layer IDs."""
    if layer_name not in _LAYER_UIDS:
        _LAYER_UIDS[layer_name] = 1
        return 1
    else:
        _LAYER_UIDS[layer_name] += 1
        return _LAYER_UIDS[layer_name]

Combine this with the variable scope in the base class Model:

    def build(self):
        """ Wrapper for _build() """
        with tf.variable_scope(self.name):
            # Layers are defined by the subclass
            self._build()

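With the two nested scopes, the final variable names are gcn/graphconvolution_1_vars/weights_0:0 and gcn/graphconvolution_2_vars/weights_0:0. Returning to the loss: the loop runs over self.layers[0].vars only, so the L2 penalty self.loss += FLAGS.weight_decay * tf.nn.l2_loss(var) is applied to the weights of the first convolution layer, not to every W. Then comes the main loss term, the cross entropy between the outputs and the labels: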

        self.loss += masked_softmax_cross_entropy(self.outputs, self.placeholders['labels'],
                                                  self.placeholders['labels_mask'])

Following masked_softmax_cross_entropy:

def masked_softmax_cross_entropy(preds, labels, mask):
    """Softmax cross-entropy loss with masking."""
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=preds, labels=labels)
    mask = tf.cast(mask, dtype=tf.float32)
    mask /= tf.reduce_mean(mask)
    loss *= mask
    return tf.reduce_mean(loss)

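The author first uses tf.nn.softmax_cross_entropy_with_logits to compute the softmax cross entropy of every row. Concretely, the output of the second convolution layer, shape (2708, 7), goes through softmax and is compared with the labels, shape (2708, 7), giving one loss value per node. The values that do not belong to the training set are then masked out so that they do not count toward the loss. placeholders['labels_mask'] (actually train_mask) is cast from [True, True, ..., False] to [1, 1, 1, ..., 0] (the first 140 elements are 1 and belong to the training set). The purpose of mask /= tf.reduce_mean(mask) is to compensate for the final tf.reduce_mean, whose denominator counts all 2708 nodes including the masked ones: every unmasked entry is divided by 140/2708, so the mask becomes [19.34, 19.34, 19.34, ..., 0], with 0 for the masked entries. Finally each row's cross entropy is multiplied by its mask value and averaged, which gives the mean cross entropy over the 140 training nodes as the final loss. That concludes the loss module.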
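
The rescaling trick is easy to check numerically. The sketch below uses random numbers in place of the real per-node cross entropy, which is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_train = 2708, 140

# Stand-in for the per-node cross entropy (random values, illustration only).
per_node_loss = rng.random(n_nodes)

mask = np.zeros(n_nodes)
mask[:n_train] = 1.0

scaled_mask = mask / mask.mean()                 # 1 -> 2708/140, 0 -> 0
masked_loss = (per_node_loss * scaled_mask).mean()

# Reference value: the plain mean over the training nodes only.
train_only = per_node_loss[:n_train].mean()

print(round(float(scaled_mask[0]), 2))
print(np.isclose(masked_loss, train_only))
```

Dividing the mask by its mean and then averaging over all nodes gives the same number as averaging over the unmasked nodes alone, so the masked nodes contribute nothing to the loss.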

The accuracy module

Next, the subclass override of _accuracy:

    def _accuracy(self):
        self.accuracy = masked_accuracy(self.outputs, self.placeholders['labels'],
                                        self.placeholders['labels_mask'])

The structure is the same as masked_softmax_cross_entropy:

def masked_accuracy(preds, labels, mask):
    """Accuracy with masking."""
    correct_prediction = tf.equal(tf.argmax(preds, 1), tf.argmax(labels, 1))
    accuracy_all = tf.cast(correct_prediction, tf.float32)
    mask = tf.cast(mask, dtype=tf.float32)
    mask /= tf.reduce_mean(mask)
    accuracy_all *= mask
    return tf.reduce_mean(accuracy_all)

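preds and labels are passed in as parameters here, so the function can be used for training as well as for testing. First, tf.equal compares, row by row, the index of the maximum of preds (shape=(2708, 7)) with that of labels (shape=(2708, 7)). In tf.argmax(preds, 1) the 1 is the axis, meaning the argmax is taken across the 7 class columns of each row. The booleans are then cast to 1/0, the mask is divided by 140/2708 (taking the training set as an example), and reduce_mean averages the result, which is the prediction accuracy over the 140 labeled nodes.

The optimizer module

The optimizer module is a single line: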

self.opt_op = self.optimizer.minimize(self.loss)

The optimizer is declared in the subclass and is Adam:

self.optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)

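The GCN convolution module

With the overall model now clear, let's turn back to the convolution itself in the GraphConvolution class. The key part is _call: __call__ in the base class Layer invokes _call to get the output: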

    def _call(self, inputs):
        x = inputs

        # dropout on X
        if self.sparse_inputs:
            x = sparse_dropout(x, 1-self.dropout, self.num_features_nonzero)
        else:
            x = tf.nn.dropout(x, 1-self.dropout)

        # convolve
        supports = list()
        for i in range(len(self.support)):
            if not self.featureless:
                # X × W
                pre_sup = dot(x, self.vars['weights_' + str(i)],
                              sparse=self.sparse_inputs)
            else:
                pre_sup = self.vars['weights_' + str(i)]
            # normalized A × (X × W)
            support = dot(self.support[i], pre_sup, sparse=True)
            supports.append(support)
        output = tf.add_n(supports)

        # bias
        if self.bias:
            output += self.vars['bias']

        return self.act(output)  # relu

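The input to this function (once the class is instantiated) is inputs, the node feature matrix of the current layer, which at the start is features (X). The function therefore first checks whether the input is sparse: it clearly is for the first layer and is not from the second layer on. The author then applies dropout to the input. In the dense case tf.nn.dropout is called directly. self.dropout defaults to 0.5 from FLAGS, so each value of the input matrix is set to 0 with probability 1/2 and the remaining values are divided by the keep probability 1 - 0.5, i.e. multiplied by 2. The rescaling keeps the expected value of the output the same as that of the input. Now look at the dropout implementation for sparse input: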

def sparse_dropout(x, keep_prob, noise_shape):
    """Dropout for sparse tensors."""
    random_tensor = keep_prob
    random_tensor += tf.random_uniform(noise_shape)  # add 49216 uniform random numbers in [0, 1)
    dropout_mask = tf.cast(tf.floor(random_tensor), dtype=tf.bool)
    pre_out = tf.sparse_retain(x, dropout_mask)
    return pre_out * (1./keep_prob)

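noise_shape is 49216, the number of stored non-zero values in the sparse matrix. The author adds a 49216-dimensional vector of uniform random numbers in [0, 1) to keep_prob and then floors the result to 0 or 1, so the probability of getting 1 equals keep_prob. The key step is then sparse_retain, which keeps only the selected non-zero entries of the sparse tensor and drops the rest (they read as 0 in the dense view). Its input is again the triple (coordinates, values, shape). A quick test: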

import tensorflow as tf

a = [[0, 0], [1, 0], [2, 1], [3, 1]]
b = [1, 2, 3, 4]
shape = [4, 2]
c = tf.sparse_placeholder(tf.float32)
d = tf.sparse_retain(c, tf.convert_to_tensor([True, False, True, True]))

with tf.Session() as sess:
    print(sess.run(c, feed_dict={c: (a, b, shape)}))
    print(sess.run(d, feed_dict={c: (a, b, shape)}))
    print(sess.run(tf.sparse_tensor_to_dense(d), feed_dict={c: (a, b, shape)}))

In the test above, d is the result of applying the mask [True, False, True, True] to the sparse tensor c, which is exactly what dropout does. The output:

SparseTensorValue(indices=array([[0, 0],
       [1, 0],
       [2, 1],
       [3, 1]]), values=array([1., 2., 3., 4.], dtype=float32), dense_shape=array([4, 2]))
SparseTensorValue(indices=array([[0, 0],
       [2, 1],
       [3, 1]]), values=array([1., 3., 4.], dtype=float32), dense_shape=array([4, 2]))
[[1. 0.]
 [0. 0.]
 [0. 3.]
 [0. 4.]]

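In effect the value at the second position (False) is set to 0. Note that the length of the mask is determined by the number of stored values, not by the number of rows of the input matrix. In my test, a mask shorter than the number of values behaved as if it were padded with 0 at the end, although the API expects one entry per value. Finally, pre_out * (1./keep_prob) scales up the remaining non-zero values, again to keep the expected output unchanged.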
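
The same idea can be reproduced on a COO triple with numpy only. This is a sketch of the technique and not the repository's TensorFlow code; sparse_dropout_np is a made-up name.

```python
import numpy as np

def sparse_dropout_np(coords, values, shape, keep_prob, rng):
    """Drop stored entries with probability 1 - keep_prob and rescale the rest."""
    # floor(keep_prob + U[0, 1)) is 1 with probability keep_prob, otherwise 0.
    keep = np.floor(keep_prob + rng.random(values.shape[0])).astype(bool)
    return coords[keep], values[keep] / keep_prob, shape

rng = np.random.default_rng(0)
coords = np.array([[0, 0], [1, 0], [2, 1], [3, 1]])
values = np.array([1., 2., 3., 4.])

c2, v2, s2 = sparse_dropout_np(coords, values, (4, 2), 0.5, rng)
print(c2, v2)

# Averaged over many draws, the rescaled sum stays close to the original sum.
avg_sum = np.mean([sparse_dropout_np(coords, values, (4, 2), 0.5, rng)[1].sum()
                   for _ in range(20000)])
print(avg_sum, values.sum())
```

Only the list of stored values is sampled, which is why the helper needs the non-zero count (num_features_nonzero) rather than the dense shape.
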
Continuing with the convolution itself, look at this line:

pre_sup = dot(x, self.vars['weights_' + str(i)],
                              sparse=self.sparse_inputs)

This line computes X*W. Here is the dot function:

def dot(x, y, sparse=False):
    """Wrapper for tf.matmul (sparse vs dense)."""
    if sparse:
        res = tf.sparse_tensor_dense_matmul(x, y)
    else:
        res = tf.matmul(x, y)
    return res

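It simply checks the sparse flag: a sparse input goes through tf.sparse_tensor_dense_matmul and a dense one through tf.matmul. The first argument of tf.sparse_tensor_dense_matmul is a sparse matrix and the second a dense matrix. A quick test: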

a = [[0, 0], [1, 0], [1, 1], [2, 1], [3, 1]]
b = [1, 2, 2, 3, 4]
shape = [4, 2]
c = tf.sparse_placeholder(tf.float32)
d = tf.convert_to_tensor([[10.0, 1.0], [5.0, 2.0]])

with tf.Session() as sess:
    print(sess.run(tf.sparse_tensor_to_dense(c), feed_dict={c: (a, b, shape)}))
    print(sess.run(tf.sparse_tensor_dense_matmul(c, d), feed_dict={c: (a, b, shape)}))

The output is below. The sparse matrix and the dense matrix multiply as expected:

[[1. 0.]
 [2. 2.]
 [0. 3.]
 [0. 4.]]
[[10.  1.]
 [30.  6.]
 [15.  6.]
 [20.  8.]]

Continuing with the GCN convolution:

            # normalized A × (X × W)
            support = dot(self.support[i], pre_sup, sparse=True)

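In GCN the author sets len(support)=1, so this line directly multiplies the symmetrically normalized A by X × W. Finally the author adds an optional bias after the convolution. If enabled, it is a zero-initialized vector whose length matches the second dimension of the convolution output: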

        if self.bias:
            output += self.vars['bias']

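The last step applies the activation function and returns self.act(output). act is specified when the convolution layer is instantiated: tf.nn.relu for the first layer and the identity for the second. That completes the model layers.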
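
Putting the pieces together, the two-layer forward pass can be sketched in numpy on a toy graph. Dropout and bias are left out, the graph and sizes are made up, and the self-loops in the normalization are my assumption about preprocess_adj; this is not the repository's code.

```python
import numpy as np

def normalize_adj(adj):
    """D^-1/2 (A + I) D^-1/2 on a dense matrix (self-loops are an assumption)."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.power(a_tilde.sum(axis=1), -0.5)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def glorot(shape, rng):
    """Uniform Glorot initialization, same range formula as the glorot helper."""
    limit = np.sqrt(6.0 / (shape[0] + shape[1]))
    return rng.uniform(-limit, limit, size=shape)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(a_hat, x, w0, w1):
    h1 = np.maximum(a_hat @ (x @ w0), 0.0)   # layer 1: relu(A_hat X W0)
    logits = a_hat @ (h1 @ w1)               # layer 2: identity activation
    return softmax(logits)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)   # toy path graph with 4 nodes
x = rng.random((4, 5))                        # 4 nodes, 5 features
w0 = glorot((5, 16), rng)                     # hidden size 16, as in FLAGS.hidden1
w1 = glorot((16, 7), rng)                     # 7 output classes
probs = gcn_forward(normalize_adj(adj), x, w0, w1)
print(probs.shape)
```

Every node gets a 7-way probability vector, and each layer is just "multiply by W, then mix with the neighbors through the normalized adjacency".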

Training the model

For training, look at this code as a whole:

# Train model
for epoch in range(FLAGS.epochs):

    t = time.time()
    # Construct feed dictionary
    # features: node feature vectors, support: symmetrically normalized A
    feed_dict = construct_feed_dict(features, support, y_train, train_mask, placeholders)
    feed_dict.update({placeholders['dropout']: FLAGS.dropout})

    # Training step
    outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)

    # Validation
    cost, acc, duration = evaluate(features, support, y_val, val_mask, placeholders)
    cost_val.append(cost)

    # Print results
    print("Epoch:", '%04d' % (epoch + 1), "train_loss=", "{:.5f}".format(outs[1]),
          "train_acc=", "{:.5f}".format(outs[2]), "val_loss=", "{:.5f}".format(cost),
          "val_acc=", "{:.5f}".format(acc), "time=", "{:.5f}".format(time.time() - t))

    # Stop when the latest validation loss exceeds the mean of the previous 10 epochs
    if epoch > FLAGS.early_stopping and cost_val[-1] > np.mean(cost_val[-(FLAGS.early_stopping+1):-1]):
        print("Early stopping...")
        break

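By default the model trains for 200 epochs, and every epoch feeds the full data through the model. The line outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict) returns the training loss and accuracy. After each training step the model is also validated once with cost, acc, duration = evaluate(features, support, y_val, val_mask, placeholders). The validation set has 500 nodes, with indices 140 to 639, and the validation loss of every epoch is recorded: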

cost, acc, duration = evaluate(features, support, y_val, val_mask, placeholders)
cost_val.append(cost)

The following code prints the per-epoch training and validation loss and accuracy, along with the time each epoch takes:

# Print results
    print("Epoch:", '%04d' % (epoch + 1), "train_loss=", "{:.5f}".format(outs[1]),
          "train_acc=", "{:.5f}".format(outs[2]), "val_loss=", "{:.5f}".format(cost),
          "val_acc=", "{:.5f}".format(acc), "time=", "{:.5f}".format(time.time() - t))

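Finally comes early stopping: after more than 10 epochs, training stops as soon as the latest validation loss is larger than the mean of the previous 10 epochs' validation losses: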

    if epoch > FLAGS.early_stopping and cost_val[-1] > np.mean(cost_val[-(FLAGS.early_stopping+1):-1]):
        print("Early stopping...")
        break
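
The stopping rule can be tried on made-up loss curves in plain Python. The function names and the two curves below are invented for illustration; only the condition itself mirrors the line above.

```python
def should_stop(cost_val, epoch, patience=10):
    """True when the latest loss exceeds the mean of the previous `patience` losses."""
    if epoch <= patience:
        return False
    previous = cost_val[-(patience + 1):-1]   # the `patience` values before the latest
    return cost_val[-1] > sum(previous) / len(previous)

def first_stop_epoch(losses, patience=10):
    """Replay a loss curve and return the 0-based epoch where training would stop."""
    cost_val = []
    for epoch, loss in enumerate(losses):
        cost_val.append(loss)
        if should_stop(cost_val, epoch, patience):
            return epoch
    return None

falling = [1.0 / (e + 1) for e in range(30)]   # keeps improving
rebound = falling[:15] + [0.5] * 5             # jumps back up at epoch 15

print(first_stop_epoch(falling))
print(first_stop_epoch(rebound))
```

A steadily falling curve never triggers the rule, while a single rebound above the recent average stops training immediately.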

Testing the model

# Testing
test_cost, test_acc, test_duration = evaluate(features, support, y_test, test_mask, placeholders)
print("Test set results:", "cost=", "{:.5f}".format(test_cost),
      "accuracy=", "{:.5f}".format(test_acc), "time=", "{:.5f}".format(test_duration))

The code has the same form as training and validation. Here is the evaluate function:

# Define model evaluation function
def evaluate(features, support, labels, mask, placeholders):
    t_test = time.time()
    feed_dict_val = construct_feed_dict(features, support, labels, mask, placeholders)
    outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)
    return outs_val[0], outs_val[1], (time.time() - t_test)

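The key line is outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val): sess does not run the optimizer op and only evaluates the loss and accuracy. That completes the walkthrough of the GCN code.

Model prediction

The author does not include this step in train.py, but the model exposes a predict method. It takes no arguments and applies softmax directly to the model's internal outputs. With a little extra code we can use it to look at the confusion matrix on the test set: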

# Append the following code at the end of train.py
feed_dict_val = construct_feed_dict(features, support, y_test, test_mask, placeholders)
outs_val = sess.run(model.predict(), feed_dict=feed_dict_val)
print("-----------test set predictions")
print(outs_val[1708:])
print("-----------test set labels")
print(y_test[1708:])
outs_val_index = np.argmax(outs_val[1708:], 1)
y_test_index = np.argmax(y_test[1708:], 1)

from sklearn.metrics import confusion_matrix, classification_report
print(classification_report(y_test_index, outs_val_index))
sr = confusion_matrix(y_test_index, outs_val_index)
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
plt.matshow(sr, cmap=plt.cm.Greens)
plt.colorbar()
for i in range(len(sr)):
    for j in range(len(sr)):
        plt.annotate(sr[i, j], xy=(j, i), horizontalalignment='center', verticalalignment='center')
plt.ylabel('True')
plt.xlabel('Predict')
plt.show()

The classification report:

              precision    recall  f1-score   support

           0       0.66      0.77      0.71       130
           1       0.84      0.87      0.85        91
           2       0.88      0.90      0.89       144
           3       0.91      0.78      0.84       319
           4       0.79      0.86      0.82       149
           5       0.82      0.76      0.79       103
           6       0.68      0.81      0.74        64

    accuracy                           0.82      1000
   macro avg       0.80      0.82      0.81      1000
weighted avg       0.83      0.82      0.82      1000

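The final confusion matrix is shown below. Overall accuracy is around 80% (0.82 in the report above).

[Figure: confusion matrix on the test set]

Reflections on the code design

(1) Why does the data loading need to reorder the test data?

This question still puzzled me after reading all the code. Why does the author apply a separate reordering to the test rows (it is really a sort by index rather than a shuffle)? Masking does not need ordered indices: the mask works with indices in any order, and the order has no effect on how the test loss and accuracy are computed. I commented out the two lines in load_data() that reorder the test rows of features and labels. The code still ran, but the test result was extremely poor, while the training and validation results were about the same.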

Epoch: 0200 train_loss= 0.67370 train_acc= 0.96429 val_loss= 1.28556 val_acc= 0.73400 time= 0.01114
Test set results: cost= 2.18207 accuracy= 0.28500 time= 0.00725

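I looked through the issues and found at least 3 people asking the same question: why shuffle the test set?

[Screenshot: the related GitHub issue]

I did not fully understand the answer at first, but a later comment helped me work out roughly what is going on. The problem is that the adjacency matrix and the node feature matrix are misaligned in the test portion. The reordering does not affect the loss and accuracy logic, but it does affect the A*X logic: the adjacency matrix follows index order exactly, whereas the test feature vectors are stored in the shuffled order recorded in ind.cora.test.index. The two must be brought into the same order, otherwise the matrix product pairs each node's neighbors with the wrong features. Look at the nodes of the networkx graph built in load_data: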

nx.from_dict_of_lists(graph).nodes
Out[55]: NodeView((0, 1, 2, 3, 4...2706, 2707))

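The nodes of the adjacency matrix are in strictly ascending order, while the file ind.dataset_str.test.index separately records the node indices of the test set in shuffled order, so after stacking, the last 1000 rows of features do not line up with the adjacency matrix.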
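
A tiny numpy example makes the misalignment and its fix visible. The sizes and the index order are made up, and each feature row simply stores its own node id so that alignment is easy to read off; the assignment on the last line follows the pattern I believe load_data uses.

```python
import numpy as np

n_train, n_test = 3, 4
# Row i of the correctly aligned matrix holds the node id i.
true_features = np.arange(n_train + n_test).reshape(-1, 1)

# Made-up stand-in for the contents of ind.cora.test.index (shuffled order).
test_idx_reorder = np.array([5, 3, 6, 4])
test_idx_range = np.sort(test_idx_reorder)     # [3, 4, 5, 6]

allx = true_features[:n_train]
tx = true_features[test_idx_reorder]           # test rows stored in file order
stacked = np.vstack((allx, tx))                # rows 3..6 are misaligned

# Move every test row to the position given by its node index.
fixed = stacked.copy()
fixed[test_idx_reorder] = stacked[test_idx_range]

print(stacked.ravel())
print(fixed.ravel())
```

Before the fix, row 3 carries the features of node 5; after it, row i again belongs to node i, which is the order the adjacency matrix assumes.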

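(2) Why mask the labels?

The mask appears in the loss and accuracy computations, and the loss directly determines the direction of optimization. The mask is a consequence of the GCN model itself: training needs the adjacency matrix and the feature vectors of all nodes, and the graph convolution multiplies the adjacency matrix with the features over all nodes. Whether training, validating, or testing, every node has to go through the model, so the nodes outside the training set must be masked when computing the training loss, and likewise for validation and testing. Put plainly, the training, validation, and test sets cannot be decoupled; if they were, the model could not be trained. This is also a weakness of GCN. To summarize the disadvantages of GCN training compared with a conventional DNN:

Transductive learning: the model cannot be extended to a new graph. Node representations and downstream applications are only available on the graph it was trained on, that is, the nodes to predict must already be present in the training graph, which greatly limits practical use.
Full-graph training: this GCN cannot train on small mini-batches the way a DNN does. Every iteration must multiply the full adjacency matrix by the full node feature matrix, so each gradient update is expensive
Large data is hard to train on: GCN needs the full adjacency matrix and node features, and hardware limits may make it impossible to hold the whole graph. In that case the graph has to be reduced by sampling, training on a subgraph of manageable size and then extending to the rest

(3) Why is there no fully connected layer after the convolutions?

The other GCN classification diagrams I have seen also apply softmax directly after the last GCN embedding layer, so I will not dwell on it. I think adding one would be fine.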