TextCNN for Text Classification Explained (Part 4)

Let's continue.

# Determine the sequence length
sequence_length = train_input.shape[1]
print('Vocabulary size of the training set: {:d}'.format(vocabulary_size))
print('Length of one sentence sequence: {:d}'.format(sequence_length))

# Build the dataflow graph
graph = tf.Graph()
with graph.as_default():
    # Training data
    with tf.name_scope('inputs'):
        inputs = tf.placeholder(tf.int32, [None, sequence_length], name='inputs')
    with tf.name_scope('labels'):
        labels = tf.placeholder(tf.float32, [None, classes_num], name='labels')
    with tf.name_scope('keep_prob'):
        keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    with tf.name_scope('l2_loss'):
        l2_loss = tf.constant(0.0, tf.float32, name='l2_loss')

    # Word vectors
    with tf.device('/cpu:0'):
        with tf.name_scope('embedding_layer'):
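            # Embedding table
            embeddings = tf.Variable(tf.random_normal([vocabulary_size, embedding_size], -1.0, 1.0), name='embeddings')
            # The inputs are the word index ids of each sentence, so a table lookup directly yields each word's vector
            embed = tf.nn.embedding_lookup(embeddings, inputs, name='embed')
            # Direct input to the convolution. conv2d requires a channel dimension; text has depth 1 (a single channel), but the dimension must still be added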
            conv_inputs = tf.expand_dims(embed, -1)

        with tf.name_scope('conv_pooling_layer'):
            # Holds the processed features; note the plural (features_pooled), not to be confused with feature_pooled
            features_pooled = []
            for filter_height, filter_num in zip(filters_height, filter_num_per_height):
                with tf.name_scope('conv_filter'):
                    # Four dimensions of the filter: [height, width, channels, count]
                    conv_filter = tf.Variable(tf.truncated_normal([filter_height, embedding_size, 1, filter_num], stddev=0.1), name='conv_filer')
                # Convolution
                with tf.name_scope('conv'):
                    conv = tf.nn.conv2d(conv_inputs, conv_filter, strides=[1, 1, 1, 1], padding='VALID', name='conv')
                # Bias: one bias per filter
                with tf.name_scope('bias'):
                    bias = tf.Variable(tf.constant(0.1, shape=[filter_num]))
                # Non-linearity: ReLU
                with tf.name_scope('Relu'):
                    feature_map = tf.nn.relu(tf.nn.bias_add(conv, bias), name='Relu')
                # Pooling
                # tf.nn.max_pool(value, ksize, strides, padding)
                # value: a 4-D tensor; ksize: a list of 4 elements giving the window size along each dimension of the input, i.e. the kernel size
                with tf.name_scope('max_pooling'):
                    feature_pooled = tf.nn.max_pool(feature_map, ksize=[1, sequence_length-filter_height+1, 1, 1],
                                                    strides=[1, 1, 1, 1], padding='VALID', name='max_pooling')
                features_pooled.append(feature_pooled)

        with tf.name_scope('full_connected_layer'):
            filter_num_total = sum(filter_num_per_height)
            # Flatten: tf.concat(features_pooled, 3) concatenates along the 4th dimension (axis 3)
            features_pooled_flat = tf.reshape(tf.concat(features_pooled, 3), [-1, filter_num_total])
            # Apply dropout to this layer
            with tf.name_scope('drop_out'):
                features_pooled_flat_drop = tf.nn.dropout(features_pooled_flat, keep_prob=keep_prob, name='drop_out')
            with tf.name_scope('weight'):
                weight = tf.Variable(tf.truncated_normal(shape=[filter_num_total, classes_num], dtype=tf.float32), name='weight')
                tf.summary.histogram('weight', weight)
            with tf.name_scope('bias'):
                bias = tf.Variable(tf.constant(0.1, shape=[classes_num]), name='bias')
                tf.summary.histogram('bias', bias)
            # L2 regularization
            with tf.name_scope('L2'):
                l2_loss += tf.nn.l2_loss(weight)
                l2_loss += tf.nn.l2_loss(bias)
            # xw_plus_b
            with tf.name_scope('xw_plus_b'):
                scores = tf.nn.xw_plus_b(features_pooled_flat_drop, weight, bias, name='xw_plus_b')
                tf.summary.histogram('xw_plus_b', scores)
                # Keep the score of each label for use at prediction time by adding the scores to this collection
                tf.add_to_collection('pred_network', scores)
            # cross_entropy loss
            with tf.name_scope('softmax_cross_entropy'):
                losses = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=scores, name='losses')
            # loss, is a scalar
            with tf.name_scope('train_loss'):
                train_loss = tf.reduce_mean(losses) + l2_lambda * l2_loss
                tf.summary.scalar('train_loss', train_loss)
            with tf.name_scope('test_loss'):
                test_loss = tf.reduce_mean(losses) + l2_lambda * l2_loss
                tf.summary.scalar('test_loss', test_loss)

            # Prediction
            with tf.name_scope('prediction'):
                predictions = tf.argmax(scores, 1)
                correct_predictions = tf.equal(predictions, tf.argmax(labels, 1), name='correct_predictions')
            # accuracy
            with tf.name_scope('train_accuracy'):
                train_accuracy = tf.reduce_mean(tf.cast(correct_predictions, 'float'))
                tf.summary.scalar('train_accuracy', train_accuracy)
            # accuracy
            with tf.name_scope('test_accuracy'):
                test_accuracy = tf.reduce_mean(tf.cast(correct_predictions, 'float'))
                tf.summary.scalar('test_accuracy', test_accuracy)

This section is mainly about building the network graph, that is, assembling the model. Getting this structure right matters a great deal in neural networks.
First, look at these lines:

graph = tf.Graph()
with graph.as_default():

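We often hear people talk about building a graph model. As I currently understand it, what is TensorFlow? The name literally means the flow of tensors. And what is a "tensor"? Put simply, it is data, all kinds of data. You can picture a tensor as an n-dimensional array or list; a tensor has a static type and dynamic dimensions, and tensors can flow between the nodes of a graph.
Tensors are the core component of every deep learning framework, because all subsequent operations and optimization algorithms are built on them.
For example, any RGB color image can be represented as a rank-3 tensor (its three dimensions being the image height, width, and color channels).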
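To make the "n-dimensional array" picture concrete, here is a minimal sketch in plain Python. The nested list stands in for a tensor, and the tiny 2x3 image and the helper `shape` are made up for illustration; they are not part of TensorFlow.

```python
def shape(tensor):
    """Return the shape of a rectangular nested list as a list of ints."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return dims

# A hypothetical RGB image: height 2, width 3, and 3 color values per pixel.
image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[0, 0, 0], [128, 128, 128], [255, 255, 255]],
]

image_shape = shape(image)   # [2, 3, 3]
rank = len(image_shape)      # 3, i.e. a rank-3 tensor
print(image_shape, rank)
```

The rank is simply the number of nested levels, which is why an RGB image counts as a rank-3 tensor.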
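Where do tensors flow, and how? How they flow is step three; let's take step two slowly first. Where they flow, as I understand it, is in the graph. If training a deep learning model is like painting a picture, the graph is the canvas: the operations added in code (the nodes of the picture) and the data (the lines of the picture) are all "painted" on that canvas.
tf.Graph() instantiates a class: a dataflow graph that TensorFlow uses to represent and run computations.
The official TensorFlow documentation gives two ways to build a graph: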

import tensorflow as tf
c = tf.constant(5.0)
# Check that a tensor created in the main program lives in the default graph
assert c.graph is tf.get_default_graph()

No error is raised.

import tensorflow as tf
g = tf.Graph()
with g.as_default():
  c = tf.constant(5.0)
  assert c.graph is g

No error here either.


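Next comes a series of name_scope calls, which set up name scopes.
What are name scopes for?

All objects and operations defined inside a tf.name_scope() block get the scope name prepended to their "name" attribute, which shows which scope each object belongs to;
Putting different objects and operations into separate tf.name_scope() blocks lets TensorBoard display a clean, logical graph, which matters a lot for complex graphs.
I think of it as creating a few folders and storing each individual file in the matching folder, which makes everything easier to organize and manage.
Here is an example.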

import tensorflow as tf
tf.reset_default_graph()

# Without tf.name_scope()
a = tf.constant(1, name='my_a')  # define a constant
b = tf.Variable(2, name='my_b')  # define a variable
c = tf.add(a, b, name='my_add')  # add the two (an operation)
print("a.name = " + a.name)
print("b.name = " + b.name)
print("c.name = " + c.name)

# With tf.name_scope()
# with tf.name_scope('cgx_name_scope'):  # define a scope named cgx_name_scope and work inside it
#     a = tf.constant(1, name='my_a')
#     b = tf.Variable(2, name='my_b')
#     c = tf.add(a, b, name='my_add')
# print("a.name = " + a.name)
# print("b.name = " + b.name)
# print("c.name = " + c.name)

# Save the graph so TensorBoard can draw it
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter("./test", sess.graph)
    print(sess.run(c))
writer.close()

Output:

# Without tf.name_scope()
a.name = my_a:0
b.name = my_b:0
c.name = my_add:0

# With tf.name_scope()
a.name = cgx_name_scope/my_a:0
b.name = cgx_name_scope/my_b:0
c.name = cgx_name_scope/my_add:0
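The prefixing behaviour seen in these names can be imitated in a few lines of plain Python. This is only a toy model of the idea, not how TensorFlow implements it; the helper names `name_scope` and `full_name` and the module-level stack are assumptions made for this sketch.

```python
from contextlib import contextmanager

_scope_stack = []  # currently active scope names, outermost first

@contextmanager
def name_scope(name):
    """Toy stand-in for tf.name_scope: push a prefix, pop it again on exit."""
    _scope_stack.append(name)
    try:
        yield
    finally:
        _scope_stack.pop()

def full_name(name):
    """Join the active scopes and the local name, then add the ':0' suffix."""
    return '/'.join(_scope_stack + [name]) + ':0'

print(full_name('my_a'))                 # my_a:0
with name_scope('cgx_name_scope'):
    scoped = full_name('my_a')
    print(scoped)                        # cgx_name_scope/my_a:0
```

Nested scopes simply stack their prefixes, which matches the folder analogy: a scope inside a scope behaves like a subfolder.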

In TensorBoard, we can then see the resulting graph (figure: image.png).

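OK, back to the main topic. In the graph we set as the default, we created the name scopes inputs, labels, keep_prob, and l2_loss.
tf.device('/cpu:0') pins the operations defined inside it to the first CPU.
Next, the word vectors are built in the embedding_layer name scope.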

    with tf.device('/cpu:0'):
        with tf.name_scope('embedding_layer'):
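            # Embedding table
            embeddings = tf.Variable(tf.random_normal([vocabulary_size, embedding_size], -1.0, 1.0), name='embeddings')
            # The inputs are the word index ids of each sentence, so a table lookup directly yields each word's vector
            embed = tf.nn.embedding_lookup(embeddings, inputs, name='embed')
            # Direct input to the convolution. conv2d requires a channel dimension; text has depth 1 (a single channel), but the dimension must still be added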
            conv_inputs = tf.expand_dims(embed, -1)

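First, an embedding matrix of shape [vocabulary_size, embedding_size] is randomly initialized. Note that tf.random_normal takes (shape, mean, stddev), so the positional arguments -1.0 and 1.0 here mean a mean of -1.0 and a standard deviation of 1.0, not a range; if a uniform range from -1 to 1 is what you want, tf.random_uniform takes (shape, minval, maxval).
The input data inputs holds the index id of every word in every sentence; the embedding_lookup function looks each id up in the table and returns that word's vector, giving a tensor of shape [batch, sequence_length, embedding_size].
tf.expand_dims then adds a dimension to the looked-up embeddings before the convolution; -1 means the new axis is inserted at the last position, so the shape becomes [batch, sequence_length, embedding_size, 1].

The convolution requires a channel dimension. Text has a depth of 1, so there is only one channel, but the dimension still has to be there.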
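The lookup and the extra channel axis can be mimicked with plain Python lists. This is only a sketch: the toy sizes, the word ids, and the uniform random values are assumptions for illustration, and nested lists stand in for tensors.

```python
import random

random.seed(0)
vocabulary_size, embedding_size = 5, 3   # toy sizes, assumed for illustration

# One row per word id, playing the role of the `embeddings` variable.
embeddings = [[random.uniform(-1.0, 1.0) for _ in range(embedding_size)]
              for _ in range(vocabulary_size)]

# A batch holding one sentence of four word ids, playing the role of `inputs`.
inputs = [[1, 4, 0, 4]]

# Lookup: replace every id by its row -> [batch, sequence_length, embedding_size]
embed = [[embeddings[word_id] for word_id in sentence] for sentence in inputs]

# Like expand_dims(embed, -1): wrap every scalar in a list -> [..., embedding_size, 1]
conv_inputs = [[[[value] for value in vector] for vector in sentence]
               for sentence in embed]

dims = (len(conv_inputs), len(conv_inputs[0]),
        len(conv_inputs[0][0]), len(conv_inputs[0][0][0]))
print(dims)   # (1, 4, 3, 1)
```

The values themselves are untouched by the last step; only the nesting (the shape) changes.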


Next up is everyone's favorite Taobao recommendation segment... oh wait, no, the convolution and pooling layers.

        with tf.name_scope('conv_pooling_layer'):
            # Holds the processed features; note the plural (features_pooled), not to be confused with feature_pooled
            features_pooled = []
            for filter_height, filter_num in zip(filters_height, filter_num_per_height):
                with tf.name_scope('conv_filter'):
                    # Four dimensions of the filter: [height, width, channels, count]
                    conv_filter = tf.Variable(tf.truncated_normal([filter_height, embedding_size, 1, filter_num], stddev=0.1), name='conv_filer')
                # Convolution
                with tf.name_scope('conv'):
                    conv = tf.nn.conv2d(conv_inputs, conv_filter, strides=[1, 1, 1, 1], padding='VALID', name='conv')
                # Bias: one bias per filter
                with tf.name_scope('bias'):
                    bias = tf.Variable(tf.constant(0.1, shape=[filter_num]))
                # Non-linearity: ReLU
                with tf.name_scope('Relu'):
                    feature_map = tf.nn.relu(tf.nn.bias_add(conv, bias), name='Relu')
                # Pooling
                # tf.nn.max_pool(value, ksize, strides, padding)
                # value: a 4-D tensor; ksize: a list of 4 elements giving the window size along each dimension of the input, i.e. the kernel size
                with tf.name_scope('max_pooling'):
                    feature_pooled = tf.nn.max_pool(feature_map, ksize=[1, sequence_length-filter_height+1, 1, 1],
                                                    strides=[1, 1, 1, 1], padding='VALID', name='max_pooling')
                features_pooled.append(feature_pooled)

Here, filters_height and filter_num_per_height were already defined earlier.

filters_height = [2, 3, 4]
filter_num_per_height = [100, 100, 100]

zip pairs up filters_height and filter_num_per_height element by element, so the for loop builds three groups of convolution filters with heights 2, 3, and 4, with 100 filters in each group.
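A quick way to see what the loop produces is to print the shapes it implies. In this sketch, sequence_length = 20 and embedding_size = 128 are assumed values chosen only for illustration; None stands for the batch dimension.

```python
filters_height = [2, 3, 4]
filter_num_per_height = [100, 100, 100]
sequence_length, embedding_size = 20, 128   # assumed values, not from the post

shapes = []
for filter_height, filter_num in zip(filters_height, filter_num_per_height):
    # [height, width, channels, count], as passed to tf.truncated_normal
    filter_shape = [filter_height, embedding_size, 1, filter_num]
    # VALID padding with stride 1: the filter fits in
    # sequence_length - filter_height + 1 vertical positions and exactly
    # one horizontal position, because it spans the full embedding width.
    conv_shape = [None, sequence_length - filter_height + 1, 1, filter_num]
    shapes.append((filter_shape, conv_shape))
    print(filter_shape, conv_shape)
```

The second dimension of each convolution output is exactly the window height later used for max pooling.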

                with tf.name_scope('conv_filter'):
                    # Four dimensions of the filter: [height, width, channels, count]
                    conv_filter = tf.Variable(tf.truncated_normal([filter_height, embedding_size, 1, filter_num], stddev=0.1), name='conv_filer')
                # Convolution

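Next, in the conv_filter name scope, we initialize the convolution filter with tf.truncated_normal, which produces a tensor of the given shape filled with random values from a truncated normal distribution (values more than two standard deviations from the mean are dropped and re-drawn).
The filter has four dimensions: [height, width, channels, count]. stddev is the standard deviation of the normal distribution; mean is its mean, which defaults to 0, and we keep that default here.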
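The truncation rule can be sketched with the standard library alone. This is an illustration of the idea (re-draw anything further than two standard deviations from the mean), not TensorFlow's actual implementation; the function name and its parameters are chosen for this sketch.

```python
import random

def truncated_normal(n, mean=0.0, stddev=1.0):
    """Draw n normal samples, re-drawing any value further than 2*stddev from mean."""
    samples = []
    while len(samples) < n:
        value = random.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples

random.seed(42)
weights = truncated_normal(1000, stddev=0.1)
print(min(weights) >= -0.2, max(weights) <= 0.2)   # True True
```

With stddev=0.1, as in the code above, every initial weight therefore lies within [-0.2, 0.2], which avoids the occasional very large starting value that a plain normal distribution can produce.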

                # Convolution
                with tf.name_scope('conv'):
                    conv = tf.nn.conv2d(conv_inputs, conv_filter, strides=[1, 1, 1, 1], padding='VALID', name='conv')

Next, in the conv name scope, we define the convolution. tf.nn.conv2d is TensorFlow's 2-D convolution function; I will cover it in detail in a separate chapter.
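Until that chapter, the following plain-Python sketch shows what a single filter does to a single sentence with stride 1 and VALID padding. The toy sentence, the kernel values, and the helper `conv_valid` are assumptions for illustration; batching and multiple filters are left out.

```python
def conv_valid(sentence, kernel):
    """Slide kernel (height x width) down sentence (length x width): stride 1, no padding."""
    height = len(kernel)
    width = len(kernel[0])
    outputs = []
    for start in range(len(sentence) - height + 1):
        window = sentence[start:start + height]
        total = sum(window[i][j] * kernel[i][j]
                    for i in range(height)
                    for j in range(width))
        outputs.append(total)
    return outputs

# Toy sentence: 4 words with embedding size 2 (assumed values).
sentence = [[1, 0], [0, 1], [1, 1], [2, 0]]
kernel = [[1, 1], [1, -1]]   # height 2, spanning the full embedding width

feature = conv_valid(sentence, kernel)
print(feature)   # [0, 1, 4]
```

Because the kernel is as wide as the embedding, it can only move vertically, so a sentence of length 4 and a kernel of height 2 give 4 - 2 + 1 = 3 outputs.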

                # Bias: one bias per filter
                with tf.name_scope('bias'):
                    bias = tf.Variable(tf.constant(0.1, shape=[filter_num]))

Set the bias; tf.constant initializes it to the constant 0.1.

                # Non-linearity: ReLU
                with tf.name_scope('Relu'):
                    feature_map = tf.nn.relu(tf.nn.bias_add(conv, bias), name='Relu')

Here the convolution output, with the bias added, is passed through a non-linearity and the result is assigned to feature_map; ReLU is the usual choice.
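As a small sketch of this step, here is the bias addition followed by ReLU on a few made-up convolution outputs for one filter. The input values are assumptions; only the bias value 0.1 comes from the code above.

```python
def relu(x):
    """Rectified linear unit: negative values become 0, others pass through."""
    return max(0.0, x)

conv_outputs = [-0.5, 0.0, 1.5]   # toy convolution outputs for one filter
bias = 0.1                        # same initial value as in the code above

# Add the filter's bias to every position, then apply the non-linearity.
feature_map = [relu(value + bias) for value in conv_outputs]
print(feature_map)
```

The first position stays at zero because -0.5 + 0.1 is still negative, while the other two pass through shifted by the bias.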

                # Pooling
                # tf.nn.max_pool(value, ksize, strides, padding)
                # value: a 4-D tensor; ksize: a list of 4 elements giving the window size along each dimension of the input, i.e. the kernel size
                with tf.name_scope('max_pooling'):
                    feature_pooled = tf.nn.max_pool(feature_map, ksize=[1, sequence_length-filter_height+1, 1, 1],
                                                    strides=[1, 1, 1, 1], padding='VALID', name='max_pooling')
                    features_pooled.append(feature_pooled)

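Pooling: we use max pooling via tf.nn.max_pool(). Because the window height is sequence_length-filter_height+1, it spans the entire convolution output, so each filter is reduced to a single maximum value. Finally, each pooled result is appended to the features_pooled list we defined at the start, and the convolution and pooling part of the graph is complete!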
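The effect of this pooling, and of the later concatenation into one flat feature vector, can be sketched in plain Python. The activations below are made up, with two filters per height and a sequence length of 4 assumed for illustration.

```python
# Toy feature maps after ReLU: one entry per filter height, and inside it
# one list of activations per filter (2 filters per height, assumed).
feature_maps = {
    2: [[0.1, 0.9, 0.3], [0.0, 0.2, 0.4]],   # 4 - 2 + 1 = 3 positions
    3: [[0.5, 0.7], [0.6, 0.1]],             # 4 - 3 + 1 = 2 positions
}

features_pooled = []
for filter_height in sorted(feature_maps):
    # The pooling window covers every position, so each filter yields one max.
    pooled = [max(activations) for activations in feature_maps[filter_height]]
    features_pooled.append(pooled)

# Concatenate the per-height results into one flat feature vector.
features_flat = [value for pooled in features_pooled for value in pooled]
print(features_flat)   # [0.9, 0.4, 0.7, 0.6]
```

However long the sentence is, the flat vector always has one entry per filter, which is what lets the fully connected layer have a fixed input size of filter_num_total.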

Last edited: 2019.07.09 10:44:10
