Paper link
Implementation code
Related commentary (in Chinese)
Reading goals:
- How does the paper visualize the network?
- What did the authors learn by visualizing the network?
I. Introduction
The related work section introduces two concepts: Visualization and Feature Generalization.
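I previously wrote a post on feature visualization, and that approach is mentioned here. Its drawback is that it "requires a careful initialization and does not give any information about the unit’s invariances."
The term "unit’s invariances" literally means the invariances of a unit, i.e. the changes to the input under which the unit's response stays the same.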
The authors summarize what distinguishes their visualizations:
"they are not just crops of input images, but rather top-down projections that reveal structures within each patch that stimulate a particular feature map."
Feature generalization refers specifically to:
"the generalization ability of convnet features"
II. Approach
(1) The model
The paper uses standard fully supervised convnet models.
Color 2D input image -> probability vector over C classes
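Each layer consists of:
(i) a convolutional layer
(ii) a relu layer: a rectified linear function (relu(x) = max(x, 0))
(iii) [optionally] a max pooling layer
(iv) [optionally] local normalization: a local contrast operation that normalizes the responses across feature maps
The top few layers are conventional fully connected layers, and the final layer is a softmax classifier.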
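The four per-layer steps above can be sketched in NumPy as follows. This is only an illustrative sketch, not the paper's implementation: the function names, the stride-1 unpadded convolution, the 2x2 non-overlapping pooling, and the very simplified `normalize_across_maps` (a stand-in for the local contrast operation) are all assumptions made for the example.

```python
import numpy as np

def conv2d_valid(x, filters):
    """Cross-correlate x (C_in, H, W) with filters (C_out, C_in, k, k); stride 1, no padding."""
    c_out, _, k, _ = filters.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]                  # (C_in, k, k)
            out[:, i, j] = np.tensordot(filters, patch, axes=3)
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size regions of each feature map."""
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    blocks = x[:, :h2 * size, :w2 * size].reshape(c, h2, size, w2, size)
    return blocks.max(axis=(2, 4))

def normalize_across_maps(x, eps=1e-5):
    """Assumed, simplified normalization: scale each position by its RMS across feature maps."""
    return x / np.sqrt(eps + np.mean(x ** 2, axis=0, keepdims=True))

def layer_forward(x, filters, pool=True, normalize=True):
    y = relu(conv2d_valid(x, filters))      # steps (i) and (ii)
    if pool:
        y = max_pool(y)                     # step (iii), optional
    if normalize:
        y = normalize_across_maps(y)        # step (iv), optional
    return y

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 8, 8))      # toy "color" input: 3 channels, 8x8
filters = rng.standard_normal((4, 3, 3, 3)) # 4 filters of size 3x3 over 3 channels
features = layer_forward(image, filters)
print(features.shape)                       # (4, 3, 3)
```

The 8x8 input shrinks to 6x6 after the 3x3 convolution and to 3x3 after 2x2 pooling, which is why the output has shape (4, 3, 3).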
(2) Training
Training set: N labeled images {x, y}
Loss function: cross-entropy loss function
It compares ŷ_i (the network's prediction) with y_i (the true label).
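As a small worked example of how a cross-entropy loss compares the prediction with the true label, here is a sketch for a single image. The scores and the helper names `softmax` and `cross_entropy` are made up for illustration, and the true label is assumed to be given as a class index.

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into a probability vector (shifted for numerical stability)."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def cross_entropy(y_hat, y):
    """y_hat: predicted probabilities over C classes; y: index of the true class."""
    return -np.log(y_hat[y])

scores = np.array([2.0, 1.0, 0.1])          # hypothetical network outputs for C = 3 classes
y_hat = softmax(scores)
loss_if_class_0 = cross_entropy(y_hat, 0)   # true class has the highest predicted probability
loss_if_class_2 = cross_entropy(y_hat, 2)   # true class has the lowest predicted probability
print(loss_if_class_0 < loss_if_class_2)    # True
```

The loss is small when the true class receives a high predicted probability and grows as that probability shrinks.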
Description of the training procedure:
(This sentence is nicely written, so I copied it over verbatim.)
"The parameters of the network (filters in the convolutional layers, weight matrices in the fully-connected layers and biases) are trained by back-propagating the derivative of the loss with respect to the parameters throughout the network, and updating the parameters via stochastic gradient descent."
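To make the update rule concrete, here is a minimal sketch of stochastic gradient descent on just the final softmax classifier, with one training example. It is a toy under stated assumptions: the linear layer `W`, `b`, the learning rate, and the random input are all hypothetical, and a real convnet would back-propagate the same kind of gradient through every layer.

```python
import numpy as np

def softmax(scores):
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def sgd_step(W, b, x, y, lr=0.1):
    """One SGD update of a linear softmax classifier on a single example (x, y)."""
    y_hat = softmax(W @ x + b)
    loss = -np.log(y_hat[y])                # cross-entropy loss
    d_scores = y_hat.copy()
    d_scores[y] -= 1.0                      # derivative of the loss w.r.t. the class scores
    W_new = W - lr * np.outer(d_scores, x)  # derivative w.r.t. W is outer(d_scores, x)
    b_new = b - lr * d_scores               # derivative w.r.t. b is d_scores
    return W_new, b_new, loss

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((3, 5))      # 3 classes, 5 input features (toy sizes)
b = np.zeros(3)
x = rng.standard_normal(5)
y = 1                                       # index of the true class

losses = []
for _ in range(20):
    W, b, loss = sgd_step(W, b, x, y)
    losses.append(loss)
print(losses[-1] < losses[0])               # True: the loss goes down
```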
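(3) Visualization method
Goal: understand the feature activity in the intermediate layers.
Summary of the approach: map these activities back to the input pixel space, using a deconvolutional network (deconvnet) to perform the mapping.
A deconvnet can be understood as a convnet run in reverse, built from the same components (filtering, pooling).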
Procedure:
A deconvnet is attached to each layer of the convnet.
To examine a given convnet activation, we set all other activations in that layer to zero and pass the feature maps as input to the attached deconvnet layer.
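The "zero out everything else" step can be sketched as below. The array layout (channels, height, width) and the helper names are assumptions for illustration; picking the strongest activation in a chosen feature map is just one convenient way to select which activation to examine.

```python
import numpy as np

def isolate_activation(feature_maps, channel, row, col):
    """Keep a single activation and set every other activation in the layer to zero."""
    isolated = np.zeros_like(feature_maps)
    isolated[channel, row, col] = feature_maps[channel, row, col]
    return isolated

def isolate_strongest(feature_maps, channel):
    """Isolate the largest activation within one feature map."""
    fmap = feature_maps[channel]
    row, col = np.unravel_index(np.argmax(fmap), fmap.shape)
    return isolate_activation(feature_maps, channel, row, col)

feature_maps = np.arange(18, dtype=float).reshape(2, 3, 3)  # toy layer: 2 maps of 3x3
isolated = isolate_strongest(feature_maps, channel=0)
print(isolated[0])
```

The resulting `isolated` array is what would be handed to the attached deconvnet layer.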
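The deconvnet then applies (i) unpool:
Max pooling is not actually invertible, but the locations of the maxima within each pooling region are recorded in "switch" variables. In the deconvnet, the unpooling operation uses these switches to place the reconstructions from the layer above into the appropriate locations.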
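A minimal sketch of pooling with switches and the matching unpooling, for a single feature map. The 2x2 non-overlapping regions and the encoding of each switch as a flat index inside its region are assumptions chosen to keep the example short.

```python
import numpy as np

def max_pool_with_switches(x, size=2):
    """Pool x (H, W) over non-overlapping size x size regions.

    Returns the pooled map and the switches: for each region, the flat index
    (row * size + col) of the maximum inside that region.
    """
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    blocks = (x[:h2 * size, :w2 * size]
              .reshape(h2, size, w2, size)
              .transpose(0, 2, 1, 3)
              .reshape(h2, w2, size * size))
    return blocks.max(axis=2), blocks.argmax(axis=2)

def unpool(pooled, switches, size=2):
    """Put each pooled value back at the location recorded by its switch; zeros elsewhere."""
    h2, w2 = pooled.shape
    out = np.zeros((h2 * size, w2 * size))
    rows, cols = np.indices((h2, w2))
    out[rows * size + switches // size, cols * size + switches % size] = pooled
    return out

x = np.array([[1., 2., 0., 1.],
              [4., 3., 1., 5.],
              [0., 0., 2., 0.],
              [7., 1., 0., 0.]])
pooled, switches = max_pool_with_switches(x)
restored = unpool(pooled, switches)
print(pooled)
print(restored)
```

Only the maxima survive the round trip: `restored` has the four pooled values at their original positions and zeros everywhere else, which is why the inversion is only approximate.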
(ii) rectify: relu
The reconstructed signal is passed through a relu so that the feature maps are always positive (no negative values).
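(iii) filter:
This step is the reverse of the convolution, and it only approximately inverts it.
The deconvnet uses transposed versions of the same filters, applied to the rectified maps rather than to the output of the layer beneath.
In practice this means flipping each filter vertically and horizontally.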
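The link between "transposed filter" and "flipped filter" can be checked numerically. The sketch below assumes a single-channel, stride-1, unpadded forward filtering step (with several channels, the input and output channel axes would also be swapped). The helper names are hypothetical. It verifies the defining property of a transpose, namely that sum(A(x) * y) equals sum(x * A^T(y)).

```python
import numpy as np

def correlate_valid(x, f):
    """Forward filtering: slide f over x (stride 1, no padding)."""
    k = f.shape[0]
    h, w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * f)
    return out

def transposed_filter(y, f):
    """Transpose of correlate_valid(., f): zero-pad y by k - 1 and apply the flipped filter."""
    k = f.shape[0]
    flipped = f[::-1, ::-1]                 # flip vertically and horizontally
    return correlate_valid(np.pad(y, k - 1), flipped)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))             # toy input map
f = rng.standard_normal((3, 3))             # toy filter
y = rng.standard_normal((3, 3))             # toy map with the shape of the forward output

lhs = np.sum(correlate_valid(x, f) * y)
rhs = np.sum(x * transposed_filter(y, f))
print(np.isclose(lhs, rhs))                 # True
```

Note that `transposed_filter` maps a 3x3 map back to the 5x5 size of the original input, which is what lets the deconvnet walk back toward pixel space.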
These three steps are repeated until input pixel space is reached.