I recently read Christian Szegedy's Rethinking the Inception Architecture for Computer Vision, and it opened my eyes to quite a few ideas. I have to admit that deep learning is moving so fast these days that if you look away for a moment, many of the methods from a year ago are already outdated. The paper is essentially a follow-up to GoogLeNet: after publishing GoogLeNet, Christian Szegedy refined the network further (making it more complex) and then wrote the paper above.
Maybe I just haven't read enough papers, but the Inception architecture was the first place I ever saw asymmetric convolutions. What does that mean? The kernels we usually see are 1x1, 3x3, or 5x5, whereas the Inception architecture uses 7x1 and 1x7 kernels. Never mind how unusual the Inception network structure itself is; this asymmetric kernel alone looked odd enough to me.
于是赏表,為了有個(gè)形象的理解检诗,我就寫了個(gè)小小的測(cè)試程序:
import tensorflow as tf

x = tf.Variable(tf.ones([1, 4, 4, 1]))   # input: a 4x4 all-ones image (NHWC layout)
w = tf.Variable(tf.ones([1, 3, 1, 1]))   # a 1x3 asymmetric kernel (filter_height=1, filter_width=3)
output = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

init = tf.global_variables_initializer()  # replaces the deprecated initialize_all_variables()
sess = tf.Session()
sess.run(init)
print(sess.run(output)[0, :, :, 0])       # drop the batch and channel dims for readability
Comparing the input and the output (I've flattened the 4-D tensors to 2-D to make them easier to read):
w = [1 1 1]                                # the asymmetric (1x3) kernel
x = [[1 1 1 1],    output = [[2 3 3 2],    # input and output
     [1 1 1 1],              [2 3 3 2],
     [1 1 1 1],              [2 3 3 2],
     [1 1 1 1]]              [2 3 3 2]]
With the result above, the effect is clear at a glance. But what is the point of using such an asymmetric convolution? The reasons, as I understand them, are:
- 1. Performing an n×1 convolution followed by a 1×n convolution covers the same n×n receptive field as a single n×n convolution (and gives exactly the same result when the n×n kernel is separable, i.e. rank-1); a small verification sketch follows this list. From the paper:
In theory, we could go even further and argue that one can replace any n × n convolution by a 1 × n convolution followed by a n × 1 convolution
- 2. The asymmetric factorization reduces the amount of computation. This is easy to see: per output position, an n×n kernel needs n×n multiplications, while the factored pair needs only 2×n, so the larger n is, the bigger the saving (for n=7 that is 14 multiplications instead of 49); the ratio is worked out in the second snippet after this list. From the paper:
the computational cost saving increases dramatically as n grows.
- 3隔盛、雖然可以降低運(yùn)算量,但這種方法不是哪兒都適用的拾稳,非對(duì)稱卷積在圖片大小介于12×12到20×20大小之間的時(shí)候吮炕,效果比較好,具體原因未知熊赖。。虑椎。原文如下:
In practice, we have found that employing this factorization does not work well on early layers, but it gives very good results on medium grid-sizes (On m×m feature maps, where m ranges between 12 and 20).
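To make point 1 concrete, here is a minimal sketch of the equivalence in the separable case. It assumes the same TensorFlow 1.x API as the test program above, and the [1, 2, 1] / [1, 0, -1] pair is just a hand-picked example of a rank-1 (separable) kernel, not anything from the paper: the 3x3 kernel is built as the outer product of a 3x1 column and a 1x3 row, and convolving with the two small kernels in turn gives exactly the same output as the full 3x3 kernel.

import numpy as np
import tensorflow as tf

# A separable (rank-1) 3x3 kernel: the outer product of a 3x1 column and a 1x3 row.
col = np.array([1., 2., 1.], dtype=np.float32).reshape(3, 1, 1, 1)     # 3x1 kernel
row = np.array([1., 0., -1.], dtype=np.float32).reshape(1, 3, 1, 1)    # 1x3 kernel
full = (col.reshape(3, 1) @ row.reshape(1, 3)).reshape(3, 3, 1, 1)     # the equivalent 3x3 kernel

x = tf.constant(np.random.rand(1, 8, 8, 1).astype(np.float32))         # a random 8x8 input

direct = tf.nn.conv2d(x, full, strides=[1, 1, 1, 1], padding='VALID')  # one 3x3 conv: 9 mults per output
factored = tf.nn.conv2d(
    tf.nn.conv2d(x, col, strides=[1, 1, 1, 1], padding='VALID'),       # 3x1 first ...
    row, strides=[1, 1, 1, 1], padding='VALID')                        # ... then 1x3: 3 + 3 = 6 mults per output

sess = tf.Session()
a, b = sess.run([direct, factored])
print(np.allclose(a, b, atol=1e-5))                                    # True: identical for a separable kernel

For a general (non-separable) n×n kernel the two are not numerically identical, which is why the paper treats the factorization as a structure whose filters the network learns, rather than a drop-in rewrite of existing kernels.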
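And a quick back-of-the-envelope check of point 2 (plain Python, not from the paper): per output position and channel, a full n×n kernel costs n×n multiplications while the 1×n plus n×1 pair costs 2×n, so the factored version's relative cost is 2/n and the saving grows with n.

# Relative cost of the factored (1xn + nx1) pair vs. a full nxn convolution,
# counting multiplications per output position and channel.
for n in (3, 5, 7):
    print('n=%d: %2d vs %2d mults, factored/full = %.2f' % (n, 2 * n, n * n, 2.0 * n / (n * n)))
# n=3:  6 vs  9 mults, factored/full = 0.67
# n=5: 10 vs 25 mults, factored/full = 0.40
# n=7: 14 vs 49 mults, factored/full = 0.29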