While reading code, I noticed that some code builds convolutional layers with tf.nn.conv2d while other code uses tf.contrib.slim.conv2d. Do these two functions compute the same convolution? After going through the API documentation and the source of slim.conv2d, here is a summary.
First, the commonly used tf.nn.conv2d, whose signature is:
conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=None,
    data_format=None,
    name=None
)
input is the image to be convolved. It must be a 4-D Tensor of shape [batch_size, in_height, in_width, in_channels], i.e. [number of images per training batch, image height, image width, number of image channels], with dtype float32 or float64.
filter specifies the convolution kernels of the CNN. It must be a Tensor of shape [filter_height, filter_width, in_channels, out_channels], i.e. [kernel height, kernel width, number of image channels, number of kernels], with the same dtype as input. Note that its third dimension, in_channels, must equal the fourth dimension of input: this is a dimension constraint, not a coincidence of values. out_channels is the number of kernels, while in_channels says that each kernel has the same number of channels as the image: during convolution, a single kernel is convolved with the image channel by channel, the in_channels per-channel results are summed, and a bias is added to give that kernel's output.
strides is the convolution stride along each dimension of the image; it is a 1-D vector of length 4, one entry per dimension of input.
padding is a string that must be either "SAME" or "VALID" and determines the padding scheme: SAME zero-pads the input so the kernel may overhang the image border, while VALID uses no padding and discards positions where the kernel would stick out. See http://blog.csdn.net/mao_xiao_feng/article/details/53444333 for a more detailed description.
use_cudnn_on_gpu specifies whether to accelerate with cuDNN; it defaults to true.
data_format specifies the layout of input; it defaults to NHWC.
The function returns a Tensor: this output is what we usually call the feature map. A minimal usage example follows.
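As a quick illustration of the filter shape and the two padding modes (a minimal sketch; the batch size, image size, and kernel count are made up for demonstration):

import tensorflow as tf

# One 32x32 RGB image (NHWC) and sixteen 3x3 kernels.
x = tf.ones(shape=[1, 32, 32, 3])
w = tf.ones(shape=[3, 3, 3, 16])  # [filter_height, filter_width, in_channels, out_channels]

y_same = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
y_valid = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='VALID')

with tf.Session() as sess:
    print(sess.run(tf.shape(y_same)))   # [ 1 32 32 16]: SAME pads, spatial size preserved
    print(sess.run(tf.shape(y_valid)))  # [ 1 30 30 16]: VALID drops border positions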
tf.contrib.slim.conv2d, on the other hand, is defined as follows:
convolution(inputs,
            num_outputs,
            kernel_size,
            stride=1,
            padding='SAME',
            data_format=None,
            rate=1,
            activation_fn=nn.relu,
            normalizer_fn=None,
            normalizer_params=None,
            weights_initializer=initializers.xavier_initializer(),
            weights_regularizer=None,
            biases_initializer=init_ops.zeros_initializer(),
            biases_regularizer=None,
            reuse=None,
            variables_collections=None,
            outputs_collections=None,
            trainable=True,
            scope=None):
inputs is likewise the image to be convolved.
num_outputs specifies the number of kernels (i.e. the number of filters).
kernel_size specifies the spatial size of the kernels, as [kernel height, kernel width].
stride is the convolution stride along each spatial dimension of the image.
padding selects the padding scheme, VALID or SAME.
data_format specifies the layout of the input.
rate has no counterpart in tf.nn.conv2d: it is the dilation rate for atrous (dilated) convolution. With rate > 1, the kernel taps are spaced rate pixels apart, so the kernel samples the input with gaps, which enlarges the receptive field without adding parameters; see the sketch after this list.
activation_fn specifies the activation function; the default is ReLU.
normalizer_fn specifies a normalization function (such as batch_norm) applied in place of the biases.
normalizer_params specifies the parameters of the normalization function.
weights_initializer specifies the initializer for the weights.
weights_regularizer is an optional regularizer for the weights.
biases_initializer specifies the initializer for the biases.
biases_regularizer is an optional regularizer for the biases.
reuse specifies whether the layer and its variables should be reused.
variables_collections is an optional list of collections for all the variables, or a dictionary with a different list of collections per variable.
outputs_collections specifies the collections the outputs are added to.
trainable: whether the convolution layer's parameters are trainable.
scope: the variable_scope used for variable sharing.
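To make rate concrete: slim.conv2d with rate=2 should match TensorFlow's dedicated atrous op, tf.nn.atrous_conv2d, since slim's default zero biases add nothing and ReLU leaves the non-negative all-ones results unchanged. A small sketch (shapes and values invented for illustration):

import tensorflow as tf
import tensorflow.contrib.slim as slim

x = tf.ones(shape=[1, 16, 16, 3])
w = tf.ones(shape=[3, 3, 3, 8])

# Dedicated atrous op: the 3x3 kernel taps are spread 2 pixels apart.
y_atrous = tf.nn.atrous_conv2d(x, w, rate=2, padding='SAME')
# slim.conv2d with the same dilation rate and all-ones weights.
y_slim = slim.conv2d(x, 8, [3, 3], rate=2, padding='SAME',
                     weights_initializer=tf.ones_initializer())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    a, b = sess.run([y_atrous, y_slim])
    print((a == b).all())  # expected: True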
Looking at the two APIs, once the initialization-related parameters are set aside there is no real difference between them: tf.contrib.slim.conv2d merely exposes many more configurable pieces (initializers, regularizers, normalization, activation), whereas tf.nn.conv2d makes you construct the filter tensor yourself, which is more cumbersome by comparison. Dropping the rarely used initialization parameters, the two APIs simplify to:
tf.contrib.slim.conv2d(inputs,
                       num_outputs,   # number of kernels
                       kernel_size,   # [kernel height, kernel width]
                       stride=1,
                       padding='SAME',
                       )
tf.nn.conv2d(
    input,    # same as above
    filter,   # [kernel height, kernel width, number of image channels, number of kernels]
    strides,
    padding,
    )
The two can therefore be considered nearly identical, and running the code below confirms it: the outputs agree because slim's default biases are initialized to zero and its default ReLU leaves the non-negative values unchanged.
import tensorflow as tf
import tensorflow.contrib.slim as slim

x1 = tf.ones(shape=[1, 64, 64, 3])
# Fill with 1.0, not the integer 1, so the filter's dtype matches the float32
# input; tf.nn.conv2d rejects an int32 filter against a float32 input.
w = tf.fill([5, 5, 3, 64], 1.0)
# print("rank is", tf.rank(x1))
y1 = tf.nn.conv2d(x1, w, strides=[1, 1, 1, 1], padding='SAME')
y2 = slim.conv2d(x1, 64, [5, 5], weights_initializer=tf.ones_initializer(), padding='SAME')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    y1_value, y2_value, x1_value = sess.run([y1, y2, x1])
    print("shapes are", y1_value.shape, y2_value.shape)
    print(y1_value == y2_value)
    print(y1_value)
    print(y2_value)
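In other words, slim.conv2d is roughly tf.nn.conv2d plus variable creation, bias addition, and activation rolled into one call. The sketch below (my own reconstruction for illustration, not slim's actual source; manual_slim_conv2d is a hypothetical helper) reproduces the slim call above by hand:

import tensorflow as tf

def manual_slim_conv2d(x, num_outputs, kernel_size, stride=1, padding='SAME'):
    # Hand-rolled approximation of slim.conv2d's default behaviour.
    in_channels = x.get_shape().as_list()[-1]
    # slim actually defaults to a Xavier initializer; ones are used here only
    # to match the comparison above.
    w = tf.get_variable('weights',
                        shape=kernel_size + [in_channels, num_outputs],
                        initializer=tf.ones_initializer())
    b = tf.get_variable('biases', shape=[num_outputs],
                        initializer=tf.zeros_initializer())
    y = tf.nn.conv2d(x, w, strides=[1, stride, stride, 1], padding=padding)
    y = tf.nn.bias_add(y, b)  # added because biases_initializer is not None
    return tf.nn.relu(y)      # slim's default activation_fn

with tf.variable_scope('manual_conv'):
    y3 = manual_slim_conv2d(tf.ones(shape=[1, 64, 64, 3]), 64, [5, 5])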
Finally, for reference, here is the English docstring of tf.contrib.slim.conv2d:
def convolution(inputs,
                num_outputs,
                kernel_size,
                stride=1,
                padding='SAME',
                data_format=None,
                rate=1,
                activation_fn=nn.relu,
                normalizer_fn=None,
                normalizer_params=None,
                weights_initializer=initializers.xavier_initializer(),
                weights_regularizer=None,
                biases_initializer=init_ops.zeros_initializer(),
                biases_regularizer=None,
                reuse=None,
                variables_collections=None,
                outputs_collections=None,
                trainable=True,
                scope=None):
"""Adds an N-D convolution followed by an optional batch_norm layer.
It is required that 1 <= N <= 3.
`convolution` creates a variable called `weights`, representing the
convolutional kernel, that is convolved (actually cross-correlated) with the
`inputs` to produce a `Tensor` of activations. If a `normalizer_fn` is
provided (such as `batch_norm`), it is then applied. Otherwise, if
`normalizer_fn` is None and a `biases_initializer` is provided then a `biases`
variable would be created and added the activations. Finally, if
`activation_fn` is not `None`, it is applied to the activations as well.
Performs atrous convolution with input stride/dilation rate equal to `rate`
if a value > 1 for any dimension of `rate` is specified. In this case
`stride` values != 1 are not supported.
Args:
inputs: A Tensor of rank N+2 of shape
`[batch_size] + input_spatial_shape + [in_channels]` if data_format does
not start with "NC" (default), or
`[batch_size, in_channels] + input_spatial_shape` if data_format starts
with "NC".
num_outputs: Integer, the number of output filters.
kernel_size: A sequence of N positive integers specifying the spatial
dimensions of the filters. Can be a single integer to specify the same
value for all spatial dimensions.
stride: A sequence of N positive integers specifying the stride at which to
compute output. Can be a single integer to specify the same value for all
spatial dimensions. Specifying any `stride` value != 1 is incompatible
with specifying any `rate` value != 1.
padding: One of `"VALID"` or `"SAME"`.
data_format: A string or None. Specifies whether the channel dimension of
the `input` and output is the last dimension (default, or if `data_format`
does not start with "NC"), or the second dimension (if `data_format`
starts with "NC"). For N=1, the valid values are "NWC" (default) and
"NCW". For N=2, the valid values are "NHWC" (default) and "NCHW".
For N=3, the valid values are "NDHWC" (default) and "NCDHW".
rate: A sequence of N positive integers specifying the dilation rate to use
for atrous convolution. Can be a single integer to specify the same
value for all spatial dimensions. Specifying any `rate` value != 1 is
incompatible with specifying any `stride` value != 1.
activation_fn: Activation function. The default value is a ReLU function.
Explicitly set it to None to skip it and maintain a linear activation.
normalizer_fn: Normalization function to use instead of `biases`. If
`normalizer_fn` is provided then `biases_initializer` and
`biases_regularizer` are ignored and `biases` are not created nor added.
default set to None for no normalizer function
normalizer_params: Normalization function parameters.
weights_initializer: An initializer for the weights.
weights_regularizer: Optional regularizer for the weights.
biases_initializer: An initializer for the biases. If None skip biases.
biases_regularizer: Optional regularizer for the biases.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional list of collections for all the variables or
a dictionary containing a different list of collection per variable.
outputs_collections: Collection to add the outputs.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
scope: Optional scope for `variable_scope`.
Returns:
A tensor representing the output of the operation.
Raises:
ValueError: If `data_format` is invalid.
ValueError: Both 'rate' and `stride` are not uniformly 1.