This is the second half of assignment 2. Before the CNN part there were also assignments on fully connected neural nets, batch norm, and dropout. Dropout is fairly simple to implement; for batch norm I referred to someone else's solution in the previous post (I got it wrong at first and have not yet re-derived it myself; I hope to fill in that derivation later). The fully connected net implementation rests on two passes. Forward: store a cache and compute each layer's output layer by layer. Backward: use the cache and dout to compute the gradient update for the current layer's parameters (dout is passed down from the layer above; the cache belongs to the current layer). Note that the cache saved during the forward pass holds the current layer's inputs and whatever else the computation needs: for example, the cache of the first (input) layer is the input x together with that layer's weight and bias, while special layers such as dropout also need to save the current layer's mask, which decides which input features are zeroed out here, e.g. something drawn from np.random.rand(x.shape[0], x.shape[1]).
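For reference, a minimal sketch of the inverted-dropout forward pass described above (my own illustration, not the assignment's exact dropout_forward; here keep_prob is assumed to be the probability of keeping a feature, whereas the assignment's dropout_param['p'] may follow a different convention):
import numpy as np

def dropout_forward_sketch(x, keep_prob):
    # keep each feature with probability keep_prob, zero it otherwise,
    # and rescale by 1/keep_prob so the expected activation is unchanged
    mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
    out = x * mask
    cache = (mask, keep_prob)   # the mask is saved in the cache for the backward pass
    return out, cache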
Now for the main part: implementing a CNN; after all, the fully connected nets are just a testbed. For the CNN implementation I will record my understanding and approach step by step.
1. Conv Layer Naive Forward
With padding, the convolutional layer's output dimensions are computed as H' = 1 + (H + 2 * pad - HH) / stride and W' = 1 + (W + 2 * pad - WW) / stride.
Note that each filter convolves over all channels of the input, so one filter produces a single output channel and n filters produce n output channels. The number of kernels inside one filter is determined by the number of input channels of the previous layer, one kernel per channel. Code:
def conv_forward_naive(x, w, b, conv_param):
"""
A naive implementation of the forward pass for a convolutional layer.
The input consists of N data points, each with C channels, height H and width
W. We convolve each input with F different filters, where each filter spans
all C channels and has height HH and width WW.
Input:
- x: Input data of shape (N, C, H, W)
- w: Filter weights of shape (F, C, HH, WW)
- b: Biases, of shape (F,)
- conv_param: A dictionary with the following keys:
- 'stride': The number of pixels between adjacent receptive fields in the
horizontal and vertical directions.
- 'pad': The number of pixels that will be used to zero-pad the input.
Returns a tuple of:
- out: Output data, of shape (N, F, H', W') where H' and W' are given by
H' = 1 + (H + 2 * pad - HH) / stride
W' = 1 + (W + 2 * pad - WW) / stride
- cache: (x, w, b, conv_param)
"""
out = None
#############################################################################
# TODO: Implement the convolutional forward pass. #
# Hint: you can use the function np.pad for padding. #
#############################################################################
stride = conv_param['stride']
pad = conv_param['pad']
# zero-pad the input along the spatial dimensions
x_pad = np.pad(x, [(0,0), (0,0), (pad,pad), (pad,pad)], 'constant', constant_values=0)
H_out = (x.shape[2]+2*pad-w.shape[2])/stride+1
W_out = (x.shape[3]+2*pad-w.shape[3])/stride+1
out = np.zeros((x.shape[0], w.shape[0], H_out, W_out))
# convolution
for i in range(H_out):
for j in range(W_out):
for k in range(w.shape[0]):
out[:, k, i, j] = np.sum(w[k,:,:,:]*x_pad[:, :, i*stride:i*stride+w.shape[2], j*stride:j*stride+w.shape[3]], axis=(1,2,3))+b[k]
#############################################################################
# END OF YOUR CODE #
#############################################################################
cache = (x, w, b, conv_param)
return out, cache
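A quick shape sanity check (my own toy example): with H = W = 32, a 7x7 filter, pad = 3 and stride = 1, the formula above gives H' = 1 + (32 + 2*3 - 7) / 1 = 32, so the spatial size is preserved:
import numpy as np

x = np.random.randn(2, 3, 32, 32)   # N=2 images, C=3 channels
w = np.random.randn(4, 3, 7, 7)     # F=4 filters, each spanning all 3 channels
b = np.random.randn(4)
out, _ = conv_forward_naive(x, w, b, {'stride': 1, 'pad': 3})
print(out.shape)                     # expected: (2, 4, 32, 32)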
2. Conv Layer Naive Backward
[Recommended reading]:
http://www.jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/
My intuitive take-aways:
- dx must have the same shape as x, dw the same shape as w, db the same shape as b, and dout the same shape as out. This follows directly from their meaning: dL/dx (dx in the code) measures how much a change in x would change L, and x can only change within its own shape.
- Pad x first before computing. In the convolution, each single step (say a 2x2 input patch against a 2x2 kernel) is just an elementwise multiply-and-accumulate, and every element of the output layer is produced by such a multiply-accumulate between the input and the weight matrix. Take dx as an example: each element of the input x participates in anywhere from one to several of these operations, each time being multiplied by a weight and contributing to some position of the output. If x(0,0) participates only once, then dx(0,0) = weight(0,0) x dout(...); if x(0,1) participates twice, contributing to two different output positions, then dx(0,1) = weight(0,0) x dout(...) + weight(0,1) x dout(...), and so on.
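A concrete 1-D toy example of this accumulation: take x = [x0, x1, x2] and w = [w0, w1] with stride 1 and no padding, so
out[0] = w0*x0 + w1*x1
out[1] = w0*x1 + w1*x2
x0 appears only in out[0], so dx[0] = w0*dout[0]; x1 appears in both outputs, so dx[1] = w1*dout[0] + w0*dout[1]. The implementation below does exactly this: for every output position it scatters w*dout back onto the window of x_pad that produced it (hence the += on dx_pad), and accumulates x_pad*dout into dw.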
Code:
def conv_backward_naive(dout, cache):
"""
A naive implementation of the backward pass for a convolutional layer.
Inputs:
- dout: Upstream derivatives.
- cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive
Returns a tuple of:
- dx: Gradient with respect to x
- dw: Gradient with respect to w
- db: Gradient with respect to b
"""
dx, dw, db = None, None, None
#############################################################################
# TODO: Implement the convolutional backward pass. #
#############################################################################
x = cache[0]
w = cache[1]
b = cache[2]
stride = cache[3]['stride']
pad = cache[3]['pad']
x_pad=np.pad(x, [(0,0),(0,0),(pad,pad),(pad,pad)], 'constant', constant_values=0)
dx_pad = np.zeros((x.shape[0], x.shape[1], x.shape[2]+2*pad, x.shape[3]+2*pad))
dw = np.zeros(w.shape)
db = np.zeros(b.shape)
# k indexes the first dimension (samples); f indexes the second dimension (filters);
# xi and xj index the height and width of this layer's output map.
# With the analysis above, the code below is straightforward.
for k in range(dout.shape[0]):
for f in range(dout.shape[1]):
for xi in range(dout.shape[2]):
for xj in range(dout.shape[3]):
dx_pad[k,:,xi*stride:xi*stride+w.shape[2],xj*stride:xj*stride+w.shape[3]] += w[f,:]*dout[k,f,xi,xj]
dw[f] += x_pad[k,:,xi*stride:xi*stride+w.shape[2],xj*stride:xj*stride+w.shape[3]]*dout[k,f,xi,xj]
dx = dx_pad[:,:,pad:pad+x.shape[2],pad:pad+x.shape[3]]
db = np.sum(dout, axis=(0,2,3))
#############################################################################
# END OF YOUR CODE #
#############################################################################
return dx, dw, db
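To sanity-check the backward pass, a numeric gradient check can be run against the forward pass; a sketch, assuming the assignment's eval_numerical_gradient_array helper from cs231n/gradient_check.py is available:
import numpy as np
from cs231n.gradient_check import eval_numerical_gradient_array

x = np.random.randn(2, 3, 7, 7)
w = np.random.randn(2, 3, 3, 3)
b = np.random.randn(2)
conv_param = {'stride': 1, 'pad': 1}
dout = np.random.randn(2, 2, 7, 7)   # same shape as the forward output

out, cache = conv_forward_naive(x, w, b, conv_param)
dx, dw, db = conv_backward_naive(dout, cache)
dx_num = eval_numerical_gradient_array(
    lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)
print(np.max(np.abs(dx - dx_num)))   # should be tiny, on the order of 1e-8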
3. Max Pooling Naive Forward
def max_pool_forward_naive(x, pool_param):
"""
A naive implementation of the forward pass for a max pooling layer.
Inputs:
- x: Input data, of shape (N, C, H, W)
- pool_param: dictionary with the following keys:
- 'pool_height': The height of each pooling region
- 'pool_width': The width of each pooling region
- 'stride': The distance between adjacent pooling regions
Returns a tuple of:
- out: Output data
- cache: (x, pool_param)
"""
out = None
#############################################################################
# TODO: Implement the max pooling forward pass #
#############################################################################
pool_height = pool_param['pool_height']
pool_width = pool_param['pool_width']
stride = pool_param['stride']
out = np.zeros(( x.shape[0], x.shape[1],
(x.shape[2]-pool_height)//stride+1,
(x.shape[3]-pool_width)//stride+1 ))
for n in range(x.shape[0]):
for c in range(x.shape[1]):
for i in range(out.shape[2]):
for j in range(out.shape[3]):
out[n,c,i,j] = np.max( x[n,c,i*stride:i*stride+pool_height, j*stride:j*stride+pool_width] )
#############################################################################
# END OF YOUR CODE #
#############################################################################
cache = (x, pool_param)
return out, cache
4. Max Pooling Naive Backward
def max_pool_backward_naive(dout, cache):
"""
A naive implementation of the backward pass for a max pooling layer.
Inputs:
- dout: Upstream derivatives
- cache: A tuple of (x, pool_param) as in the forward pass.
Returns:
- dx: Gradient with respect to x
"""
dx = None
#############################################################################
# TODO: Implement the max pooling backward pass #
#############################################################################
x = cache[0]
dx = np.zeros(x.shape)
pool_param = cache[1]
stride = pool_param['stride']
pool_height = pool_param['pool_height']
pool_width = pool_param['pool_width']
for n in range(x.shape[0]):
for c in range(x.shape[1]):
for i in range(dout.shape[2]):
for j in range(dout.shape[3]):
tmp = x[n,c,i*stride:i*stride+pool_height,j*stride:j*stride+pool_width]
binary = tmp==np.max(tmp)
# accumulate (+=) so overlapping pooling regions are also handled correctly
dx[n,c,i*stride:i*stride+pool_height,j*stride:j*stride+pool_width] += binary*dout[n,c,i,j]
#############################################################################
# END OF YOUR CODE #
#############################################################################
return dx
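A small sanity check for both pooling passes (my own toy shapes): a 2x2 pool with stride 2 halves the spatial size, and the backward pass routes each upstream gradient only to the max element of its pooling window:
import numpy as np

x = np.random.randn(2, 3, 8, 8)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
out, cache = max_pool_forward_naive(x, pool_param)
print(out.shape)            # expected: (2, 3, 4, 4)
dx = max_pool_backward_naive(np.ones_like(out), cache)
print((dx != 0).sum())      # one nonzero per 2x2 window: 2*3*4*4 = 96 (barring ties)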
5. Training a three-layer (Conv + ReLU + Pooling) + (Affine + ReLU) + (Affine) + Softmax network
class ThreeLayerConvNet(object):
"""
A three-layer convolutional network with the following architecture:
conv - relu - 2x2 max pool - affine - relu - affine - softmax
The network operates on minibatches of data that have shape (N, C, H, W)
consisting of N images, each with height H and width W and with C input
channels.
"""
def __init__(self, input_dim=(3, 32, 32), num_filters=32, filter_size=7,
hidden_dim=100, num_classes=10, weight_scale=1e-3, reg=0.0,
dtype=np.float32):
"""
Initialize a new network.
Inputs:
- input_dim: Tuple (C, H, W) giving size of input data
- num_filters: Number of filters to use in the convolutional layer
- filter_size: Size of filters to use in the convolutional layer
- hidden_dim: Number of units to use in the fully-connected hidden layer
- num_classes: Number of scores to produce from the final affine layer.
- weight_scale: Scalar giving standard deviation for random initialization
of weights.
- reg: Scalar giving L2 regularization strength
- dtype: numpy datatype to use for computation.
"""
self.params = {}
self.reg = reg
self.dtype = dtype
############################################################################
# TODO: Initialize weights and biases for the three-layer convolutional #
# network. Weights should be initialized from a Gaussian with standard #
# deviation equal to weight_scale; biases should be initialized to zero. #
# All weights and biases should be stored in the dictionary self.params. #
# Store weights and biases for the convolutional layer using the keys 'W1' #
# and 'b1'; use keys 'W2' and 'b2' for the weights and biases of the #
# hidden affine layer, and keys 'W3' and 'b3' for the weights and biases #
# of the output affine layer. #
############################################################################
self.params['W1'] = np.random.normal(0, weight_scale, (num_filters, input_dim[0], filter_size, filter_size))
self.params['b1'] = np.zeros(num_filters)
W2_input_size = num_filters * (input_dim[1] // 2) * (input_dim[2] // 2)  # flattened size after the 2x2 max pool
self.params['W2'] = np.random.normal(0, weight_scale, (W2_input_size, hidden_dim))
self.params['b2'] = np.zeros(hidden_dim)
self.params['W3'] = np.random.normal(0, weight_scale, (hidden_dim, num_classes))
self.params['b3'] = np.zeros(num_classes)
############################################################################
# END OF YOUR CODE #
############################################################################
for k, v in self.params.iteritems():
self.params[k] = v.astype(dtype)
def loss(self, X, y=None):
"""
Evaluate loss and gradient for the three-layer convolutional network.
Input / output: Same API as TwoLayerNet in fc_net.py.
"""
W1, b1 = self.params['W1'], self.params['b1']
W2, b2 = self.params['W2'], self.params['b2']
W3, b3 = self.params['W3'], self.params['b3']
# pass conv_param to the forward pass for the convolutional layer
filter_size = W1.shape[2]
conv_param = {'stride': 1, 'pad': (filter_size - 1) / 2}
# pass pool_param to the forward pass for the max-pooling layer
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
scores = None
loss, grads = 0, {}
############################################################################
# TODO: Implement the forward pass for the three-layer convolutional net, #
# computing the class scores for X and storing them in the scores #
# variable. #
############################################################################
out1, cache1 = conv_relu_pool_forward(X, W1, b1, conv_param, pool_param)
out2, cache2 = affine_relu_forward(out1, W2, b2)
out3, cache3 = affine_forward(out2, W3, b3)
scores = out3
############################################################################
# END OF YOUR CODE #
############################################################################
if y is None:
return scores
############################################################################
# TODO: Implement the backward pass for the three-layer convolutional net, #
# storing the loss and gradients in the loss and grads variables. Compute #
# data loss using softmax, and make sure that grads[k] holds the gradients #
# for self.params[k]. Don't forget to add L2 regularization! #
############################################################################
loss, dscore = softmax_loss(scores, y)
loss += 0.5 * self.reg * sum(np.sum(W * W) for W in [W1, W2, W3])
dout3, grads['W3'], grads['b3'] = affine_backward(dscore, cache3)
dout2, grads['W2'], grads['b2'] = affine_relu_backward(dout3, cache2)
dout1, grads['W1'], grads['b1'] = conv_relu_pool_backward(dout2, cache1)
grads['W3'] += self.reg*W3
grads['W2'] += self.reg*W2
grads['W1'] += self.reg*W1
############################################################################
# END OF YOUR CODE #
############################################################################
return loss, grads
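With the layers in place, the net can be trained with the assignment's Solver class; a sketch, assuming data is the CIFAR-10 dictionary (X_train / y_train / X_val / y_val) loaded in the notebook and that the hyperparameters shown are only starting points:
from cs231n.solver import Solver

model = ThreeLayerConvNet(weight_scale=1e-3, hidden_dim=500, reg=1e-3)
solver = Solver(model, data,
                num_epochs=1, batch_size=50,
                update_rule='adam',
                optim_config={'learning_rate': 1e-3},
                verbose=True, print_every=20)
solver.train()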
6. Spatial Batch Normalization
We already saw that batch normalization is a very useful technique for training deep fully-connected networks. Batch normalization can also be used for convolutional networks, but we need to tweak it a bit; the modification will be called "spatial batch normalization."
Normally batch-normalization accepts inputs of shape (N, D) and produces outputs of shape (N, D), where we normalize across the minibatch dimension N. For data coming from convolutional layers, batch normalization needs to accept inputs of shape (N, C, H, W) and produce outputs of shape (N, C, H, W) where the N dimension gives the minibatch size and the (H, W) dimensions give the spatial size of the feature map.
If the feature map was produced using convolutions, then we expect the statistics of each feature channel to be relatively consistent both between different images and different locations within the same image. Therefore spatial batch normalization computes a mean and variance for each of the C feature channels by computing statistics over both the minibatch dimension N and the spatial dimensions H and W.
Where BN for a CNN differs from BN for a fully connected net is that the conv layer's spatial BN normalizes per channel (e.g. the RGB channels), so that within each channel the distribution is consistent and normalized both across different images and across different locations of the same image. I did not write this part myself; here is martinkersner's code from GitHub:
def spatial_batchnorm_forward(x, gamma, beta, bn_param):
"""
Computes the forward pass for spatial batch normalization.
Inputs:
- x: Input data of shape (N, C, H, W)
- gamma: Scale parameter, of shape (C,)
- beta: Shift parameter, of shape (C,)
- bn_param: Dictionary with the following keys:
- mode: 'train' or 'test'; required
- eps: Constant for numeric stability
- momentum: Constant for running mean / variance. momentum=0 means that old information is discarded completely at every time step, while momentum=1 means that new information is never incorporated. The default of momentum=0.9 should work well in most situations.
- running_mean: Array of shape (D,) giving running mean of features
- running_var Array of shape (D,) giving running variance of features
Returns a tuple of:
- out: Output data, of shape (N, C, H, W)
- cache: Values needed for the backward pass
"""
out, cache = None, None
##############################################################################
# Implement the forward pass for spatial batch normalization.
# HINT: You can implement spatial batch normalization using the vanilla #
# version of batch normalization defined above. Your implementation should #
# be very short; ours is less than five lines.
##############################################################################
N, C, H, W = x.shape
x_reshaped = x.transpose(0,2,3,1).reshape(N*H*W, C)
out_tmp, cache = batchnorm_forward(x_reshaped, gamma, beta, bn_param)
out = out_tmp.reshape(N, H, W, C).transpose(0, 3, 1, 2)
return out, cache
def spatial_batchnorm_backward(dout, cache):
"""
Computes the backward pass for spatial batch normalization.
Inputs:
- dout: Upstream derivatives, of shape (N, C, H, W)
- cache: Values from the forward pass
Returns a tuple of:
- dx: Gradient with respect to inputs, of shape (N, C, H, W)
- dgamma: Gradient with respect to scale parameter, of shape (C,)
- dbeta: Gradient with respect to shift parameter, of shape (C,)
"""
dx, dgamma, dbeta = None, None, None
#############################################################################
# Implement the backward pass for spatial batch normalization.
## HINT: You can implement spatial batch normalization using the vanilla
## version of batch normalization defined above. Your implementation should
## be very short; ours is less than five lines.
##############################################################################
N, C, H, W = dout.shape
dout_reshaped = dout.transpose(0,2,3,1).reshape(N*H*W, C)
dx_tmp, dgamma, dbeta = batchnorm_backward(dout_reshaped, cache)
dx = dx_tmp.reshape(N, H, W, C).transpose(0, 3, 1, 2)
return dx, dgamma, dbeta
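A quick way to check the reshape trick: after a training-mode forward pass, each of the C channels should have roughly zero mean and unit variance when averaged over the N, H and W axes (my own toy shapes):
import numpy as np

N, C, H, W = 2, 3, 4, 5
x = 4 * np.random.randn(N, C, H, W) + 10
out, _ = spatial_batchnorm_forward(x, np.ones(C), np.zeros(C), {'mode': 'train'})
print(out.mean(axis=(0, 2, 3)))   # close to 0 for each channel
print(out.std(axis=(0, 2, 3)))    # close to 1 for each channel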
[Final tips] Tips for training:
For each network architecture that you try, you should tune the learning rate and regularization strength. When doing this there are a couple important things to keep in mind:
- If the parameters are working well, you should see improvement within a few hundred iterations
- Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all.
- Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
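As a concrete starting point for the coarse stage, a random search over log-spaced learning rates and regularization strengths can be run for one epoch each (a sketch assuming the Solver class and a small training subset small_data as used in the notebook; the ranges are only guesses):
import numpy as np
from cs231n.solver import Solver

results = {}
for _ in range(10):
    lr = 10 ** np.random.uniform(-4, -2)
    reg = 10 ** np.random.uniform(-4, -1)
    model = ThreeLayerConvNet(weight_scale=1e-2, reg=reg)
    solver = Solver(model, small_data,
                    num_epochs=1, batch_size=50,
                    update_rule='adam',
                    optim_config={'learning_rate': lr},
                    verbose=False)
    solver.train()
    results[(lr, reg)] = solver.best_val_acc
# then narrow the search around the best (lr, reg) pairs and train for more epochs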
Also, for the CNN backward pass there are more efficient approaches than the for loops used in the code above; see:
http://www.cnblogs.com/pinard/p/6494810.html