Will it fail CNNs (on MNIST)?
1涨薪、-"use the raw pixel values between[0,255]"
Correct. Almost all CNN's prefer to normalize pixel value normalized between [-1,1]
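A minimal sketch of the usual fix, assuming PyTorch and torchvision (the dataset path "./data" is arbitrary):

```python
import torchvision.transforms as T
from torchvision.datasets import MNIST

transform = T.Compose([
    T.ToTensor(),                          # uint8 [0, 255] -> float [0, 1]
    T.Normalize(mean=(0.5,), std=(0.5,)),  # [0, 1] -> [-1, 1]
])
train_set = MNIST(root="./data", train=True, download=True, transform=transform)
```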
2、-"initialize all the CNN weights as 0"
Correct,Network weights should be initialized randomly
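A minimal sketch of the failure mode, assuming PyTorch (the 784-128-10 layer sizes are just illustrative):

```python
import torch
import torch.nn as nn

layer1 = nn.Linear(784, 128)
layer2 = nn.Linear(128, 10)
for layer in (layer1, layer2):
    nn.init.zeros_(layer.weight)  # deliberately break the default random init
    nn.init.zeros_(layer.bias)

x = torch.randn(1, 784)
loss = layer2(torch.relu(layer1(x))).sum()
loss.backward()
# Every entry of the first layer's weight gradient is identical (here zero),
# so all hidden units would stay clones of each other after every update.
print(layer1.weight.grad.unique())  # tensor([0.])
```

PyTorch's default Kaiming-uniform initialization is random, which is exactly what breaks this symmetry.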
3种蘸、-"Use no intercept (i.e., Wx instead of Wx+b) in the fully connect layer"
No. Network with zero intercepts will still work
如果沒(méi)有偏置的話攀芯,我們所有的分割線(Wx=0所代表的超平面就是決策邊界)都是經(jīng)過(guò)原點(diǎn)的
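A minimal sketch, assuming PyTorch, of what the bias-free variant looks like:

```python
import torch.nn as nn

fc_with_bias = nn.Linear(784, 10)              # computes Wx + b
fc_no_bias   = nn.Linear(784, 10, bias=False)  # computes Wx only; still trains
```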
4甲雅、-“The batch size is too small (i.e., one sample per batch)”
No.Small batch size will still work, but make the optimization slower
5履怯、-"The batch size is too big (i.e., use the whole dataset as one batch)"
correct. We will lose the "stochastic" factor by taking whole dataset as one batch, and the optimization will fall into bad local minimum
我們將失去“隨機(jī)”因素,優(yōu)化將陷入糟糕的局部最小值
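A minimal sketch of the two batch-size extremes from items 4 and 5, assuming PyTorch and reusing the `train_set` from the normalization sketch above:

```python
from torch.utils.data import DataLoader

# Works, but slow and noisy: one gradient step per sample.
one_sample_loader = DataLoader(train_set, batch_size=1, shuffle=True)

# Plain full-batch gradient descent: no stochasticity left in the updates.
full_batch_loader = DataLoader(train_set, batch_size=len(train_set))
```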
6拼窥、-"Do not shuffle the data before training"
Usually correct. Random shuffling impress CNN a lot.
最明顯的情況是戏蔑,如果您的數(shù)據(jù)是按照它們的類/目標(biāo)排序的蹋凝,則需要對(duì)數(shù)據(jù)進(jìn)行洗牌。在這里辛臊,您需要重新洗牌仙粱,以確保您的培訓(xùn)/測(cè)試/驗(yàn)證集能夠代表數(shù)據(jù)的總體分布。
Suppose 假設(shè) data is sorted in a specified order. For example a data set which is sorted base on their class. So, if you select data for training, validation, and test without considering this subject, you will select each class for different tasks, and it will fail the process.
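A minimal sketch, assuming PyTorch and reusing `train_set` from above (the 50,000/10,000 split sizes match MNIST's 60,000 training samples):

```python
from torch.utils.data import DataLoader, random_split

# random_split draws a random permutation, so both splits stay representative
# even if the underlying dataset happens to be sorted by class.
train_part, val_part = random_split(train_set, [50_000, 10_000])

# shuffle=True reshuffles every epoch, so no batch is a single-digit batch.
train_loader = DataLoader(train_part, batch_size=64, shuffle=True)
```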