Normalization

BN (Batch Normalization)

1.Batch Normalization in Deep Learning
2.An Introductory Guide to Batch Normalization
3.The Keras BatchNormalization layer

tf.layers.batch_normalization(
    inputs,
    axis=-1,
    momentum=0.99,
    epsilon=0.001,
    center=True,
    scale=True,
    beta_initializer=tf.zeros_initializer(),
    gamma_initializer=tf.ones_initializer(),
    moving_mean_initializer=tf.zeros_initializer(),
    moving_variance_initializer=tf.ones_initializer(),
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    training=False,
    trainable=True,
    name=None,
    reuse=None,
    renorm=False,
    renorm_clipping=None,
    renorm_momentum=0.99,
    fused=None,
    virtual_batch_size=None,
    adjustment=None
)

Batch Normalization layer from http://arxiv.org/abs/1502.03167.
"Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift"
Sergey Ioffe, Christian Szegedy
Arguments:
axis: An int or list of int, the axis or axes that should be
normalized, typically the features axis/axes. For instance, after a
Conv2D layer with data_format="channels_first", set axis=1. If a
list of axes is provided, each axis in axis will be normalized
simultaneously. The default is -1, which uses the last axis. Note: when
using multi-axis batch norm, the beta, gamma, moving_mean, and
moving_variance variables are the same rank as the input Tensor, with
dimension size 1 in all reduced (non-axis) dimensions.
momentum: Momentum for the moving average.
epsilon: Small float added to variance to avoid dividing by zero.
center: If True, add offset of beta to normalized tensor. If False, beta
is ignored.
scale: If True, multiply by gamma. If False, gamma is
not used. When the next layer is linear (also e.g. nn.relu), this can be
disabled since the scaling can be done by the next layer.
beta_initializer: Initializer for the beta weight.
gamma_initializer: Initializer for the gamma weight.
moving_mean_initializer: Initializer for the moving mean.
moving_variance_initializer: Initializer for the moving variance.
beta_regularizer: Optional regularizer for the beta weight.
gamma_regularizer: Optional regularizer for the gamma weight.
beta_constraint: An optional projection function to be applied to the beta
weight after being updated by an Optimizer (e.g. used to implement
norm constraints or value constraints for layer weights). The function
must take as input the unprojected variable and must return the
projected variable (which must have the same shape). Constraints are
not safe to use when doing asynchronous distributed training.
gamma_constraint: An optional projection function to be applied to the
gamma weight after being updated by an Optimizer.
renorm: Whether to use Batch Renormalization
(https://arxiv.org/abs/1702.03275). This adds extra variables during
training. The inference is the same for either value of this parameter.
renorm_clipping: A dictionary that may map keys 'rmax', 'rmin', 'dmax' to
scalar Tensors used to clip the renorm correction. The correction
(r, d) is used as corrected_value = normalized_value * r + d, with
r clipped to [rmin, rmax], and d to [-dmax, dmax]. Missing rmax, rmin,
dmax are set to inf, 0, inf, respectively.
renorm_momentum: Momentum used to update the moving means and standard
deviations with renorm. Unlike momentum, this affects training
and should be neither too small (which would add noise) nor too large
(which would give stale estimates). Note that momentum is still applied
to get the means and variances for inference.
fused: if None or True, use a faster, fused implementation if possible.
If False, use the system recommended implementation.
trainable: Boolean, if True also add variables to the graph collection
GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
virtual_batch_size: An int. By default, virtual_batch_size is None,
which means batch normalization is performed across the whole batch. When
virtual_batch_size is not None, instead perform "Ghost Batch
Normalization", which creates virtual sub-batches which are each
normalized separately (with shared gamma, beta, and moving statistics).
Must divide the actual batch size during execution.
adjustment: A function taking the Tensor containing the (dynamic) shape of
the input tensor and returning a pair (scale, bias) to apply to the
normalized values (before gamma and beta), only during training. For
example, if axis==-1,
adjustment = lambda shape: (tf.random_uniform(shape[-1:], 0.93, 1.07),
                            tf.random_uniform(shape[-1:], -0.1, 0.1))
will scale the normalized value by up to 7% up or down, then shift the
result by up to 0.1 (with independent scaling and bias for each feature
but shared across all examples), and finally apply gamma and/or beta. If
None, no adjustment is applied. Cannot be specified if
virtual_batch_size is specified.
name: A string, the name of the layer.
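
The most common pitfalls with this layer are the training argument and the moving-statistics updates, so a minimal TF 1.x usage sketch may help (the conv layer, dummy loss, and optimizer below are illustrative only): the moving mean/variance are updated by ops collected under tf.GraphKeys.UPDATE_OPS, which must run together with the train step.

import tensorflow as tf

# Toggle between training mode (batch statistics) and inference mode (moving statistics).
is_training = tf.placeholder(tf.bool, name="is_training")

x = tf.placeholder(tf.float32, [None, 32, 32, 3])
h = tf.layers.conv2d(x, filters=16, kernel_size=3, padding="same", use_bias=False)
h = tf.layers.batch_normalization(h, axis=-1, momentum=0.99, training=is_training)
h = tf.nn.relu(h)

loss = tf.reduce_mean(tf.square(h))  # dummy loss, for illustration only

# The moving mean/variance are updated by ops placed in GraphKeys.UPDATE_OPS;
# without this control dependency the inference statistics would never be refreshed.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)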

4.tf.layers.batch_normalization
5.CS231n Notes 4 - Data Preprocessing, Weights Initialization, and Batch Normalization
6.Batch Renormalization : Towards Reducing Minibatch Dependence in Batch-Normalized Models

GN (Group Normalization)

Group Normalization
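
The linked paper splits the channels into groups and normalizes each group over (channels-in-group, H, W), so the statistics do not depend on the batch size. Below is a minimal sketch for an NCHW tensor, closely following the pseudocode in the paper; the group_norm name is ours, and the per-channel gamma/beta of shape [1, C, 1, 1] are assumed to be created by the caller.

import tensorflow as tf

def group_norm(x, gamma, beta, G=32, eps=1e-5):
    # x: [N, C, H, W]; gamma, beta: per-channel scale/offset of shape [1, C, 1, 1].
    N, C, H, W = x.shape.as_list()
    # Split the C channels into G groups and normalize each group over (C // G, H, W).
    x = tf.reshape(x, [-1, G, C // G, H, W])
    mean, var = tf.nn.moments(x, [2, 3, 4], keep_dims=True)
    x = (x - mean) / tf.sqrt(var + eps)
    x = tf.reshape(x, [-1, C, H, W])
    return x * gamma + beta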

SN (Switchable Normalization)

In-Depth Analysis | Self-Adaptive Normalization via Differentiable Learning
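
Roughly, Switchable Normalization computes the means/variances of Instance Norm, Layer Norm, and Batch Norm and blends them with learned softmax weights (one set for the means, one for the variances) before the usual scale-and-shift. The sketch below is a simplified training-mode version for NHWC input under those assumptions; the switchable_norm name is ours and the moving-average path used at inference is omitted.

import tensorflow as tf

def switchable_norm(x, eps=1e-5):
    # x: [N, H, W, C]. Training-mode statistics only.
    C = x.shape.as_list()[-1]
    # Candidate statistics: Instance Norm, Layer Norm, Batch Norm.
    mean_in, var_in = tf.nn.moments(x, [1, 2], keep_dims=True)      # per sample, per channel
    mean_ln, var_ln = tf.nn.moments(x, [1, 2, 3], keep_dims=True)   # per sample, all channels
    mean_bn, var_bn = tf.nn.moments(x, [0, 1, 2], keep_dims=True)   # whole batch, per channel
    # Learned importance weights, softmax-normalized; separate sets for means and variances.
    w_mean = tf.nn.softmax(tf.get_variable("sn_w_mean", [3], initializer=tf.zeros_initializer()))
    w_var = tf.nn.softmax(tf.get_variable("sn_w_var", [3], initializer=tf.zeros_initializer()))
    mean = w_mean[0] * mean_in + w_mean[1] * mean_ln + w_mean[2] * mean_bn
    var = w_var[0] * var_in + w_var[1] * var_ln + w_var[2] * var_bn
    x_hat = (x - mean) / tf.sqrt(var + eps)
    gamma = tf.get_variable("sn_gamma", [1, 1, 1, C], initializer=tf.ones_initializer())
    beta = tf.get_variable("sn_beta", [1, 1, 1, C], initializer=tf.zeros_initializer())
    return x_hat * gamma + beta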

Self-Normalization

1.Self-Normalizing Neural Networks
2.Taking the machine learning community by storm: "Self-Normalizing Neural Networks" introduces the new SELU activation function
3.Self-Normalizing Neural Networks (Chinese-language overview)
4.SNNs github
5.Add example to compare RELU with SELU (a minimal comparison sketch is given after this list)
However, looking at Kaggle challenges that are not related to vision or sequential tasks, gradient boosting, random forests, or support vector machines (SVMs) are winning most of the competitions. Deep Learning is notably absent, and for the few cases where FNNs won, they are shallow. For example, the HIGGS challenge, the Merck Molecular Activity challenge, and the Tox21 Data challenge were all won by FNNs with at most four hidden layers. Surprisingly, it is hard to find success stories with FNNs that have many hidden layers, though they would allow for different levels of abstract representations of the input.
6.ReLU、LReLU、PReLU、CReLU、ELU、SELU
7.What is your assessment of the paper "Self-Normalizing Neural Networks"?
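
Picking up item 5 above, here is a minimal NumPy comparison of ReLU and SELU (the constants are the alpha/lambda values from the paper, rounded): on standard-normal inputs SELU roughly preserves zero mean and unit variance, while ReLU shifts the mean up and shrinks the variance.

import numpy as np

# alpha and lambda (scale) constants from "Self-Normalizing Neural Networks", rounded.
SELU_ALPHA = 1.6732632423543772
SELU_LAMBDA = 1.0507009873554805

def selu(x):
    # Scaled ELU, designed so zero-mean / unit-variance inputs keep those statistics.
    return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))

def relu(x):
    return np.maximum(x, 0.0)

x = np.random.randn(100000)
for name, f in [("relu", relu), ("selu", selu)]:
    y = f(x)
    print("%s: mean=%.3f var=%.3f" % (name, y.mean(), y.var()))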

Supplement

1.A summary of gradient and normalization issues in neural networks, plus thoughts on highway networks and ResNet
2.ICML 2018 | Petuum proposes a new regularization method: non-overlap-promoting variable selection
