Normalization explained: https://zhuanlan.zhihu.com/p/35005794
- Batch Normalization (BN) is inserted between each fully connected layer and the activation function that follows it
- BatchNorm: normalizes along the batch dimension; statistics are computed over N×H×W (one mean and variance per channel)
- LayerNorm: normalizes along the channel dimension; statistics are computed over C×H×W (one set per sample)
- InstanceNorm: normalizes within a single channel; statistics are computed over H×W (one set per sample and channel)
- GroupNorm: splits the channels into groups and normalizes within each group; statistics are computed over (C//G)×H×W (the reduction axes for all four are sketched in code right after this list)
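A minimal sketch of the four reduction patterns on an NCHW tensor (the shape and group count here are illustrative assumptions, not from the original post):

import torch

x = torch.randn(8, 32, 16, 16)   # hypothetical [N, C, H, W] feature map
N, C, H, W = x.shape
G = 4                            # illustrative group count for GroupNorm

bn_mean = x.mean(dim=(0, 2, 3))  # BatchNorm: per channel, over N, H, W
ln_mean = x.mean(dim=(1, 2, 3))  # LayerNorm: per sample, over C, H, W
in_mean = x.mean(dim=(2, 3))     # InstanceNorm: per sample and channel, over H, W
gn_mean = x.view(N, G, C // G, H, W).mean(dim=(2, 3, 4))  # GroupNorm: per sample and group

print(bn_mean.shape, ln_mean.shape, in_mean.shape, gn_mean.shape)
# torch.Size([32]) torch.Size([8]) torch.Size([8, 32]) torch.Size([8, 4])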
Problems that normalization addresses
1. Internal Covariate Shift
   Deep networks are often hard to train because, after every parameter update, the distribution of the previous layer's outputs changes once they pass through the current layer, which makes learning harder for the next layer.
2. Covariate shift
   Refers to the distribution mismatch between the training and test data, which hurts the network's generalization and training speed. When input distributions are inconsistent, normalizing the data clearly speeds up training; better still is to also decorrelate the data so that the relative differences between the distributions stand out.
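All of the schemes above apply the same element-wise transform and differ only in which set of elements the mean μ and variance σ² are computed over; written out (γ and β are learned, ε is a small constant for numerical stability):

$$
\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}, \qquad y = \gamma\,\hat{x} + \beta
$$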
Batch Normalization
- For each neuron, compute the mean and variance of its summed inputs over a mini-batch of training cases, then use them to normalize that neuron's summed input on every training case. This significantly reduces training time in feed-forward networks. However, the effect of batch normalization depends on the mini-batch size.
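A minimal sketch of the training-time computation on an NCHW feature map (the shapes and identifiers are illustrative assumptions):

import torch
import torch.nn as nn

x = torch.randn(8, 32, 16, 16)              # [N, C, H, W]
gamma = torch.ones(1, 32, 1, 1)             # per-channel gain (learned in practice)
beta = torch.zeros(1, 32, 1, 1)             # per-channel bias (learned in practice)
eps = 1e-5

# Training-time BN: statistics per channel, over the batch and spatial dims.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
y = gamma * (x - mean) / torch.sqrt(var + eps) + beta

# nn.BatchNorm2d in training mode computes the same thing; at test time it
# switches to running averages, which is why BN degrades with tiny batches.
torch.testing.assert_close(y, nn.BatchNorm2d(32, eps=eps)(x), rtol=1e-4, atol=1e-4)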
Layer Normalization
- Transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. As with batch normalization, each neuron is given its own adaptive bias and gain, applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test time.
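A minimal sketch of the layer statistics on an NCHW tensor (shape is an illustrative assumption):

import torch
import torch.nn as nn

x = torch.randn(8, 32, 16, 16)              # [N, C, H, W]
eps = 1e-5

# LayerNorm: statistics over all features of one sample (C, H, W), so the
# computation is identical at training and test time and is independent of
# the mini-batch size.
mean = x.mean(dim=(1, 2, 3), keepdim=True)
var = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
y = (x - mean) / torch.sqrt(var + eps)

# Matches nn.LayerNorm with its default gain of 1 and bias of 0.
torch.testing.assert_close(y, nn.LayerNorm([32, 16, 16], eps=eps)(x), rtol=1e-4, atol=1e-4)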
Group Normalization
- PyTorch version
import torch

def GroupNorm(x, gamma, beta, G, eps=1e-5):
    # x: [N, C, H, W]; gamma, beta: broadcastable to [1, C, 1, 1]
    N, C, H, W = x.shape
    x = x.view(N, G, C // G, H, W)                      # split channels into G groups
    mean = x.mean(dim=(2, 3, 4), keepdim=True)          # statistics over (C//G, H, W)
    var = x.var(dim=(2, 3, 4), keepdim=True, unbiased=False)
    x = (x - mean) / torch.sqrt(var + eps)
    x = x.view(N, C, H, W)
    return x * gamma + beta
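A quick sanity check against the built-in nn.GroupNorm (a sketch; the shapes and tolerances are arbitrary):

import torch
import torch.nn as nn

x = torch.randn(8, 32, 16, 16)
gamma = torch.ones(1, 32, 1, 1)
beta = torch.zeros(1, 32, 1, 1)

y = GroupNorm(x, gamma, beta, G=4)
torch.testing.assert_close(y, nn.GroupNorm(4, 32, eps=1e-5)(x), rtol=1e-4, atol=1e-4)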
import torch
import torch.nn as nn

class GroupBatchnorm2d(nn.Module):
    def __init__(self, c_num, group_num=16, eps=1e-10):
        super(GroupBatchnorm2d, self).__init__()
        self.group_num = group_num
        # per-channel affine parameters, broadcast over H and W
        self.gamma = nn.Parameter(torch.ones(c_num, 1, 1))
        self.beta = nn.Parameter(torch.zeros(c_num, 1, 1))
        self.eps = eps

    def forward(self, x):
        N, C, H, W = x.size()
        x = x.view(N, self.group_num, -1)       # [N, G, (C//G)*H*W]
        mean = x.mean(dim=2, keepdim=True)
        # biased variance, with eps inside the square root, per the standard formulation
        var = x.var(dim=2, keepdim=True, unbiased=False)
        x = (x - mean) / torch.sqrt(var + self.eps)
        x = x.view(N, C, H, W)
        return x * self.gamma + self.beta
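Illustrative usage (shapes are arbitrary); with default parameters this behaves like nn.GroupNorm(16, 64), up to the choice of eps:

gn = GroupBatchnorm2d(c_num=64, group_num=16)
out = gn(torch.randn(2, 64, 8, 8))
print(out.shape)  # torch.Size([2, 64, 8, 8])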
- TensorFlow version
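The original note stops at this heading. Below is a minimal sketch in the spirit of the reference pseudocode in the Group Normalization paper, assuming a TF2 environment and an [N, C, H, W] tensor (the function name is illustrative):

import tensorflow as tf

def group_norm_tf(x, gamma, beta, G, eps=1e-5):
    # x: [N, C, H, W]; gamma, beta: broadcastable to [1, C, 1, 1]
    N, C, H, W = x.shape
    x = tf.reshape(x, [N, G, C // G, H, W])
    mean, var = tf.nn.moments(x, axes=[2, 3, 4], keepdims=True)
    x = (x - mean) / tf.sqrt(var + eps)
    x = tf.reshape(x, [N, C, H, W])
    return x * gamma + beta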